# [Loose translation] Face recognition, from pyimagesearch



## Preface


* OpenCV

* Python

* deep learning


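## Face recognition with OpenCV, Python, and deep learning

In this tutorial, you will learn how to perform face recognition with OpenCV, Python, and deep learning.

We will briefly discuss how deep learning-based face recognition works, including the concept of "deep metric learning".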




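## Understanding deep learning-based face recognition

The secret is a technique called "deep metric learning".

A typical deep learning classifier is trained to:

* Accept a single input image

* Output a classification/label for that image

Deep metric learning, however, is different.

Instead of a label, it outputs a real-valued feature vector.

dlib's face recognition network outputs a 128-d feature vector (that is, a list of 128 numbers) that quantifies the face. The network is trained on triplets:

* Take three photos: two of person A and one of person B.

* Adjust the weights so that the feature vectors of the two photos of person A move closer together, while the feature vectors of person A and person B move farther apart.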

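As a concrete example, take three photos: one of Chad Smith and two of Will Ferrell.

Our network quantifies these faces, building a 128-d feature vector (an embedding, or quantification) for each one.

The general idea is then to adjust the weights of our neural network so that the two Will Ferrell vectors end up closer together and farther from the Chad Smith vector.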
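To make the triplet idea concrete, here is a minimal sketch of a margin-based triplet loss on toy 3-d vectors. The function names, the margin value, and the vectors are illustrative assumptions; the text above does not spell out dlib's actual loss or training code, so treat this as a generic form of the technique rather than dlib's implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (assumed generic form, not dlib's code).

    The loss drops to zero once the anchor is closer to the positive
    than to the negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


# toy stand-ins for the 128-d embeddings (hypothetical values)
will_1 = [0.10, 0.20, 0.30]
will_2 = [0.12, 0.18, 0.33]
chad = [0.90, 0.80, 0.10]

# same person as positive, different person as negative: nothing to fix
well_separated = triplet_loss(will_1, will_2, chad)

# roles swapped: the loss is positive, so training would adjust weights
badly_separated = triplet_loss(will_1, chad, will_2)
```

During training, a positive loss is what pushes the weights so that same-person vectors move together and different-person vectors move apart.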

Our face recognition network architecture is based on ResNet-34 from [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by He et al., but with fewer layers and half the number of filters.

The network was trained by [Davis King](https://www.pyimagesearch.com/2017/03/13/an-interview-with-davis-king-creator-of-the-dlib-toolkit/) on a dataset of about 3 million images. On [Labeled Faces in the Wild](http://vis-www.cs.umass.edu/lfw/) it reaches 99.38% accuracy, comparable to other state-of-the-art methods.

Davis King (author of [dlib](http://dlib.net/)) and [Adam Geitgey](https://www.adamgeitgey.com/) (author of [face_recognition](https://github.com/ageitgey/face_recognition), a module we will use shortly) have both written detailed articles on how deep learning-based face recognition works:

* [High Quality Face Recognition with Deep Metric Learning](http://blog.dlib.net/2017/02/high-quality-face-recognition-with-deep.html) (Davis)

* [Modern Face Recognition with Deep Learning](https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78) (Adam)


## Installing the face recognition libraries

Besides Python and OpenCV, we need two more libraries:

* [dlib](http://dlib.net/)

* [face_recognition](https://github.com/ageitgey/face_recognition)

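dlib is maintained by [Davis King](https://www.pyimagesearch.com/2017/03/13/an-interview-with-davis-king-creator-of-the-dlib-toolkit/) and contains the implementation of "deep metric learning" that our face recognition work needs.

face_recognition, created by [Adam Geitgey](https://www.adamgeitgey.com/), wraps dlib's face recognition functionality to make it easier to use.

I assume you already have OpenCV installed. If not, my [OpenCV install tutorials](https://www.pyimagesearch.com/opencv-tutorials-resources-guides/) cover it.

Next, let's install dlib and face_recognition.

> The original author strongly recommends using `virtualenv` together with `virtualenvwrapper` to avoid package pollution.

### Installing dlib

> You may need to install cmake first, which can also be done with `pip install cmake`.

> Newer packages automatically check whether the environment has the required libraries and, if so, compile a GPU-enabled build.

> For an Nvidia GPU you need the CUDA Development Tools and the cuDNN Library (the latter requires an Nvidia developer account; an email address is enough to register).

Install with pip: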

`pip install dlib`

Done (progress really does make life easier).

### Installing face_recognition

Install with pip:

`pip install face_recognition`


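### Installing imutils

[imutils](https://github.com/jrosebr1/imutils) is a convenience package that wraps common OpenCV combinations into ready-made functions. The original author recommends it.

Install with pip: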

`pip install imutils`
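Before moving on, it may help to confirm that all packages are visible to the current interpreter. The following is a minimal sketch using only the standard library; `check_packages` is a hypothetical helper, and `cv2` is the import name of OpenCV. It only checks that each module can be located, not that it works or was built with GPU support.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None
            for name in names}


if __name__ == "__main__":
    required = ["cv2", "dlib", "face_recognition", "imutils"]
    for name, found in check_packages(required).items():
        print("{}: {}".format(name, "ok" if found else "MISSING"))
```

Run it inside the same virtual environment you installed into; a `MISSING` line usually means the package went into a different environment.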

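## Our face recognition dataset

Since Jurassic Park (1993) is my favorite movie, and in honor of Jurassic World: Fallen Kingdom (2018) opening in US theaters, we will apply face recognition to several characters from the films: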

* Alan Grant

* Claire Dearing

* Ellie Sattler

* Ian Malcolm

* John Hammond

* Owen Grady

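The dataset can be built in under 30 minutes using my method. See [How to (quickly) build a deep learning image dataset](https://pyimagesearch.com/2018/04/09/how-to-quickly-build-a-deep-learning-image-dataset/).

With this dataset, we will:

* Create a 128-d feature vector for each face

* Use these feature vectors to recognize the characters' faces in still images and in video

## Face recognition project structure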



```
├── dataset
│   ├── alan_grant [22 entries]
│   ├── claire_dearing [53 entries]
│   ├── ellie_sattler [31 entries]
│   ├── ian_malcolm [41 entries]
│   ├── john_hammond [36 entries]
│   └── owen_grady [35 entries]
├── examples
│   ├── example_01.png
│   ├── example_02.png
│   └── example_03.png
├── output
│   └── lunch_scene_output.avi
├── videos
│   └── lunch_scene.mp4
├── search_bing_api.py
├── encode_faces.py
├── recognize_faces_image.py
├── recognize_faces_video.py
├── recognize_faces_video_file.py
└── encodings.pickle
```


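Our project has 4 top-level directories:

* dataset/: face images of the six characters, organized into subdirectories by name.

* examples/: three face images that are not in the dataset, used for testing.

* output/: where the processed face recognition videos are stored.

* videos/: where input videos are placed.

The root directory also contains 6 files:

* search_bing_api.py: the first step is building the dataset (the original author has already written this script, so you can just run it). To learn how to build a dataset with the Bing API, see [this post](https://pyimagesearch.com/2018/04/09/how-to-quickly-build-a-deep-learning-image-dataset/).

* encode_faces.py: encodes faces into feature vectors.

* recognize_faces_image.py: recognizes faces in a still image (based on the face feature vectors from your dataset).

* recognize_faces_video.py: recognizes faces in a live webcam video stream and writes the result to a video file.

* recognize_faces_video_file.py: recognizes faces in a video file on disk and writes the result to a video file. We won't discuss it today, because its skeleton is the same as the video stream script.

* encodings.pickle: the face encodings, produced by running encode_faces.py on your dataset and serialized to disk.

After building the dataset, we will use encode_faces.py to create the feature vectors.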

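## Encoding faces with OpenCV and deep learning

Before we can recognize faces, we first need to encode them. We do not actually train a recognition network here; we use the model that dlib has already trained.

Then, at classification time, we can use a simple k-NN model plus voting to classify each face. Other traditional machine learning models can be used here as well.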
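The matching-plus-voting idea can be sketched in pure Python with toy 2-d vectors standing in for the 128-d encodings. The helper names and data are hypothetical. The 0.6 threshold mirrors the default `tolerance` of `face_recognition.compare_faces`; strictly speaking, this is a threshold-then-vote scheme rather than a textbook k-NN.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(encoding, known_encodings, known_names, tolerance=0.6):
    """Return the most-voted name among the known encodings that lie
    within `tolerance` of `encoding`, or "Unknown" if none do."""
    votes = Counter(
        name for known, name in zip(known_encodings, known_names)
        if euclidean(known, encoding) <= tolerance)
    if not votes:
        return "Unknown"
    return votes.most_common(1)[0][0]


# toy "database" of encodings and their labels (hypothetical values)
known_encodings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.5], [3.0, 3.0]]
known_names = ["alan_grant", "alan_grant", "ian_malcolm", "owen_grady"]

# two alan_grant votes beat one ian_malcolm vote
near_alan = classify([0.05, 0.0], known_encodings, known_names)

# nothing within tolerance, so the face stays unlabeled
far_away = classify([10.0, 10.0], known_encodings, known_names)
```

Voting makes the result more robust than trusting a single nearest match, which is why having several images per person in the dataset helps.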

### Building the face encodings with encode_faces.py


```python
# import the necessary packages
from imutils import paths
import face_recognition
import argparse
import pickle
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--dataset", required=True,
    help="path to input directory of faces + images")
ap.add_argument("-e", "--encodings", required=True,
    help="path to serialized db of facial encodings")
ap.add_argument("-d", "--detection-method", type=str, default="cnn",
    help="face detection model to use: either `hog` or `cnn`")
args = vars(ap.parse_args())

# grab the paths to the input images in our dataset
print("[INFO] quantifying faces...")
imagePaths = list(paths.list_images(args["dataset"]))

# initialize the list of known encodings and known names
knownEncodings = []
knownNames = []

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
    # extract the person name from the image path
    # (the name of the image's parent directory)
    print("[INFO] processing image {}/{}".format(i + 1,
        len(imagePaths)))
    name = imagePath.split(os.path.sep)[-2]

    # load the input image and convert it from BGR (OpenCV ordering)
    # to dlib ordering (RGB)
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # detect the (x, y)-coordinates of the bounding boxes
    # corresponding to each face in the input image
    boxes = face_recognition.face_locations(rgb,
        model=args["detection_method"])

    # compute the facial embedding for the face
    encodings = face_recognition.face_encodings(rgb, boxes)

    # loop over the encodings
    for encoding in encodings:
        # add each encoding + name to our set of known names and
        # encodings
        knownEncodings.append(encoding)
        knownNames.append(name)

# dump the facial encodings + names to disk
print("[INFO] serializing encodings...")
data = {"encodings": knownEncodings, "names": knownNames}
f = open(args["encodings"], "wb")
f.write(pickle.dumps(data))
f.close()
```
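After the script finishes, it can be worth checking how many encodings were stored per person, since an image with no detected face contributes nothing. Below is a minimal sketch, assuming the dictionary layout with `encodings` and `names` keys used by encode_faces.py; `summarize_encodings` is a hypothetical helper, and the demo uses fake data in place of real 128-d vectors.

```python
import pickle
from collections import Counter


def summarize_encodings(data):
    """Count the stored encodings per name in an encodings dictionary."""
    if len(data["encodings"]) != len(data["names"]):
        raise ValueError("encodings and names differ in length")
    return Counter(data["names"])


# fake data standing in for the contents of encodings.pickle
fake = {
    "encodings": [[0.0] * 128, [0.1] * 128, [0.2] * 128],
    "names": ["alan_grant", "alan_grant", "ellie_sattler"],
}

# round-trip through pickle, as the scripts do via the file on disk
restored = pickle.loads(pickle.dumps(fake))
summary = summarize_encodings(restored)
```

With a real file, you would load `data` from the path passed as `--encodings` and pass it to the same function.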




### Recognizing characters in a still image



```python
# import the necessary packages
import face_recognition
import argparse
import pickle
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--encodings", required=True,
    help="path to serialized db of facial encodings")
ap.add_argument("-i", "--image", required=True,
    help="path to input image")
ap.add_argument("-d", "--detection-method", type=str, default="cnn",
    help="face detection model to use: either `hog` or `cnn`")
args = vars(ap.parse_args())

# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open(args["encodings"], "rb").read())

# load the input image and convert it from BGR to RGB
image = cv2.imread(args["image"])
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# detect the (x, y)-coordinates of the bounding boxes corresponding
# to each face in the input image, then compute the facial embeddings
# for each face
print("[INFO] recognizing faces...")
boxes = face_recognition.face_locations(rgb,
    model=args["detection_method"])
encodings = face_recognition.face_encodings(rgb, boxes)

# initialize the list of names for each face detected
names = []

# loop over the facial embeddings
for encoding in encodings:
    # attempt to match each face in the input image to our known
    # encodings
    matches = face_recognition.compare_faces(data["encodings"],
        encoding)
    name = "Unknown"

    # check to see if we have found a match
    if True in matches:
        # find the indexes of all matched faces then initialize a
        # dictionary to count the total number of times each face
        # was matched
        matchedIdxs = [i for (i, b) in enumerate(matches) if b]
        counts = {}

        # loop over the matched indexes and maintain a count for
        # each recognized face
        for i in matchedIdxs:
            name = data["names"][i]
            counts[name] = counts.get(name, 0) + 1

        # determine the recognized face with the largest number of
        # votes (note: in the event of an unlikely tie Python will
        # select first entry in the dictionary)
        name = max(counts, key=counts.get)

    # update the list of names
    names.append(name)

# loop over the recognized faces
for ((top, right, bottom, left), name) in zip(boxes, names):
    # draw the predicted face name on the image
    cv2.rectangle(image, (left, top), (right, bottom), (0, 255, 0), 2)
    y = top - 15 if top - 15 > 15 else top + 15
    cv2.putText(image, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX,
        0.75, (0, 255, 0), 2)

# show the output image and wait for a key press
cv2.imshow("Image", image)
cv2.waitKey(0)
```



### Recognizing characters from a webcam



```python
# import the necessary packages
from imutils.video import VideoStream
import face_recognition
import argparse
import imutils
import pickle
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--encodings", required=True,
    help="path to serialized db of facial encodings")
ap.add_argument("-o", "--output", type=str,
    help="path to output video")
ap.add_argument("-y", "--display", type=int, default=1,
    help="whether or not to display output frame to screen")
ap.add_argument("-d", "--detection-method", type=str, default="cnn",
    help="face detection model to use: either `hog` or `cnn`")
args = vars(ap.parse_args())

# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open(args["encodings"], "rb").read())

# initialize the video stream and pointer to output video file, then
# allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
writer = None
time.sleep(2.0)

# loop over frames from the video file stream
while True:
    # grab the frame from the threaded video stream
    frame = vs.read()

    # convert the input frame from BGR to RGB then resize it to have
    # a width of 750px (to speedup processing); resize the RGB copy,
    # not the original BGR frame
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    rgb = imutils.resize(rgb, width=750)
    r = frame.shape[1] / float(rgb.shape[1])

    # detect the (x, y)-coordinates of the bounding boxes
    # corresponding to each face in the input frame, then compute
    # the facial embeddings for each face
    boxes = face_recognition.face_locations(rgb,
        model=args["detection_method"])
    encodings = face_recognition.face_encodings(rgb, boxes)
    names = []

    # loop over the facial embeddings
    for encoding in encodings:
        # attempt to match each face in the input image to our known
        # encodings
        matches = face_recognition.compare_faces(data["encodings"],
            encoding)
        name = "Unknown"

        # check to see if we have found a match
        if True in matches:
            # find the indexes of all matched faces then initialize a
            # dictionary to count the total number of times each face
            # was matched
            matchedIdxs = [i for (i, b) in enumerate(matches) if b]
            counts = {}

            # loop over the matched indexes and maintain a count for
            # each recognized face
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name, 0) + 1

            # determine the recognized face with the largest number
            # of votes (note: in the event of an unlikely tie Python
            # will select first entry in the dictionary)
            name = max(counts, key=counts.get)

        # update the list of names
        names.append(name)

    # loop over the recognized faces
    for ((top, right, bottom, left), name) in zip(boxes, names):
        # rescale the face coordinates
        top = int(top * r)
        right = int(right * r)
        bottom = int(bottom * r)
        left = int(left * r)

        # draw the predicted face name on the image
        cv2.rectangle(frame, (left, top), (right, bottom),
            (0, 255, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(frame, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX,
            0.75, (0, 255, 0), 2)

    # if the video writer is None *AND* we are supposed to write
    # the output video to disk initialize the writer
    if writer is None and args["output"] is not None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 20,
            (frame.shape[1], frame.shape[0]), True)

    # if the writer is not None, write the frame with recognized
    # faces to disk
    if writer is not None:
        writer.write(frame)

    # check to see if we are supposed to display the output frame to
    # the screen
    if args["display"] > 0:
        cv2.imshow("Frame", frame)
        key = cv2.waitKey(1) & 0xFF

        # if the `q` key was pressed, break from the loop
        if key == ord("q"):
            break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

# check to see if the video writer point needs to be released
if writer is not None:
    writer.release()
```
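The coordinate rescaling step in the frame loop can be isolated into a small helper, which makes the role of the ratio `r` easier to see. This is a sketch; `rescale_box` is a hypothetical name, and boxes use the `(top, right, bottom, left)` order returned by `face_recognition.face_locations`.

```python
def rescale_box(box, r):
    """Scale a (top, right, bottom, left) box by ratio r, truncating
    each coordinate to an int so it can be passed to OpenCV."""
    top, right, bottom, left = box
    return (int(top * r), int(right * r), int(bottom * r), int(left * r))


# a 1500px-wide frame resized to 750px for detection gives r = 2.0
r = 1500 / float(750)

# a box found on the small image, mapped back to the full-size frame
scaled = rescale_box((40, 300, 200, 140), r)
```

Detection runs on the smaller image for speed, while drawing happens on the original frame, so every box has to be mapped back by the same ratio.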



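### Recognizing characters in a video file

As mentioned earlier, recognize_faces_video_file.py is essentially identical to the previous script. The only difference is that the video comes from a file instead of a webcam.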

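## Can these scripts run on a Raspberry Pi?

Yes, but with some limitations:

1. The Raspberry Pi does not have enough memory for CNN-based face detection.

2. So we are limited to HOG-based face detection.

3. HOG is too slow on the Raspberry Pi for real-time face detection.

4. So we need to use OpenCV Haar cascades instead.

(Translator's note: my computer, with 16 GB of memory, cannot run CNN-based face detection either.)

Throughput on the Raspberry Pi is about 1-2 FPS. The good news is that I will come back later to discuss how to run these scripts on the Raspberry Pi, so stay tuned.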
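If a Haar cascade is swapped in for detection, its boxes would need reordering before encoding: OpenCV's `detectMultiScale` yields `(x, y, w, h)` rectangles, while `face_recognition.face_encodings` expects `(top, right, bottom, left)`. Here is a sketch of that conversion; the helper name is hypothetical, and this is not code from the original tutorial.

```python
def haar_to_css(rects):
    """Convert OpenCV-style (x, y, w, h) rectangles into
    (top, right, bottom, left) boxes."""
    return [(y, x + w, y + h, x) for (x, y, w, h) in rects]


# one rectangle with top-left corner (10, 20), width 100, height 120
boxes = haar_to_css([(10, 20, 100, 120)])
```

The converted `boxes` list could then be passed as the second argument to `face_recognition.face_encodings`, in place of the output of `face_locations`.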

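## Conclusion

In this tutorial, you learned how to perform face recognition with OpenCV, Python, and deep learning.

We used Davis King's dlib and Adam Geitgey's face_recognition to make the implementation easier.

We also saw that the code presented here is both accurate and, with a GPU, capable of running in real time.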


