Head Motion Invariance in Three-Dimensional Video Analysis System of Facial Movement
- 1. Pattern Recognition and Image Processing Group, Austria
- 2. Department of Surgery, Medical University of Vienna, Austria
Abstract
In the analysis of facial movement, an objective, quantitative evaluation of the expression is desirable. This can be achieved with the help of 3D systems such as the mirror setup proposed in this paper. These systems analyze the 3D motion of interest points of the face (e.g. the left and right corner of the mouth) and compare their trajectories to measure progress. Unfortunately, the motion of the interest points is overlaid by the global motion of the subject's head. This paper proposes an accurate approach to make the facial movement analysis invariant to head motion. Three static markers are used to model the head motion. These static markers are placed above bony structures, so that only motion arising from the head influences the position of the markers. The presented methodology can be used to introduce an object-centered coordinate system in order to distinguish between motion that arises from the head and motion that arises from the facial movements.
Keywords
Facial movement analysis; Multi-view; Head motion invariance; Three-dimensional (3D) video analysis; Facial palsy
Citation
Koneczny B, Artner NM, Gyoeri E, Pona I, Tzou CHJ, et al. (2017) Head Motion Invariance in Three-Dimensional Video Analysis System of Facial Movement. Comput Sci Eng 2(1): 1007.
ABBREVIATIONS
MUW: Medical University of Vienna; 3D: Three-Dimensional
INTRODUCTION
In the quantitative analysis of facial movement, the need arises to objectively evaluate the quality of the facial movement in terms of symmetry and functionality. The main issue with widely used facial grading systems, e.g. the House-Brackmann scale [1], is that they are based on subjective measurements and observations [2,3], which depend on the investigator's rating and provide only rough estimates of facial function. Intersubject and interobserver variability within subjective grading systems has been reported [4]. The approach presented in this paper uses mirror-reflected multi-view videos recorded with the hardware setup (3D video analysis system) proposed by Frey et al. [5]. The main advantage of this mirror setup is that it provides an inexpensive virtual multi-camera setup. Based on the data gained with this system, a 3D facial movement analysis can be performed.
The 3D video analysis system introduced by Frey et al. [5] is currently used at the Medical University of Vienna (MUW) to objectively and quantitatively evaluate and document the preoperative status and postoperative recovery course of facial palsy patients after facial reconstructive surgery [6-8]. With reconstructive microsurgery the paralyzed face can be reanimated to restore facial function and achieve near-symmetrical facial expressions. Template-based particle filters, color-based particle filters and optical strain cannot provide sufficient accuracy to evaluate the quality of facial gestures [9,10]. There are also 3D surface imaging systems that are able to record 4D data; however, tracking and analysis of the facial movement are not possible with these systems [11]. Many other applications that use techniques of facial movement analysis track facial features and estimate their 3D positions in order to create facial animations [12,13]. Creating facial animations requires less accuracy than the medical evaluation of facial palsy patients.
The approach presented in this paper is part of a new system designed to increase the time efficiency and accuracy of the 3D video analysis system. The aim of the proposed method is to separate the 3D head motion from the facial movement. Knowing the head motion, assumptions can be made about the physical strain needed to reach the climax of a facial expression.
This paper is organized as follows: in section “Materials and Methods” the hardware setup and methodology are described in more detail. The preliminary results are presented and discussed in section “Results and Discussion”. In section “Conclusion” a conclusion and an outline of future work are presented.
MATERIALS AND METHODS
In this section the hardware setup and the methodology applied to calculate the 3D head position are described in more detail. In the proposed system, static markers are positioned above bony structures, so that the position of the static markers reflects the global head motion and can be distinguished from the facial muscle motion. The hardware setup used is described in section “Hardware setup”. The positions of the static markers in the video sequences are detected, and then their 3D positions are estimated by applying the method described in section “3D position estimation”. The experimental setup is described in more detail in section “Experimental setup”.
Hardware setup
The hardware setup consists of a mirror setup, a calibration grid and a commercial video camera. The multi-view video sequences are recorded using a mirror setup described by Frey et al. [5] (Figure 1).
Figure 1: The mirror setup is composed of two mirrors at an arbitrary angle α. The subject is positioned inside this setup. The points A, B and C are the static marker points in the central view. The points A_L, B_L and C_L are the virtual marker points in the left mirror, A_R, B_R and C_R the virtual marker points in the right mirror. The camera recording the scene is positioned on a tripod.
The angle between the two mirror planes is flexible. Depending on the angle, the camera records three views (frontal and two mirror-reflected views) or more (frontal, mirror-reflected and multiple mirror-reflected views) simultaneously. The video sequences are recorded at 30 frames per second with a Nikon Coolpix camera at HD resolution (1920×1080).
The camera calibration method proposed by Zhang [14] is utilized to calculate the intrinsic parameters of the camera. The intrinsic camera parameters are essential for the calibration of the video sequence and for further calculations.
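As an illustration, a minimal sketch of this calibration step using OpenCV could look as follows; the checkerboard dimensions, square size and file names are assumptions for illustration, not taken from the paper:

```python
import cv2
import numpy as np

# Minimal sketch of Zhang-style intrinsic calibration with OpenCV.
pattern = (9, 6)        # inner corners of the checkerboard (assumed)
square_size = 25.0      # square edge length in mm (assumed)

# 3D corner coordinates in the board's own plane (z = 0)
obj = np.zeros((pattern[0] * pattern[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_size

obj_points, img_points, size = [], [], None
for fname in ["calib_01.png", "calib_02.png", "calib_03.png"]:  # assumed names
    img = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(img, pattern)
    if found:
        obj_points.append(obj)
        img_points.append(corners)
        size = img.shape[::-1]  # (width, height)

# K contains the intrinsic parameters (focal lengths, principal point),
# dist the lens distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, size, None, None)
print("reprojection RMS:", rms)
print("intrinsic matrix K:\n", K)
```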
3D position estimation
Lin et al. [15] proposed a robust and inexpensive method to estimate the 3D positions of marker points using multi-view video sequences. The 3D positions are used to create a realistic 3D facial animation. The method proposed by Lin et al. [15] takes the mirror properties into account, which decreases the computational effort. Furthermore, the angle between the mirrors can be chosen arbitrarily.
The 3D positions of the marker points are estimated according to Lin et al. [15]. The location and orientation of each of the two mirror planes can be described by a plane equation

$$a x + b y + c z + d = 0 \quad (1)$$

The unit normal of the plane is $\hat{n} = (a, b, c)^T$ with $a^2 + b^2 + c^2 = 1$. According to Lin et al. [15], the 3D position $P_{m_i}$ of an arbitrary point and the corresponding virtual 3D position $P'_{m_i}$ can be represented as follows:

$$P_{m_i} = \frac{z_{m_i}}{f}\,(u_{m_i},\ v_{m_i},\ f)^T \quad (2)$$

$$P'_{m_i} = \frac{z'_{m_i}}{f}\,(u'_{m_i},\ v'_{m_i},\ f)^T \quad (3)$$

with $(u_{m_i}, v_{m_i})$ and $(u'_{m_i}, v'_{m_i})$ representing the image projections of the real point and of its mirror reflection, and $f$ the focal length. The unknown depths $z_{m_i}$ and $z'_{m_i}$ can be estimated by solving the system of linear equations given in (4), which follows from inserting (2) and (3) into the mirror reflection $P'_{m_i} = P_{m_i} - 2\,(\hat{n}^T P_{m_i} + d)\,\hat{n}$:

$$\frac{z'_{m_i}}{f}\,(u'_{m_i},\ v'_{m_i},\ f)^T - \frac{z_{m_i}}{f}\left[(u_{m_i},\ v_{m_i},\ f)^T - 2\,\big(\hat{n}^T (u_{m_i},\ v_{m_i},\ f)^T\big)\,\hat{n}\right] = -2\,d\,\hat{n} \quad (4)$$
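As an illustration of how (4) can be solved in practice, the following minimal sketch sets up the three component equations for the two unknown depths and solves them by least squares. The notation follows this section; the concrete implementation of Lin et al. [15] may differ, and the example values are assumptions:

```python
import numpy as np

def estimate_3d(p, p_virt, n, d, f):
    """Solve system (4) for the depths z and z' of one marker.
    p = (u, v) and p_virt = (u', v') are the image projections of the
    real and the mirrored point, n is the unit mirror normal, d the
    plane offset (n^T X + d = 0) and f the focal length."""
    m = np.array([p[0], p[1], f]) / f                  # viewing ray, real point
    m_virt = np.array([p_virt[0], p_virt[1], f]) / f   # viewing ray, mirrored point

    # (4): z' * m' - z * (m - 2 (n.m) n) = -2 d n
    # -> three component equations in the two unknowns (z, z').
    A = np.column_stack([-(m - 2.0 * np.dot(n, m) * n), m_virt])
    b = -2.0 * d * n
    (z, z_virt), *_ = np.linalg.lstsq(A, b, rcond=None)
    return z * m, z_virt * m_virt  # real and virtual 3D positions

# Example with an assumed mirror plane and focal length (in pixels):
n = np.array([np.sin(np.radians(35.0)), 0.0, -np.cos(np.radians(35.0))])
P_real, P_virt = estimate_3d((120.0, 40.0), (-310.0, 38.0), n, d=450.0, f=1500.0)
print(P_real, P_virt)
```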
The location and orientation of the mirror planes are estimated using a calibration grid (Figure 2).
Figure 2: The corners of the calibration grid marked with white stars are used as the marker points p_i and p'_i for the estimation of the location and orientation of the mirror planes.
The corners of the checkerboard in the frontal view (marked with white stars in Figure 2) are used as marker points p_i; the corresponding corners in the reflected view are used as marker points p'_i. For the estimation of the location and orientation of the mirrors, only one frame is necessary, as the location and orientation of the mirror planes do not change throughout the video. Knowing the location and orientation of the mirror planes and the positions p_i and p'_i of an arbitrary point in the frontal and reflected view, the 3D position of this point can be estimated.
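The paper does not spell out how the plane parameters are fitted to the corner correspondences. One possible sketch, given here purely as an assumption, treats the leftover residual of system (4) as the misfit of a candidate plane and minimizes it over the plane parameters:

```python
import numpy as np
from scipy.optimize import least_squares

def plane_residuals(params, pairs, f):
    """Residuals of the mirror constraint (4) for a candidate plane.
    params = (azimuth, elevation, d) parametrize the unit normal and the
    plane offset; pairs holds the ((u, v), (u', v')) grid-corner
    correspondences. Parametrization and fitting are assumptions here."""
    az, el, d = params
    n = np.array([np.cos(el) * np.sin(az), np.sin(el), -np.cos(el) * np.cos(az)])
    res = []
    for (u, v), (u2, v2) in pairs:
        m = np.array([u, v, f]) / f
        m2 = np.array([u2, v2, f]) / f
        # Overdetermined system (4): 3 equations, 2 unknown depths.
        A = np.column_stack([-(m - 2.0 * np.dot(n, m) * n), m2])
        b = -2.0 * d * n
        z, *_ = np.linalg.lstsq(A, b, rcond=None)
        res.extend(A @ z - b)  # leftover residual measures the plane misfit
    return np.array(res)

# pairs would hold the white-star corner correspondences of Figure 2;
# the initial guess (45 deg azimuth, level mirror, 500 units away) is assumed:
# fit = least_squares(plane_residuals, x0=[np.radians(45.0), 0.0, 500.0],
#                     args=(pairs, 1500.0))
# fit.x then contains (azimuth, elevation, d) of the best-fitting plane.
```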
Experimental setup
The videos are recorded using a Nikon Coolpix camera. For the camera calibration a planar checkerboard is used. The camera calibration is performed before the estimation of the 3D positions takes place. The same camera settings are used for recording the checkerboard for camera calibration, the calibration grid and the video sequence with the simulated head motion. In order to provide a sufficient ground truth, the head motion was simulated using an artificial head model. Three static markers were attached to the artificial head model (Figure 3).
Figure 3: The head motion was simulated using an artificial head model. The static markers A, B and C were attached to the head model.
The head was rotated in 5° steps around the z-axis (yaw) in counter-clockwise direction using a rotary plate (Figure 4A).
Figure 4: (A) The experimental setup consisted of an artificial head model on a rotary plate, 3 static markers (A, B and C) and the mirror setup. P represents an observation point or facial landmark which is influenced by the rotation. The artificial head model was manually rotated in 5° steps in counter-clockwise direction. (B) The static markers are used to calculate the 3D head motion. The 3D positions of the markers are represented by A_0, B_0 and C_0. The 3D position of the observation point is represented by P_0.
RESULTS AND DISCUSSION
By introducing static markers, the 3D head motion throughout the video sequence can be modeled. Figure 4B shows the 3D head rotation of the artificial head model. As mentioned before, an artificial head model was chosen in order to provide a sufficient ground truth. The video sequence was composed of 43 frames. Table 1 shows the simulated rotation angle versus the estimated yaw rotation, both in degrees: it presents the mean yaw of each rotation step together with the relative and percentage error, while Figure 4B displays all 43 measurements.
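The error columns in Table 1 appear to be computed as relative error = (yaw − angle) / angle and percentage error = 100 × relative error; the following minimal check of this reading against the first data row is an interpretation, not stated explicitly in the paper:

```python
angle, yaw = -30.000, -29.471       # first row of Table 1
relative = (yaw - angle) / angle    # -> -0.0176..., tabulated as -0.018
percentage = 100.0 * relative       # -> -1.763; the table lists -1.764,
                                    #    presumably from a less rounded yaw
print(round(relative, 3), round(percentage, 3))
```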
Table 1: This table shows the simulated rotation angle in degrees versus the mean estimated yaw rotation (of each rotation step) in degrees. The rotary plate was rotated in counter-clockwise direction in 5° steps. The 3D positions of the static markers were used to calculate the yaw rotation. The relative error is dimensionless.
| angle [°] | yaw [°] | relative error | percentage error [%] |
|---:|---:|---:|---:|
| -30.000 | -29.471 | -0.018 | -1.764 |
| -25.000 | -24.673 | -0.013 | -1.308 |
| -20.000 | -19.892 | -0.005 | -0.539 |
| -15.000 | -15.593 | -0.040 | -3.955 |
| -10.000 | -10.377 | -0.038 | -3.772 |
| -5.000 | -5.243 | -0.049 | -4.868 |
| 0.000 | 0.000 | 0.000 | 0.000 |
| 5.000 | 5.131 | 0.026 | 2.616 |
| 10.000 | 10.440 | 0.044 | 4.403 |
| 15.000 | 15.121 | 0.008 | 0.804 |
| 20.000 | 19.309 | 0.035 | 3.454 |
| 25.000 | 24.433 | 0.023 | 2.267 |
| 30.000 | 29.178 | 0.027 | 2.741 |
As can be seen in Table 1, the calculated yaw rotation increases in steps of approximately 5° in counter-clockwise direction. The slight differences between the corresponding positive and negative angles arise from a systematic error, which is caused by the manual handling of the rotary plate. In Figure 4B it can be seen that the estimated 3D positions of A and B describe a circular path, which arises from the rotational motion, whereas the 3D positions of C are located at one point. Static marker C is therefore located above the center of rotation and, since no translation occurred, it is not affected by the rotational motion. The rotation was performed between -30° and 30°, since at larger rotation angles only a small portion of static marker A or B (depending on the negative or positive rotation angle) was visible in the frontal view.
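The paper does not detail how the yaw angle is obtained from the three marker positions. A common choice, sketched below purely as an assumption, is to fit the rigid rotation between the marker triangle in a reference frame and in the current frame (Kabsch algorithm) and read the yaw off the resulting rotation matrix:

```python
import numpy as np

def head_rotation(ref, cur):
    """Kabsch fit of the rotation mapping the reference marker positions
    onto the current ones. ref and cur are 3x3 arrays whose rows are the
    3D positions of the static markers A, B and C."""
    ref_c = ref - ref.mean(axis=0)  # remove the translational part
    cur_c = cur - cur.mean(axis=0)
    U, _, Vt = np.linalg.svd(ref_c.T @ cur_c)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # no reflections
    return (U @ S @ Vt).T  # R with cur_c_i ~ R @ ref_c_i

def yaw_deg(R):
    """Rotation angle about the z-axis in degrees."""
    return np.degrees(np.arctan2(R[1, 0], R[0, 0]))

# Synthetic check: rotating a (made-up) marker triangle by 5 deg about z
# should be recovered as a 5 deg yaw.
ref = np.array([[0.0, 8.0, 3.0], [-5.0, -4.0, 1.0], [5.0, -4.0, 2.0]])
a = np.radians(5.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
print(yaw_deg(head_rotation(ref, (Rz @ ref.T).T)))  # -> ~5.0
```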
Figure 4 also provides an example of an arbitrary observation point or facial landmark, represented by P. In this example the observation point is only influenced by the global head motion. By introducing an object-centered coordinate system, the position of the observation point can be given in relation to the static markers. The position of the observation point is therefore independent of the global head motion.
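The paper does not fix a particular construction for the object-centered coordinate system. The following sketch shows one possible (assumed) choice, building an orthonormal frame from the three static markers and expressing a tracked point in it; the marker and landmark positions are made up for illustration:

```python
import numpy as np

def marker_frame(A, B, C):
    """Object-centered frame from the three static markers: origin at A,
    x-axis towards B, z-axis normal to the marker plane. This particular
    construction is an assumed choice; any repeatable one works."""
    x = (B - A) / np.linalg.norm(B - A)
    z = np.cross(B - A, C - A)
    z /= np.linalg.norm(z)
    y = np.cross(z, x)
    return np.vstack([x, y, z]), A  # rows of R are the frame axes

def to_object_coords(P, R, origin):
    """Express a tracked point P in the marker-defined frame; the result
    is invariant to any rigid motion applied to head and point together."""
    return R @ (P - origin)

# Made-up marker and landmark positions for illustration:
A = np.array([0.0, 8.0, 2.0])
B = np.array([-5.0, -4.0, 1.0])
C = np.array([5.0, -4.0, 3.0])
P = np.array([1.0, 2.0, 5.0])
R, o = marker_frame(A, B, C)
print(to_object_coords(P, R, o))
```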
CONCLUSION
In this paper an approach is proposed that allows estimating the global head motion throughout the whole video sequence.
The advantage of this system is that the angle between the mirror planes can be chosen arbitrarily. Therefore, in comparison to the system currently used at the MUW, no time-consuming mechanical calibration of the mirror setup is needed; however, the location and orientation of the mirror planes should be estimated for every video recording in order to provide reliable results. With the static markers an object-centered coordinate system can be introduced. This object-centered coordinate system can be used to separate the motion that arises from the head from the motion that arises from the facial muscles. This would, for example, allow a more detailed analysis of the progress after reconstructive surgery, since the physical strain needed to reach the climax of a facial gesture and the quality of performing it can be evaluated separately. With this system, arbitrary observation points or facial landmarks can be tracked (manually or by using an automatic or semi-automatic tracking algorithm) without the influence of the global head motion. Facial movement can therefore be analyzed without the need to keep the head motion at a minimum, which facilitates more natural and intuitive facial expressions. It is also possible to define a custom region of interest that can be tracked through the video sequence. The results gained by this method, combined with the tracking of facial landmarks while performing facial expressions, can be used as input parameters for facial grading systems (e.g. the House-Brackmann scale or the Sunnybrook scale). In this way the quality of facial movements can be evaluated objectively, in contrast to the subjective measurements of the House-Brackmann scale.
REFERENCES
- House JW, Brackmann DE. Facial nerve grading system. Otolaryngol Head Neck Surg. 1985; 93: 146-147.
- Coulson SE, Croxson GR, Adams RD, O’Dwyer NJ. Reliability of the “Sydney,” “Sunnybrook,” and “House Brackmann” facial grading systems to assess voluntary movement and synkinesis after facial nerve paralysis. Otolaryngol Head Neck Surg. 2005; 132: 543-549.
- Zhai M-Y, Feng G-D, Gao Z-Q. Facial Grading System: Physical and Psychological Impairments to Be Considered. J Otol. 2008; 3: 61-67.
- Linstrom CJ. Objective Facial Motion Analysis in Patients With Facial Nerve Dysfunction. Laryngoscope. 2002; 112: 1129-1147.
- Frey M, Jenny A, Giovanoli P, Stüssi E. Development of a New Documentation System for Facial Movements as a Basis for the International Registry for Neuromuscular Reconstruction in the Face. Plast Reconstr Surg. 1994; 93: 1334-1349.
- Frey M, Giovanoli P, Gerber H, Slameczka M, Stüssi E. Three-Dimensional Video Analysis of Facial Movements: A New Method to Assess the Quantity and Quality of the Smile. Plast Reconstr Surg. 1999; 104: 2032-2039.
- Tzou C-HJ, Pona I, Placheta E, Hold A, Michaelidou M, Artner N, et al. Evolution of the 3-Dimensional Video System for Facial Motion Analysis: Ten Years’ Experiences and Recent Developments. Ann Plast Surg. 2012; 69: 173-185.
- Giovanoli P, Tzou C-HJ, Ploner M, Frey M. Three-dimensional video-analysis of facial movements in healthy volunteers. Br J Plast Surg. 2003; 56: 644-652.
- Limbeck P, Kropatsch WG, Haxhimusa Y. Semi-automatic tracking of markers in facial palsy. 2012; 69-72.
- Shreve M, Jain N, Goldgof D, Sarkar S, Kropatsch W, Tzou C-HJ, et al. Evaluation of Facial Reconstructive Surgery on Patients with Facial Palsy using Optical Strain. In: Real P, Diaz-Pernil D, Molina-Abril H, Berciano A, Kropatsch WG, eds. The 14th International Conference on Computer Analysis of Images and Patterns (CAIP). 2011; 6854: 512-519.
- Tzou C-HJ, Artner NM, Pona I, Hold A, Placheta E, Kropatsch WG, et al. Comparison of three-dimensional surface-imaging systems. J Plast Reconstr Aesthet Surg. 2014; 67: 489-497.
- Guenter B, Grimm C, Wood D, Malvar H, Pighin F. Making Faces. In: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques. ACM. 1998: 55-66.
- Basu S, Oliver N, Pentland A. 3D modeling and tracking of human lip motions. In: Proceedings of the Sixth International Conference on Computer Vision (ICCV). 1998; 337-343.
- Zhang Z. A flexible new technique for camera calibration. IEEE Trans Pattern Anal Mach Intell. 2000; 22: 1330-1334.
- Lin I-C, Yeh J-S, Ouhyoung M. Extracting 3D facial animation parameters from multiview video clips. IEEE Comput Graph Appl. 2002; 22: 72-80.