Complex Derivatives, Wirtinger View and the Chain Rule

Two days ago in Julia Lab, Jarrett, Spencer, Alan and I discussed the best ways of expressing derivatives for automatic differentiation in complex-valued programs. Inspired by this discussion, I want to share my understanding of the subject and eventually present a chain rule for complex derivatives.

Derivative

$\mathbb{R}$ealistic view: a derivative is a real number that tells you how fast a value changes with respect to a variable.

$D = \frac{dy}{dx}$

Derivatives are very useful! Namely, if you know the derivative of $y$ with respect to $x$, you can write:

\[dy = Ddx\]

This means that one can calculate the change in $y$ caused by a small change in $x$.

Derivatives of a Function: Jacobian

$\mathbb{R}$ealistic view: derivatives of a function are a collection of real numbers that tell you how fast the outputs of the function change with respect to its inputs.

Let $f: \mathbb{R}^n \to \mathbb{R}^m$. Then one can define an $m \times n$ real matrix called the Jacobian, with one row per output and one column per input.

Example, $f(x,y): \mathbb{R}^2 \to \mathbb{R}^2$:

\[J=\begin{bmatrix}\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{bmatrix}\]

Jacobians are very useful! If you know the Jacobian of a function, then you can calculate the change in the function given a small change in any of its inputs.

\[\begin{bmatrix}df_1 \\ df_2 \end{bmatrix} = J\begin{bmatrix}dx \\ dy \end{bmatrix}\]
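As a rough numerical illustration of the relation above, the following Python sketch approximates the Jacobian of a made-up function by central differences and compares $J\,[dx, dy]^T$ with the actual change in the output. The function `f`, the helper `jacobian`, the step size `h`, and the evaluation point are all assumptions chosen for this example.

```python
def f(x, y):
    # Hypothetical example map R^2 -> R^2, chosen only for illustration.
    return (x * x * y, x + y * y)


def jacobian(func, x, y, h=1e-6):
    """Approximate the 2x2 Jacobian of func at (x, y) by central differences."""
    fxp, fxm = func(x + h, y), func(x - h, y)
    fyp, fym = func(x, y + h), func(x, y - h)
    return [
        [(fxp[0] - fxm[0]) / (2 * h), (fyp[0] - fym[0]) / (2 * h)],
        [(fxp[1] - fxm[1]) / (2 * h), (fyp[1] - fym[1]) / (2 * h)],
    ]


x0, y0 = 1.0, 2.0
J = jacobian(f, x0, y0)

# Predict the output change for a small input step using df = J [dx, dy]^T.
dx, dy = 1e-4, -2e-4
predicted = (J[0][0] * dx + J[0][1] * dy, J[1][0] * dx + J[1][1] * dy)

# Compare with the change obtained by actually re-evaluating f.
f0 = f(x0, y0)
f1 = f(x0 + dx, y0 + dy)
actual = (f1[0] - f0[0], f1[1] - f0[1])

print("J         =", J)
print("predicted =", predicted)
print("actual    =", actual)
```

For this particular `f`, the analytic Jacobian is $\begin{bmatrix}2xy & x^2 \\ 1 & 2y\end{bmatrix}$, so the printed `J` should be close to it, and `predicted` should agree with `actual` up to terms of second order in the step.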

Derivatives of a Complex Function: Jacobian

A complex number $x+iy$ has two parts: real and imaginary. So, for a complex function, we can treat the real and imaginary parts as separate variables in both the input and the output.

$\mathbb{R}$ealistic point of view: $f(z): \mathbb{C} \to \mathbb{C}$ can be expressed as $f(z_{Re},z_{Im}): \mathbb{R}^2 \to \mathbb{R}^2$

Therefore,

\[J=\begin{bmatrix}\frac{\partial f_{Re}}{\partial z_{Re}} & \frac{\partial f_{Re}}{\partial z_{Im}} \\ \frac{\partial f_{Im}}{\partial z_{Re}} & \frac{\partial f_{Im}}{\partial z_{Im}} \end{bmatrix}\]

So again: Every entry in the Jacobian matrix gives the change in the function when the corresponding input changes by a small amount.

\[\begin{bmatrix}df_{Re} \\ df_{Im} \end{bmatrix} = J\begin{bmatrix}dz_{Re} \\ dz_{Im} \end{bmatrix}\]
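The same idea can be sketched in Python for a complex function by perturbing the real and imaginary parts of the input separately. The example function $z^2$, the helper name `real_jacobian`, and the step size are assumptions made for illustration.

```python
def f(z):
    # Example complex function; z**2 is an illustrative choice.
    return z * z


def real_jacobian(func, z, h=1e-6):
    """2x2 real Jacobian of func at z, treating z as the pair (z_re, z_im)."""
    # Complex-valued partial derivatives with respect to the two real inputs.
    d_re = (func(z + h) - func(z - h)) / (2 * h)
    d_im = (func(z + 1j * h) - func(z - 1j * h)) / (2 * h)
    # Rows: real and imaginary part of the output. Columns: z_re and z_im.
    return [
        [d_re.real, d_im.real],
        [d_re.imag, d_im.imag],
    ]


z0 = 1.0 + 2.0j
J = real_jacobian(f, z0)
print(J)
```

Since $z^2 = (z_{Re}^2 - z_{Im}^2) + 2i\,z_{Re}z_{Im}$, the analytic Jacobian at $z_0 = 1 + 2i$ is $\begin{bmatrix}2 & -4 \\ 4 & 2\end{bmatrix}$, and the printed matrix should be close to it.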

A Native View for Complex Functions: Wirtinger

Although the $\mathbb{R}$ealistic view is easy to grasp, a native view for complex functions could be easier to express for AD.

In the general case of $f: \mathbb{C}^n \to \mathbb{C}^m$, the previous view gives a $2m \times 2n$ real Jacobian. The same information can be expressed as an $m \times 2n$ matrix over $\mathbb{C}$. This is where the Wirtinger view comes into play.

Instead of using the $\frac{\partial f}{\partial z_{Re}}$ and $\frac{\partial f}{\partial z_{Im}}$ derivatives, we will use the derivatives below:

\[\frac{\partial f}{\partial z} = \frac{1}{2} \left(\frac{\partial f}{\partial z_{Re}} - i \frac{\partial f}{\partial z_{Im}} \right)\] \[\frac{\partial f}{\partial \bar z} = \frac{1}{2} \left(\frac{\partial f}{\partial z_{Re}} + i\frac{\partial f}{\partial z_{Im}} \right)\]
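These two definitions translate almost directly into code. The Python sketch below estimates both Wirtinger derivatives from central differences along the real and imaginary directions. The helper name `wirtinger`, the test function, and the step size `h` are assumptions for illustration only.

```python
def wirtinger(func, z, h=1e-6):
    """Return (df/dz, df/dzbar) at z, built from the real partial derivatives."""
    d_re = (func(z + h) - func(z - h)) / (2 * h)            # df/dz_re
    d_im = (func(z + 1j * h) - func(z - 1j * h)) / (2 * h)  # df/dz_im
    return 0.5 * (d_re - 1j * d_im), 0.5 * (d_re + 1j * d_im)


def f(z):
    # Illustrative function that depends on both z and conj(z).
    return z * z + 3 * z.conjugate()


z0 = 0.5 - 1.5j
df_dz, df_dzbar = wirtinger(f, z0)
print(df_dz, df_dzbar)
```

Treating $z$ and $\bar z$ as independent variables, one would expect $\frac{\partial f}{\partial z} = 2z$ and $\frac{\partial f}{\partial \bar z} = 3$ for this `f`, and the numerical estimates should be close to those values.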

Let $f: \mathbb{C} \to \mathbb{C}$ and consider it as a function of $z$ and $\bar z$. With the above derivatives we can express $J$ as:

\[J = \begin{bmatrix}\frac{\partial f}{\partial z} & \frac{\partial f}{\partial \bar z}\end{bmatrix}\]

Let’s see whether this version of the Jacobian matrix gives the change in the function with respect to the changes in its inputs, as it normally does.

\[df=J\begin{bmatrix}dz \\ d\bar z \end{bmatrix}\]

Here $dz = dz_{Re}+i\,dz_{Im}$ and $d\bar z = dz_{Re}-i\,dz_{Im}$ are the infinitesimal changes that we made to the input, and $df=df_{Re}+i\,df_{Im}$ is the corresponding change in the output.

Once we plug $J$ into the above equation, we get the total differential in terms of the $\frac{\partial}{\partial z}$ and $\frac{\partial}{\partial \bar z}$ operators:

\[df=\frac{\partial f}{\partial z}dz+\frac{\partial f}{\partial \bar z}d \bar z\]

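If we also substitute the definitions of $\frac{\partial f}{\partial z}$ and $\frac{\partial f}{\partial \bar z}$ in terms of $\frac{\partial f}{\partial z_{Re}}$ and $\frac{\partial f}{\partial z_{Im}}$, we obtain the correct total differential with real derivative operators: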

\[df=\frac{\partial f}{\partial z_{Re}}dz_{Re}+\frac{\partial f}{\partial z_{Im}}dz_{Im}\]
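To spell out that substitution, write $f_x$ and $f_y$ as shorthand (introduced only for this step) for $\frac{\partial f}{\partial z_{Re}}$ and $\frac{\partial f}{\partial z_{Im}}$. The cross terms then cancel:

\[\begin{aligned}
\frac{\partial f}{\partial z}dz+\frac{\partial f}{\partial \bar z}d\bar z
&=\frac{1}{2}\left(f_x-i f_y\right)\left(dz_{Re}+i\,dz_{Im}\right)+\frac{1}{2}\left(f_x+i f_y\right)\left(dz_{Re}-i\,dz_{Im}\right) \\
&=\frac{1}{2}\left(f_x\,dz_{Re}+i f_x\,dz_{Im}-i f_y\,dz_{Re}+f_y\,dz_{Im}\right)+\frac{1}{2}\left(f_x\,dz_{Re}-i f_x\,dz_{Im}+i f_y\,dz_{Re}+f_y\,dz_{Im}\right) \\
&=f_x\,dz_{Re}+f_y\,dz_{Im}
\end{aligned}\]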

So, we showed that the Jacobian equation that we wrote for Wirtinger derivatives is indeed correct!

Summary: If we think of the complex function $f(z_1,z_2,...)$ as $f(z_1,\bar z_1,z_2,\bar z_2,...)$, the equations for Wirtinger derivatives are exactly the same as the ones we know from the calculus of real functions.

Chain Rule for Wirtinger Derivatives

Given $f: \mathbb{C} \to \mathbb{C}$ and $g: \mathbb{C} \to \mathbb{C}$, we would like to obtain identities for $\frac{\partial (f \circ g)}{\partial z}$ and $\frac{\partial (f \circ g)}{\partial \bar z}$.

Let’s write the total differential for $g(z)$:

\[dg=\frac{\partial g}{\partial z}dz+\frac{\partial g}{\partial \bar z}d \bar z\]

Then the total differential for $\bar g(z)$:

\[d \bar g=\frac{\partial \bar g}{\partial z}dz+\frac{\partial \bar g}{\partial \bar z}d \bar z\]

Let’s write the total differential for $f(g)$:

\[d(f \circ g)=\frac{\partial f}{\partial g}dg+\frac{\partial f}{\partial \bar g}d \bar g\]

Substitute $dg$ and $d\bar g$ into this equation:

\[d(f \circ g)=\left(\frac{\partial f}{\partial g} \frac{\partial g}{\partial z}+\frac{\partial f}{\partial \bar g}\frac{\partial \bar g}{\partial z}\right)dz+\left(\frac{\partial f}{\partial g} \frac{\partial g}{\partial \bar z}+\frac{\partial f}{\partial \bar g}\frac{\partial \bar g}{\partial \bar z}\right)d\bar z\]

So, these are the chain rules, and they are exactly the same as the ones we know for real functions (thinking of $f(g,\bar g)$ and $g(z,\bar z)$ as real multi-variable functions)!

\[\frac{\partial (f \circ g)}{\partial z}=\frac{\partial f}{\partial g} \frac{\partial g}{\partial z} + \frac{\partial f}{\partial \bar g}\frac{\partial \bar g}{\partial z}\] \[\frac{\partial (f \circ g)}{\partial \bar z}=\frac{\partial f}{\partial g} \frac{\partial g}{\partial \bar z} + \frac{\partial f}{\partial \bar g}\frac{\partial \bar g}{\partial \bar z}\]
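As a sanity check, the two chain rules can be tested numerically. The Python sketch below differentiates $f \circ g$ directly and compares the result with the right-hand sides of the identities above. The functions `f` and `g`, the finite-difference helper `wirtinger`, and the evaluation point are all made up for this check.

```python
def wirtinger(func, z, h=1e-6):
    """Return (d/dz, d/dzbar) of func at z via central differences."""
    d_re = (func(z + h) - func(z - h)) / (2 * h)
    d_im = (func(z + 1j * h) - func(z - 1j * h)) / (2 * h)
    return 0.5 * (d_re - 1j * d_im), 0.5 * (d_re + 1j * d_im)


def g(z):
    # Hypothetical non-holomorphic inner function.
    return z * z + z.conjugate()


def f(w):
    # Hypothetical non-holomorphic outer function.
    return w * w.conjugate() + 2 * w


def g_bar(z):
    # Conjugate of g, needed for the second term of each chain rule.
    return g(z).conjugate()


z0 = 0.3 + 0.7j
w0 = g(z0)

dg_dz, dg_dzbar = wirtinger(g, z0)
dgbar_dz, dgbar_dzbar = wirtinger(g_bar, z0)
df_dg, df_dgbar = wirtinger(f, w0)

# Right-hand sides of the two chain rules.
chain_dz = df_dg * dg_dz + df_dgbar * dgbar_dz
chain_dzbar = df_dg * dg_dzbar + df_dgbar * dgbar_dzbar

# Left-hand sides: differentiate the composition directly.
direct_dz, direct_dzbar = wirtinger(lambda z: f(g(z)), z0)

print(abs(chain_dz - direct_dz), abs(chain_dzbar - direct_dzbar))
```

Both printed differences should be tiny, limited only by finite-difference and rounding error. Dropping the $\frac{\partial f}{\partial \bar g}$ terms would break the agreement here, because neither function is holomorphic.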

Wirtinger Derivatives are Helpful

Instead of calculating derivatives along the standard $Re$ and $Im$ directions, we in effect calculate them along the $\hat z$ and $\hat {\bar z}$ directions. This view simplifies reasoning about many things.

Ex-I: Holomorphic Functions

For holomorphic functions, $\frac{\partial f}{\partial \bar z}=0$. Roughly speaking, this means that the function does not depend on $z_{Re}$ and $z_{Im}$ separately; it depends on $z$ as a whole. Examples include $f(z) = 2z$, $f(z) = \exp(z)$,…
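One way to make this concrete is to write $f = f_{Re} + i f_{Im}$ and expand the definition of $\frac{\partial f}{\partial \bar z}$ into real and imaginary parts. Setting both parts to zero recovers the familiar Cauchy-Riemann equations:

\[\frac{\partial f}{\partial \bar z}=\frac{1}{2}\left(\left(\frac{\partial f_{Re}}{\partial z_{Re}}-\frac{\partial f_{Im}}{\partial z_{Im}}\right)+i\left(\frac{\partial f_{Im}}{\partial z_{Re}}+\frac{\partial f_{Re}}{\partial z_{Im}}\right)\right)=0\]
\[\Rightarrow \quad \frac{\partial f_{Re}}{\partial z_{Re}}=\frac{\partial f_{Im}}{\partial z_{Im}}, \qquad \frac{\partial f_{Re}}{\partial z_{Im}}=-\frac{\partial f_{Im}}{\partial z_{Re}}\]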

Ex-II: Non-Holomorphic Functions

The Wirtinger formulation is also helpful for familiar non-holomorphic functions, such as:

\[f(z) = |z|^2= z \bar z\]

As you may guess, Wirtinger derivatives are:

\[\frac{\partial f}{\partial z}= \bar z\] \[\frac{\partial f}{\partial \bar z}=z\]
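The same result follows from the definitions in terms of real partial derivatives, which serves as a quick check. Writing $f = z_{Re}^2 + z_{Im}^2$:

\[\frac{\partial f}{\partial z_{Re}}=2z_{Re}, \qquad \frac{\partial f}{\partial z_{Im}}=2z_{Im}\]
\[\frac{\partial f}{\partial z}=\frac{1}{2}\left(2z_{Re}-2i\,z_{Im}\right)=\bar z, \qquad \frac{\partial f}{\partial \bar z}=\frac{1}{2}\left(2z_{Re}+2i\,z_{Im}\right)=z\]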

Conclusion

So the expressiveness of Wirtinger derivatives is wonderful. Every equation with Wirtinger derivatives takes the same form as the equations you learned in real calculus. We will see whether it is also helpful for AD purposes…

References

Wikipedia contributors. “Wirtinger derivatives.” Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 8 Jan. 2019. Web. 28 Jan. 2019.

Appendix

Appendix I

Just to clarify the meaning of the derivative of a complex-valued function with respect to a real variable:

\[\frac{\partial f}{\partial z_{Re}} =\frac{\partial f_{Re}}{\partial z_{Re}}+i \frac{\partial f_{Im}}{\partial z_{Re}}\]

Appendix II

Rigorous derivation of the chain rule in the $\mathbb{R}$ealistic view:

\[\begin{aligned}
\frac{\partial(f \circ g)}{\partial z} &=\frac{1}{2}\left(\frac{\partial(f \circ g)}{\partial z_{Re}}-i \frac{\partial(f \circ g)}{\partial z_{Im}}\right) \\
&=\frac{1}{2}\left(\left(\frac{\partial f}{\partial g_{Re}} \frac{\partial g_{Re}}{\partial z_{Re}}+\frac{\partial f}{\partial g_{Im}} \frac{\partial g_{Im}}{\partial z_{Re}}\right)-i\left(\frac{\partial f}{\partial g_{Re}} \frac{\partial g_{Re}}{\partial z_{Im}}+\frac{\partial f}{\partial g_{Im}} \frac{\partial g_{Im}}{\partial z_{Im}}\right)\right) \\
&=\frac{1}{4}\left(\left(\frac{\partial f}{\partial g_{Re}} \frac{\partial(g+\bar{g})}{\partial z_{Re}}-i \frac{\partial f}{\partial g_{Im}} \frac{\partial(g-\bar{g})}{\partial z_{Re}}\right)-\left(i \frac{\partial f}{\partial g_{Re}} \frac{\partial(g+\bar{g})}{\partial z_{Im}}+\frac{\partial f}{\partial g_{Im}} \frac{\partial(g-\bar{g})}{\partial z_{Im}}\right)\right) \\
&=\frac{1}{4}\left(\frac{\partial f}{\partial g_{Re}}\left(\frac{\partial(g+\bar{g})}{\partial z_{Re}}-i \frac{\partial(g+\bar{g})}{\partial z_{Im}}\right)-\frac{\partial f}{\partial g_{Im}}\left(i \frac{\partial(g-\bar{g})}{\partial z_{Re}}+\frac{\partial(g-\bar{g})}{\partial z_{Im}}\right)\right) \\
&=\frac{1}{2}\left(\frac{\partial f}{\partial g_{Re}} \frac{\partial(g+\bar{g})}{\partial z}-i \frac{\partial f}{\partial g_{Im}} \frac{\partial(g-\bar{g})}{\partial z}\right) \\
&=\frac{1}{2}\left(\left(\frac{\partial f}{\partial g}+\frac{\partial f}{\partial \bar{g}}\right) \frac{\partial(g+\bar{g})}{\partial z}+\left(\frac{\partial f}{\partial g}-\frac{\partial f}{\partial \bar{g}}\right) \frac{\partial(g-\bar{g})}{\partial z}\right) \\
&=\frac{\partial f}{\partial g} \frac{\partial g}{\partial z}+\frac{\partial f}{\partial \bar{g}} \frac{\partial \bar{g}}{\partial z}
\end{aligned}\]