normalized cross-correlation (Trac #1714) #2310

Open
numpy-gitbot opened this issue Oct 19, 2012 · 17 comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/1714 on 2011-01-15 by trac user bubla, assigned to unknown.

Hello,
I have made a patch that extends the correlate function so that it can now compute the normalized cross-correlation; see the Wikipedia article: http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation

I have added documentation and a simple doctest too, of course. The patch is against the latest Git master.

@numpy-gitbot
Author

Attachment added by trac user bubla on 2011-01-15: correlate.parch

@numpy-gitbot
Author

@rgommers wrote on 2011-03-29

Since this is new functionality, you should ask on the mailing list whether this can be included.

A unit test is necessary; doctests are not executed by default in the test suite.

@charris charris added the Patch label Feb 20, 2014
@charris
Member
charris commented Feb 20, 2014

This could be useful.

@plurch
plurch commented Apr 6, 2014

Can someone verify if this is correct?

@seberg
Member
seberg commented Apr 6, 2014

I am wondering what is right in combination with mode='full'.

@jamessynge

I happen to need this feature, but am not sure that it is correct except when mode='valid' AND both arrays are the same size. In other words, I think the normalization has to be applied to each window.
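
As a quick illustration of this caveat (a sketch added for reference, not code from the attached patch): for equal-length arrays in mode='valid', pre-normalizing both inputs reproduces np.corrcoef exactly, while the same preprocessing in mode='full' no longer yields per-window correlation coefficients:

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(256)
b = rng.standard_normal(256)

# Pre-normalize: remove the mean, divide by the (global) standard deviation,
# and divide one of the arrays by its length.
an = (a - a.mean()) / (a.std() * len(a))
bn = (b - b.mean()) / b.std()

# Equal lengths + mode='valid': a single value, identical to Pearson's r.
r = np.correlate(an, bn, mode='valid')[0]
assert np.isclose(r, np.corrcoef(a, b)[0, 1])

# mode='full': the normalization constants above come from the whole arrays,
# not from the overlapping windows, so the per-lag values are not true
# correlation coefficients and are not guaranteed to stay within [-1, 1].
c_full = np.correlate(an, bn, mode='full')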

@VlamV
VlamV commented Oct 20, 2014

Any updates on the matter?
Has the patch been tested for all correlate modes?

@seberg
Member
seberg commented Oct 20, 2014

@VlamV, as there is nothing here, no, I don't believe so. If you have time to fix it for the other modes, I think we could add it.

@hadim
hadim commented Feb 2, 2015

Any news on this?

@wilberth
wilberth commented Oct 1, 2015

Would it not make more sense to divide by the standard deviation of the portion of a that is actually used (in the case of mode='same' or mode='full')?

Also, would it not make sense to use the bias and ddof options of numpy.corrcoef instead of simply dividing by len(a)?

@ElpyDE
ElpyDE commented Jul 10, 2017

There is an answer on Stack Overflow that points here: https://stackoverflow.com/a/5639626/5392420

It also includes the most relevant code. As the attachment above is no longer available (at least to me), I thought it could be helpful to have this as a starting point for an implementation.

I can also confirm that, at least in my case, it works to "normalize" the input vectors before using np.correlate like this, and reasonable values within the range [-1, 1] are returned:

a = (a - np.mean(a)) / (np.std(a) * len(a))  # remove mean, divide by std and length
b = (b - np.mean(b)) / np.std(b)             # remove mean, divide by std
c = np.correlate(a, b, 'full')

@trichter

The normalization by the portion of signal a that is actually used (see @wilberth's comment) is implemented with numpy methods in the correlate_template function in this repository: https://github.com/trichter/xcorr

@yy-yuichi

Hi all, I'm a beginner at OSS, but I have a comment about this open issue. Because the link is dead, I can't check the implementation progress of the normalized cross-correlation function. The normalized option is useful, and I believe this functionality should be added.

As @ElpyDE commented, the normalized cross-correlation function is calculated as

$$c[k] = \frac{1}{N_a\,\sigma_a\,\sigma_b}\sum_n \bigl(a[n+k]-\bar a\bigr)\bigl(b[n]-\bar b\bigr)$$

where $\bar a$, $\sigma_a$ and $\bar b$, $\sigma_b$ are the means and standard deviations of the whole of $a$ and $b$, and $N_a = \mathrm{len}(a)$.

In this case, we can sometimes calculate the normalized cross-correlation function. However, I think this definition is slightly different from the exact definition of normalized cross-correlation. In general, values calculated this way may fall outside the range [-1, 1] (at least I could not obtain a cross-correlation confined to [-1, 1] for, for example, the attachments a.npy and b.npy):

import numpy as np
import matplotlib.pyplot as plt

a = np.load("a.npy")
b = np.load("b.npy")

a = (a - np.mean(a))/(np.std(a)*len(a))
b = (b - np.mean(b))/(np.std(b))
corr = np.correlate(a, b, "full")

fig=plt.figure()
plt.plot(corr)
plt.show()
fig.savefig("test.png")

[test.png: plot of corr, showing values outside the range [-1, 1]]

The exact normalized cross-correlation is defined as

$$\mathrm{NCC}[k] = \frac{1}{N_b\,\sigma_{a,k}\,\sigma_b}\sum_n \bigl(a[n+k]-\bar a_k\bigr)\bigl(b[n]-\bar b\bigr)$$

where $\bar a_k$ and $\sigma_{a,k}$ are the mean and standard deviation of the $N_b$ samples of $a$ that overlap $b$ at lag $k$.

We can see the difference in the calculation range of the mean value and standard deviation of $a$. We probably cannot calculate this with the current np.correlate(x, y) by preprocessing the input data x, y alone. Therefore, I think a normalized option for the cross-correlation function is necessary. If you think this normalized option is fruitful, I will try to implement it (or does a patch already exist?).

Please point out any mistakes
attachment.zip
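
For reference, here is a minimal (and deliberately slow) sketch of this windowed definition, assuming len(b) <= len(a), 'valid'-style lags so every window is fully populated, and no constant windows of a; the function name is mine, not an existing NumPy API:

import numpy as np

def windowed_ncc(a, b):
    """Normalized cross-correlation in which, for every lag, the mean and
    standard deviation of `a` are taken over the window that overlaps `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    bn = (b - b.mean()) / (b.std() * n)        # fixed normalization of b
    out = np.empty(len(a) - n + 1)
    for k in range(len(out)):
        win = a[k:k + n]                       # portion of a actually used at this lag
        out[k] = np.sum((win - win.mean()) / win.std() * bn)
    return out

Each out[k] equals np.corrcoef(a[k:k + n], b)[0, 1], so every value is guaranteed to lie in [-1, 1].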

@ravwojdyla

For full mode, would it make sense to compute corrcoef for each lag (zero-padded)?

from dataclasses import dataclass
from typing import Any, Optional, Sequence

import numpy as np

ArrayLike = Any


@dataclass
class XCorr:
    cross_correlation: np.ndarray
    lags: np.ndarray


def cross_correlation(
    signal: ArrayLike, feature: ArrayLike, lags: Optional[Sequence[int]] = None
) -> XCorr:
    """
    Computes normalized cross correlation between the `signal` and the `feature`.
    Current implementation assumes the `feature` can't be longer than the `signal`.
    You can optionally provide specific lags; if not provided, `signal` is padded
    with `len(feature) - 1` zeros on each side, and the `feature` is slid and
    zero-padded (creating lags) to match the length of the new signal. The Pearson
    product-moment correlation coefficient is then computed for each lag.

    :param signal: observed signal
    :param feature: feature you are looking for
    :param lags: optional lags; if not provided, defaults to range(-len(feature) + 1, len(signal))
    """
    signal_ar = np.asarray(signal)
    feature_ar = np.asarray(feature)
    if np.count_nonzero(feature_ar) == 0:
        raise ValueError("Unsupported - feature contains only zeros")
    assert (
        signal_ar.ndim == feature_ar.ndim == 1
    ), "Unsupported - only 1d signal/feature supported"
    assert len(feature_ar) <= len(
        signal_ar
    ), "Unsupported - signal should be at least as long as the feature"
    padding_sz = len(feature_ar) - 1
    padded_signal = np.pad(
        signal_ar, (padding_sz, padding_sz), "constant", constant_values=0
    )
    lags = lags if lags is not None else range(-padding_sz, len(signal_ar), 1)
    if np.max(lags) >= len(signal_ar):
        raise ValueError("max positive lag must be shorter than the signal")
    if np.min(lags) <= -len(feature_ar):
        raise ValueError("max negative lag can't be longer than the feature")
    lagged_patterns = np.asarray(
        [
            np.pad(
                feature_ar,
                (padding_sz + lag, len(signal_ar) - lag - 1),
                "constant",
                constant_values=0,
            )
            for lag in lags
        ]
    )
    return XCorr(
        cross_correlation=np.corrcoef(padded_signal, lagged_patterns)[0, 1:],
        lags=np.asarray(lags),
    )

Example:

rng = np.random.default_rng()  # generator for the noise below
sig = np.repeat([0., 1., 1., 0., 1., 0., 0., 1.], 128)
sig_noise = sig + rng.standard_normal(len(sig))
feature = np.ones(128)
corr = cross_correlation(sig_noise, feature)

sig looks like this: [plot of sig]

+ noise: [plot of sig_noise]

cross correlation: [plot of the resulting cross-correlation against the lags]

@Rigel-Alves

Hi,

Any updates on this? I think it is really important to have a normalized version of the cross-correlation function, either in numpy or scipy, as we have in Matlab (https://www.mathworks.com/help/matlab/ref/xcorr.html). At the moment I am getting values greater than 1 or smaller than -1, whereas (as pointed out by @yy-yuichi) the approach mentioned at https://stackoverflow.com/questions/5639280/why-numpy-correlate-and-corrcoef-return-different-values-and-how-to-normalize does not always work.

Kind regards,

@trichter

Developers may be reluctant to implement a normalized cross-correlation because there are different definitions. The Matlab function you reference also only allows normalization of signals of equal length.

For example, in seismology we regularly use two different methods of normalization. The first case is template matching: you have a short signal (of an earthquake, length N1) and you want to find the same or a similar signature in a longer time series. For a perfect match, ignoring amplitudes, you would like to get a 1 (or -1). For this "full" normalization you need to compute the root of the zero-lag autocorrelation (RAC) of every length-N1 portion of the longer signal, and use only this and the RAC of the short signal to normalize each cc value. It may not be cheap to compute all the RACs in a loop (as suggested in the comment by @ravwojdyla).

In the second case, we usually have two long time series (e.g. one day at a 100 Hz sampling rate) of the same or similar duration (the difference in duration much shorter than the duration of the signals), and we are not interested in the full xcorr but only in the xcorr up to a certain time lag (again short compared to the full duration of the signals; note that the correlate options for mode are not practical in this use case, and it would be nice to have a maxlag option like Matlab's xcorr). By the way, these correlations are used to filter out similar signals in ambient vibrations at two different stations and to "construct" waves travelling between the two receivers, a method that has gained some popularity. In this case a "naive" normalization, which simply divides by the RACs of the two signals, can be used. It is less complicated than the "full" normalization, and the normalized cc values will still be in the range [-1, 1] (if the signals have the same length). This is the normalization used by Matlab and corrcoef.

Implementations of both described normalizations, "full" and "naive", can be found in the repo mentioned above.
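
For readers landing here, a minimal sketch of the "full" (template-matching) normalization described above, assuming 1-D inputs and a 'valid'-mode overlap; the rolling RACs are obtained from a cumulative sum of squares rather than a Python loop, and the function name is mine rather than the one used in the xcorr repo:

import numpy as np

def template_xcorr(long_sig, template):
    """'Valid'-mode cross-correlation of `template` against `long_sig`,
    normalized per window by the rolling RAC of `long_sig` and the RAC of
    the (demeaned) template. A window proportional to the demeaned template
    scores exactly +/-1; per-window mean removal is omitted for brevity."""
    long_sig = np.asarray(long_sig, dtype=float)
    t = np.asarray(template, dtype=float)
    t = t - t.mean()
    n = len(t)

    # Raw cross-correlation of the template with every length-n window.
    raw = np.correlate(long_sig, t, mode="valid")

    # Rolling zero-lag autocorrelation (sum of squares) of each window,
    # via a cumulative sum instead of a per-window loop.
    sq = np.concatenate(([0.0], np.cumsum(long_sig ** 2)))
    window_energy = sq[n:] - sq[:-n]

    denom = np.sqrt(window_energy * np.sum(t ** 2))
    denom[denom == 0] = np.inf      # avoid division by zero for flat windows
    return raw / denom              # bounded by [-1, 1] (Cauchy-Schwarz)

The same cumulative-sum idea can also supply the per-window means if a fully zero-normalized ("exact") version is wanted.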

@endolith
Contributor

I imagine scipy is the more appropriate place for this, and both definitions could be included as a parameter.

See also scipy/scipy#4940
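
Purely as an illustration of that suggestion, and not an existing NumPy or SciPy signature, the two normalizations discussed in this thread could hang off a single keyword:

# Hypothetical sketch only -- this keyword does not exist in NumPy or SciPy today.
def correlate(a, v, mode='valid', normalization=None):
    """
    normalization=None    : current behaviour, the raw (un-normalized) cross-correlation
    normalization='naive' : divide by the roots of the zero-lag autocorrelations of the
                            whole signals (the Matlab xcorr 'coeff' / corrcoef style)
    normalization='full'  : per-window normalization as used in template matching,
                            so every output value lies in [-1, 1]
    """
    ...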
