normalized cross-correlation (Trac #1714) #2310

Open
numpy-gitbot opened this issue Oct 19, 2012 · 17 comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/1714 on 2011-01-15 by trac user bubla, assigned to unknown.

Hello,
I have made a patch that extends the correlate function so that it can now compute the normalized cross-correlation; see the Wikipedia article: http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation

I have added documentation and a simple doctest too, of course. The patch is against the latest Git master.

@numpy-gitbot
Author

Attachment added by trac user bubla on 2011-01-15: correlate.parch

@numpy-gitbot
Author

@rgommers wrote on 2011-03-29

Since this is new functionality, you should ask on the mailing list whether this can be included.

A unit test is necessary; doctests are not executed by default in the test suite.

@charris charris added the Patch label Feb 20, 2014
@charris
Member
charris commented Feb 20, 2014

This could be useful.

@plurch
plurch commented Apr 6, 2014

Can someone verify if this is correct?

@seberg
Member
seberg commented Apr 6, 2014

I am wondering what is right in combination with mode='full'.

@jamessynge

I happen to need this feature, but am not sure that it is correct except when mode='valid' AND both arrays are the same size. In other words, I think the normalization has to be applied to each window.
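
As a quick illustration of this caveat (a sketch added for reference, not code from the attached patch): for equal-length arrays in mode='valid', pre-normalizing both inputs reproduces np.corrcoef exactly, while the same preprocessing in mode='full' no longer yields per-window correlation coefficients:

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(256)
b = rng.standard_normal(256)

# Pre-normalize: remove the mean, divide by the (global) standard deviation,
# and divide one of the arrays by its length.
an = (a - a.mean()) / (a.std() * len(a))
bn = (b - b.mean()) / b.std()

# Equal lengths + mode='valid': a single value, identical to Pearson's r.
r = np.correlate(an, bn, mode='valid')[0]
assert np.isclose(r, np.corrcoef(a, b)[0, 1])

# mode='full': the normalization constants above come from the whole arrays,
# not from the overlapping windows, so the per-lag values are not true
# correlation coefficients and are not guaranteed to stay within [-1, 1].
c_full = np.correlate(an, bn, mode='full')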

@VlamV
VlamV commented Oct 20, 2014

Any updates on the matter?
Has the patch been tested for all correlate modes?

@seberg
Member
seberg commented Oct 20, 2014

@VlamV, as there is nothing here, no, I don't believe so. If you have time to fix it for the other modes, I think we could add it.

@hadim
hadim commented Feb 2, 2015

Any news on this?

@wilberth
wilberth commented Oct 1, 2015

Would it not make more sense to divide by the standard deviation of the portion of a that is actually used (in the case of mode='same' or mode='full')?

Also, would it not make sense to use the bias and ddof options of numpy.corrcoef instead of simply dividing by len(a)?

@ElpyDE
ElpyDE commented Jul 10, 2017

There is an answer on Stack Overflow that points here: https://stackoverflow.com/a/5639626/5392420

It also includes the most relevant code. As the attachment above is no longer available (at least to me), I thought it could be helpful to have this as a starting point for an implementation.

I can also confirm that, at least in my case, it works to "normalize" the input vectors before using np.correlate like this, and reasonable values within the range [-1, 1] are returned:

a = (a - np.mean(a)) / (np.std(a) * len(a))  # remove mean, divide by std and length
b = (b - np.mean(b)) / np.std(b)             # remove mean, divide by std
c = np.correlate(a, b, 'full')

@trichter

The normalization by the portion of signal a that is actually used (see @wilberth's comment) is implemented with numpy methods in the correlate_template function in this repository: https://github.com/trichter/xcorr

@yy-yuichi

Hi all, I'm a beginner at OSS, but I have a comment about this open issue. Because the link is dead, I can't check the implementation progress of the normalized cross-correlation function. The normalized option is useful, and I believe this functionality should be added.

As @ElpyDE commented, the normalized cross-correlation function is calculated as

$$c[k] = \frac{1}{N_a\,\sigma_a\,\sigma_b}\sum_n \bigl(a[n+k]-\bar a\bigr)\bigl(b[n]-\bar b\bigr)$$

where $\bar a$, $\sigma_a$ and $\bar b$, $\sigma_b$ are the means and standard deviations of the whole of $a$ and $b$, and $N_a = \mathrm{len}(a)$.

In this case, we can sometimes calculate the normalized cross-correlation function. However, I think this definition is slightly different from the exact definition of normalized cross-correlation. In general, values calculated this way may fall outside the range [-1, 1] (at least I could not obtain a cross-correlation confined to [-1, 1] for, for example, the attachments a.npy and b.npy):

import numpy as np
import matplotlib.pyplot as plt

a = np.load("a.npy")
b = np.load("b.npy")

a = (a - np.mean(a))/(np.std(a)*len(a))
b = (b - np.mean(b))/(np.std(b))
corr = np.correlate(a, b, "full")

fig=plt.figure()
plt.plot(corr)
plt.show()
fig.savefig("test.png")

[test.png: plot of corr, showing values outside the range [-1, 1]]

The exact normalized cross-correlation is defined as

$$\mathrm{NCC}[k] = \frac{1}{N_b\,\sigma_{a,k}\,\sigma_b}\sum_n \bigl(a[n+k]-\bar a_k\bigr)\bigl(b[n]-\bar b\bigr)$$

where $\bar a_k$ and $\sigma_{a,k}$ are the mean and standard deviation of the $N_b$ samples of $a$ that overlap $b$ at lag $k$.

We can see the difference in the calculation range of the mean value and standard deviation of $a$. We probably cannot calculate this with the current np.correlate(x, y) by preprocessing the input data x, y alone. Therefore, I think a normalized option for the cross-correlation function is necessary. If you think this normalized option is fruitful, I will try to implement it (or does a patch already exist?).

Please point out any mistakes
attachment.zip
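
For reference, here is a minimal (and deliberately slow) sketch of this windowed definition, assuming len(b) <= len(a), 'valid'-style lags so every window is fully populated, and no constant windows of a; the function name is mine, not an existing NumPy API:

import numpy as np

def windowed_ncc(a, b):
    """Normalized cross-correlation in which, for every lag, the mean and
    standard deviation of `a` are taken over the window that overlaps `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    bn = (b - b.mean()) / (b.std() * n)        # fixed normalization of b
    out = np.empty(len(a) - n + 1)
    for k in range(len(out)):
        win = a[k:k + n]                       # portion of a actually used at this lag
        out[k] = np.sum((win - win.mean()) / win.std() * bn)
    return out

Each out[k] equals np.corrcoef(a[k:k + n], b)[0, 1], so every value is guaranteed to lie in [-1, 1].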

@ravwojdyla

For full mode, would it make sense to compute corrcoef for each lag (zero-padded)?

from dataclasses import dataclass
from typing import Any, Optional, Sequence

import numpy as np

ArrayLike = Any


@dataclass
class XCorr:
    cross_correlation: np.ndarray
    lags: np.ndarray


def cross_correlation(
    signal: ArrayLike, feature: ArrayLike, lags: Optional[Sequence[int]] = None
) -> XCorr:
    """
    Computes normalized cross correlation between the `signal` and the `feature`.
    Current implementation assumes the `feature` can't be longer than the `signal`.
    You can optionally provide specific lags; if not provided, `signal` is padded
    with `len(feature) - 1` zeros on each side, and the `feature` is slid and
    zero-padded (creating lags) to match the length of the new signal. The Pearson
    product-moment correlation coefficient is then computed for each lag.

    :param signal: observed signal
    :param feature: feature you are looking for
    :param lags: optional lags; if not provided, defaults to range(-len(feature) + 1, len(signal))
    """
    signal_ar = np.asarray(signal)
    feature_ar = np.asarray(feature)
    if np.count_nonzero(feature_ar) == 0:
        raise ValueError("Unsupported - feature contains only zeros")
    assert (
        signal_ar.ndim == feature_ar.ndim == 1
    ), "Unsupported - only 1d signal/feature supported"
    assert len(feature_ar) <= len(
        signal_ar
    ), "Unsupported - signal should be at least as long as the feature"
    padding_sz = len(feature_ar) - 1
    padded_signal = np.pad(
        signal_ar, (padding_sz, padding_sz), "constant", constant_values=0
    )
    lags = lags if lags is not None else range(-padding_sz, len(signal_ar), 1)
    if np.max(lags) >= len(signal_ar):
        raise ValueError("max positive lag must be shorter than the signal")
    if np.min(lags) <= -len(feature_ar):
        raise ValueError("max negative lag can't be longer than the feature")
    lagged_patterns = np.asarray(
        [
            np.pad(
                feature_ar,
                (padding_sz + lag, len(signal_ar) - lag - 1),
                "constant",
                constant_values=0,
            )
            for lag in lags
        ]
    )
    return XCorr(
        cross_correlation=np.corrcoef(padded_signal, lagged_patterns)[0, 1:],
        lags=np.asarray(lags),
    )

Example:

rng = np.random.default_rng()  # generator for the noise below
sig = np.repeat([0., 1., 1., 0., 1., 0., 0., 1.], 128)
sig_noise = sig + rng.standard_normal(len(sig))
feature = np.ones(128)
corr = cross_correlation(sig_noise, feature)

sig looks like this: [plot of sig]

+ noise: [plot of sig_noise]

cross correlation: [plot of the resulting cross-correlation against the lags]

@Rigel-Alves

Hi,

Any updates on this? I think it is really important to have a normalized version of the cross-correlation function, either in numpy or scipy, as we have in Matlab (https://www.mathworks.com/help/matlab/ref/xcorr.html). At the moment I am getting values greater than 1 or smaller than -1, whereas (as pointed out by @yy-yuichi) the approach mentioned at https://stackoverflow.com/questions/5639280/why-numpy-correlate-and-corrcoef-return-different-values-and-how-to-normalize does not always work.

Kind regards,

@trichter

Developers may be reluctant to implement a normalized cross-correlation because there are different definitions. The Matlab function you reference also only allows normalization of signals of equal length.

For example, in seismology we regularly use two different methods of normalization. The first case is template matching: you have a short signal (of an earthquake, length N1) and you want to find the same or a similar signature in a longer time series. For a perfect match, ignoring amplitudes, you would like to get a 1 (or -1). For this "full" normalization you need to compute the root of the zero-lag autocorrelation (RAC) of every length-N1 portion of the longer signal, and use only this and the RAC of the short signal to normalize each cc value. It may not be cheap to compute all the RACs in a loop (as suggested in the comment by @ravwojdyla).

In the second case, we usually have two long time series (e.g. one day at a 100 Hz sampling rate) of the same or similar duration (the difference in duration much shorter than the duration of the signals), and we are not interested in the full xcorr but only in the xcorr up to a certain time lag (again short compared to the full duration of the signals; note that the correlate options for mode are not practical in this use case, and it would be nice to have a maxlag option like Matlab's xcorr). By the way, these correlations are used to filter out similar signals in ambient vibrations at two different stations and to "construct" waves travelling between the two receivers, a method that has gained some popularity. In this case a "naive" normalization, which simply divides by the RACs of the two signals, can be used. It is less complicated than the "full" normalization, and the normalized cc values will still be in the range [-1, 1] (if the signals have the same length). This is the normalization used by Matlab and corrcoef.

Implementations of both described normalizations, "full" and "naive", can be found in the repo mentioned above.
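
For readers landing here, a minimal sketch of the "full" (template-matching) normalization described above, assuming 1-D inputs and a 'valid'-mode overlap; the rolling RACs are obtained from a cumulative sum of squares rather than a Python loop, and the function name is mine rather than the one used in the xcorr repo:

import numpy as np

def template_xcorr(long_sig, template):
    """'Valid'-mode cross-correlation of `template` against `long_sig`,
    normalized per window by the rolling RAC of `long_sig` and the RAC of
    the (demeaned) template. A window proportional to the demeaned template
    scores exactly +/-1; per-window mean removal is omitted for brevity."""
    long_sig = np.asarray(long_sig, dtype=float)
    t = np.asarray(template, dtype=float)
    t = t - t.mean()
    n = len(t)

    # Raw cross-correlation of the template with every length-n window.
    raw = np.correlate(long_sig, t, mode="valid")

    # Rolling zero-lag autocorrelation (sum of squares) of each window,
    # via a cumulative sum instead of a per-window loop.
    sq = np.concatenate(([0.0], np.cumsum(long_sig ** 2)))
    window_energy = sq[n:] - sq[:-n]

    denom = np.sqrt(window_energy * np.sum(t ** 2))
    denom[denom == 0] = np.inf      # avoid division by zero for flat windows
    return raw / denom              # bounded by [-1, 1] (Cauchy-Schwarz)

The same cumulative-sum idea can also supply the per-window means if a fully zero-normalized ("exact") version is wanted.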

@endolith
Contributor

I imagine scipy is the more appropriate place for this, and both definitions could be included as a parameter.

See also scipy/scipy#4940
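
Purely as an illustration of that suggestion, and not an existing NumPy or SciPy signature, the two normalizations discussed in this thread could hang off a single keyword:

# Hypothetical sketch only -- this keyword does not exist in NumPy or SciPy today.
def correlate(a, v, mode='valid', normalization=None):
    """
    normalization=None    : current behaviour, the raw (un-normalized) cross-correlation
    normalization='naive' : divide by the roots of the zero-lag autocorrelations of the
                            whole signals (the Matlab xcorr 'coeff' / corrcoef style)
    normalization='full'  : per-window normalization as used in template matching,
                            so every output value lies in [-1, 1]
    """
    ...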
