Hi Ilya,

I'm actually really not excited about affinity propagation. Firstly, it's
slow. Clustering has pretty much two use cases. The first is to find
latent meaningful structure. This is a hard problem in the sense of
learning theory, so to be able to trust the solution one needs many
samples. The second is to reduce the problem size by replacing samples
with their cluster centers. Both of these use cases are really relevant
only when there are many samples, so a slow clustering method is not very
useful. The second reason that I don't like affinity propagation is that
it has many parameters to set, and it gives very strange/unstable
results.

I think that the empirical comparison of clustering algorithms that we
have at the top of the clustering page:
http://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods
is quite telling in terms of what the limitations of affinity propagation
are. I have personally not seen it used in any non-trivial application
(or in academic papers interested in it theoretically).

Now, the enhancements that you are proposing try to tackle both
limitations of affinity propagation. So, on paper, they look great.
However, I am a computer scientist who publishes papers on methods, and
thus I know how weak a claim is when it appears in a paper by the authors
of the method. I don't trust that a method actually has the benefits it
claims unless I see them demonstrated on many different applications, by
many different people. Experience has really taught me this, and I must
say that there are some methods that I regret pushing into scikit-learn.
That's why we have the requirements on the number of citations: we find
that a method that is really useful gets used, and thus cited. One way of
proving us wrong is to do an implementation outside of scikit-learn, in a
separate package, and, in the examples of this package, show that the
method solves very well problems that are not solved well by the methods
in scikit-learn.


Do you understand our line of thought? It's not against methods in
general; it's just that we are trying hard to find the right subset of
the literature that we should be struggling to keep alive and kicking.

Cheers,

Gaël


On Wed, Dec 03, 2014 at 01:08:17PM +0000, Илья Патрушев wrote:
> Hi Andy,

> Adaptive Affinity Propagation is essentially an additional optimisation layer
> on top of the original Affinity Propagation algorithm.
> The Affinity Propagation algorithm works on the similarity matrix and tries to
> identify a number of data points that would be "centres" of clusters. Its
> behaviour is governed by two parameters: preferences (a vector of size
> n_samples) and damping.
> The preferences are, on one hand, a way to incorporate prior knowledge about
> likely cluster centres; on the other hand, they control the number of clusters
> produced by the algorithm. When there is no prior knowledge, the preferences
> are set to the same value for all sample points. The general relationship
> between the preference value and the number of clusters is: the greater the
> value, the greater the number of clusters. The authors of the Affinity
> Propagation algorithm recommend using the median similarity value, but in the
> end one has to find the right preference value for each new clustering problem.
> The damping parameter defines the speed at which the algorithm updates its
> responsibility/availability evidence. The higher the damping parameter, the
> less prone the algorithm is to oscillations, but this slows down convergence.
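>
> For illustration, here is a minimal sketch of how these two parameters are
> exposed by sklearn's existing AffinityPropagation estimator (the toy dataset
> and the damping value are purely illustrative):
>
>     import numpy as np
>     from sklearn.cluster import AffinityPropagation
>     from sklearn.datasets import make_blobs
>     from sklearn.metrics.pairwise import euclidean_distances
>
>     X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
>
>     # sklearn uses negative squared euclidean distances as similarities,
>     # so the similarity matrix can be reproduced like this:
>     S = -euclidean_distances(X, squared=True)
>
>     # the median similarity is the authors' recommended default preference
>     ap = AffinityPropagation(preference=np.median(S), damping=0.5).fit(X)
>     print(len(ap.cluster_centers_indices_), "clusters found")
>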
> Wang's solution is to run the Affinity Propagation algorithm starting with a
> quite high preference value (like .5 of the median similarity). As it
> converges, the goodness of the clustering is measured (they suggest the
> Silhouette index), the preference is decreased, and these steps are repeated
> until the algorithm produces some minimal number of clusters. Along with that,
> the presence of oscillations is monitored; should they appear, they are
> controlled by adjusting the damping parameter, and should the damping reach
> its maximum value, by reducing the preference value.
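>
> In rough terms, the adaptive scan could look like the sketch below. This is
> my paraphrase, not Wang's actual code: it assumes sklearn's
> AffinityPropagation and silhouette_score, the starting preference and scan
> factor are illustrative, and the oscillation monitoring is replaced with a
> fixed high damping:
>
>     import numpy as np
>     from sklearn.cluster import AffinityPropagation
>     from sklearn.metrics import silhouette_score
>
>     def adaptive_ap_sketch(X, S, min_clusters=2, scan_factor=1.1):
>         # Similarities are negative, so half the median similarity is a
>         # *high* preference that yields many clusters to start from.
>         preference = 0.5 * np.median(S)
>         best_score, best_labels = -1.0, None
>         while True:
>             ap = AffinityPropagation(preference=preference,
>                                      damping=0.9).fit(X)
>             n_clusters = len(ap.cluster_centers_indices_)
>             if n_clusters < max(min_clusters, 2):
>                 break  # scanned past the minimal number of clusters
>             score = silhouette_score(X, ap.labels_)
>             if score > best_score:
>                 best_score, best_labels = score, ap.labels_
>             preference *= scan_factor  # more negative -> fewer clusters
>         return best_score, best_labels
>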
> The PDF on arXiv is the English translation of the original paper, which was
> published in Chinese.
> I agree, Adaptive Affinity Propagation is not as widely used a method as the
> FAQ requires; I should have looked into it beforehand. Maybe it can be
> considered a clear-cut improvement of the Affinity Propagation algorithm?
> Anyway, if it is not to be added to sklearn, I am quite happy to release it
> via PyPI.

> Best wishes,
> ilya


> 2014-12-02 14:34 GMT+00:00 Andy <t3k...@gmail.com>:

>     Hi Ilya.

>     Thanks for your interest in contributing.
>     I am no expert in affinity propagation, so it would be great if you could
>     give some details on what the advantages of the method are.
>     The reference paper seems to be an arXiv preprint with 88 citations, which
>     would probably not qualify for inclusion in scikit-learn; see the FAQ:
>     http://scikit-learn.org/dev/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published

>     It might be a candidate for an external experimental / contribution
>     project, an idea that has been floating around for a while.

>     Cheers,
>     Andy



>     On 12/02/2014 09:06 AM, Илья Патрушев wrote:

>         Hi everybody,

>         As far as I am aware, there is no adaptive affinity propagation
>         clustering algorithm implementation in either the stable or the
>         development version of sklearn.
>         I have recently implemented the adaptive affinity propagation
>         algorithm as a part of my image analysis project. I based my
>         implementation on the paper by Wang et al., 2007, their Matlab code,
>         and sklearn's affinity propagation algorithm. This is not exactly a
>         port of the Matlab code, since I have slightly modified Wang's
>         approach to deal with oscillations and added an optional upper limit
>         on the number of clusters.
>         I am planning to submit the code to sklearn eventually, so please let
>         me know if anybody is already working on the algorithm, as we could
>         join efforts and save some time.

>         Best wishes,
>         ilya.
-- 
    Gael Varoquaux
    Researcher, INRIA Parietal
    Laboratoire de Neuro-Imagerie Assistee par Ordinateur
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux

