NelleV/scikit-learn · Commit dc8e65a

Merge pull request #3 from bdholt1/glouppe-tree-mo

Glouppe tree mo

2 parents: e5a61dc + 94a5f3f
File tree: 2 files changed (+68, -3 lines)


doc/modules/tree.rst (13 additions, 3 deletions)
@@ -191,7 +191,7 @@ A multi-output problem is a supervised learning problem with several outputs
 to predict, that is when Y is a 2d array of size ``[n_samples, n_outputs]``.

 When there is no correlation between the outputs, a very simple way to solve
-this kind of problems is to build n independent models, i.e. one for each
+this kind of problem is to build n independent models, i.e. one for each
 output, and then to use those models to independently predict each one of the n
 outputs. However, because it is likely that the output values related to the
 same input are themselves correlated, an often better way is to build a single
@@ -200,7 +200,7 @@ lower training time since only a single estimator is built. Second, the
 generalization accuracy of the resulting estimator may often be increased.

 With regard to decision trees, this strategy can readily be used to support
-multi-output problems. This indeed amounts to:
+multi-output problems. This requires the following changes:

 - Store n output values in leaves, instead of 1;
 - Use splitting criteria that compute the average reduction across all
@@ -215,7 +215,16 @@ of size ``[n_samples, n_outputs]`` then the resulting estimator will:
 - Output a list of n_output arrays of class probabilities upon
   ``predict_proba``.

-The use of multi-output trees is demonstrated in
+The use of multi-output trees for regression is demonstrated in
+:ref:`example_tree_plot_tree_regression_multioutput.py`. In this example, the input
+X is a single real value and the outputs Y are the sine and cosine of X.
+
+.. figure:: ../auto_examples/tree/images/plot_tree_regression_multioutput_1.png
+   :target: ../auto_examples/tree/plot_tree_regression_multioutput.html
+   :scale: 75
+   :align: center
+
+The use of multi-output trees for classification is demonstrated in
 :ref:`example_ensemble_plot_forest_multioutput.py`. In this example, the inputs
 X are the pixels of the upper half of faces and the outputs Y are the pixels of
 the lower half of those faces.
@@ -227,6 +236,7 @@ the lower half of those faces.

 .. topic:: Examples:

+ * :ref:`example_tree_plot_tree_regression_multioutput.py`
  * :ref:`example_ensemble_plot_forest_multioutput.py`

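The doc change above contrasts building n independent models with fitting a single estimator on a 2d Y. A minimal sketch of both strategies, assuming the present-day scikit-learn API rather than the 2012-era code in this commit; the toy data and the max_depth value are illustrative, not part of the patch:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 1)                                # one input feature
Y = np.hstack([np.sin(3 * X), np.cos(3 * X)])       # two correlated outputs

# Strategy 1: n independent models, one per output column.
per_output = [DecisionTreeRegressor(max_depth=5).fit(X, Y[:, k])
              for k in range(Y.shape[1])]
pred_indep = np.column_stack([t.predict(X) for t in per_output])

# Strategy 2: a single multi-output tree fit on the full 2d Y,
# as the docs above describe.
joint = DecisionTreeRegressor(max_depth=5).fit(X, Y)
pred_joint = joint.predict(X)                       # shape (n_samples, n_outputs)

print(pred_indep.shape, pred_joint.shape)           # (100, 2) (100, 2)

The single tree shares its split decisions across outputs, which is where the lower training time and possible accuracy gain mentioned in the docs come from.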
examples/tree/plot_tree_regression_multioutput.py (new file: 55 additions, 0 deletions)

@@ -0,0 +1,55 @@
"""
===================================================================
Multi-output Decision Tree Regression
===================================================================

Multi-output regression with :ref:`decision trees <tree>`: the decision tree
is used to predict simultaneously the noisy x and y observations of a circle
given a single underlying feature. As a result, it learns local linear
regressions approximating the circle.

We can see that if the maximum depth of the tree (controlled by the
`max_depth` parameter) is set too high, the decision trees learn too fine
details of the training data and learn from the noise, i.e. they overfit.
"""
print __doc__

import numpy as np

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
y[::5, :] += (0.5 - rng.rand(20, 2))

# Fit regression model
from sklearn.tree import DecisionTreeRegressor

clf_1 = DecisionTreeRegressor(max_depth=2)
clf_2 = DecisionTreeRegressor(max_depth=5)
clf_3 = DecisionTreeRegressor(max_depth=8)
clf_1.fit(X, y)
clf_2.fit(X, y)
clf_3.fit(X, y)

# Predict
X_test = np.arange(-100.0, 100.0, 0.01)[:, np.newaxis]
y_1 = clf_1.predict(X_test)
y_2 = clf_2.predict(X_test)
y_3 = clf_3.predict(X_test)

# Plot the results
import pylab as pl

pl.figure()
pl.scatter(y[:, 0], y[:, 1], c="k", label="data")
pl.scatter(y_1[:, 0], y_1[:, 1], c="g", label="max_depth=2")
pl.scatter(y_2[:, 0], y_2[:, 1], c="r", label="max_depth=5")
pl.scatter(y_3[:, 0], y_3[:, 1], c="b", label="max_depth=8")
pl.xlim([-6, 6])
pl.ylim([-6, 6])
pl.xlabel("data")
pl.ylabel("target")
pl.title("Multi-output Decision Tree Regression")
pl.legend()
pl.show()
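
The tree.rst diff above also states that a multi-output classifier will output a list of n_output arrays of class probabilities upon ``predict_proba``. A minimal sketch of that return shape, again assuming the present-day scikit-learn API and made-up toy labels rather than anything in this commit:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
Y = (X > 0.5).astype(int)        # two binary label columns, shape (100, 2)

clf = DecisionTreeClassifier(max_depth=3).fit(X, Y)
proba = clf.predict_proba(X[:5])

# One (n_samples, n_classes) array per output, returned as a list.
print(len(proba), proba[0].shape, proba[1].shape)   # 2 (5, 2) (5, 2)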

0 commit comments