
Decision tree:

A Decision Tree is a non-parametric, non-linear supervised learning model used in machine
learning for classification and regression tasks. It represents decisions and their possible
consequences, including chance event outcomes, resource costs, and utility. Essentially, it's a
flowchart-like structure where:

• Each internal node represents a decision based on a feature (or attribute).
• Each branch represents the outcome of that decision.
• Each leaf node represents a class label (in classification) or a continuous value (in regression).
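
For concreteness, here is a minimal sketch of such a flowchart-like model in code (this assumes scikit-learn is available; the iris dataset and the max_depth value are illustrative choices, not part of the original text):

```python
# Minimal illustration of a decision tree classifier and its
# flowchart-like structure (dataset and parameters are illustrative).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Internal nodes test a feature, branches are the possible outcomes
# of that test, and leaves hold the predicted class label.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Print the learned tree: indented feature tests are internal nodes,
# and "class: ..." lines are leaf nodes.
print(export_text(clf, feature_names=data.feature_names))

# Predict the class label for a new instance.
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))
```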

Decision Tree Induction

Decision Tree Induction refers to the process of learning (or constructing) a decision tree from
training data. Here's a detailed explanation of how it works:

Feature Selection Criterion:

To decide which feature to split on at each step in the tree, a feature selection criterion is used.
Common criteria include:

• Gini Index: Measures the impurity of a node. The goal is to choose the split that minimizes the
weighted Gini Index of the resulting child nodes.

• Information Gain: Based on entropy, it measures the reduction in entropy before and after the split.
Higher information gain indicates a better split.

• Gain Ratio: Adjusts information gain by taking into account the intrinsic (split) information of a
split, which penalizes features with many distinct values.

• Chi-square: Measures the statistical significance of the differences in the class distributions
among the different branches.
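
To make these criteria concrete, the following sketch computes Gini impurity, entropy, and information gain for a toy binary split (the arrays and helper functions are illustrative assumptions, not taken from the text):

```python
# Sketch of split criteria on toy label arrays (values are illustrative).
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum(p_k * log2(p_k)) over class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = np.array([0, 0, 0, 1]), np.array([0, 1, 1, 1])

print(gini(parent))                             # 0.5: maximally impure for 2 classes
print(information_gain(parent, [left, right]))  # > 0: the split reduces entropy
```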

Tree Construction Process:

Step 1: Start with the entire dataset as the root.

Step 2: At each node, select the best feature to split the data based on the chosen criterion.

Step 3: Split the dataset into subsets so that each subset contains the instances with the same value
(or range of values) for the selected feature.

Step 4: Repeat the process recursively for each subset, using only the remaining features.
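
A simplified sketch of this recursive process for categorical features is shown below (the data layout, helper names, and toy dataset are illustrative; the base cases correspond to the stopping criteria described in the next section):

```python
# Recursive decision tree induction over categorical features
# (a simplified sketch; data layout and helper names are illustrative).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # Entropy before the split minus the weighted entropy of each branch.
    n = len(labels)
    weighted = 0.0
    for value in set(r[feature] for r in rows):
        branch = [l for r, l in zip(rows, labels) if r[feature] == value]
        weighted += len(branch) / n * entropy(branch)
    return entropy(labels) - weighted

def build_tree(rows, labels, features):
    # Stopping criterion: pure node (all instances share one class).
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping criterion: no features left; predict the majority class.
    if not features:
        return Counter(labels).most_common(1)[0][0]

    # Step 2: pick the feature with the highest information gain.
    feature = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != feature]

    # Steps 3-4: partition on that feature and recurse on each subset.
    tree = {feature: {}}
    for value in set(r[feature] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[feature] == value]
        tree[feature][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return tree

# Toy "play tennis"-style data, purely for illustration.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "no", "yes", "no"]
print(build_tree(rows, labels, ["outlook", "windy"]))
```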

Stopping Criteria:

The recursive process stops when one of the following conditions is met:

All instances in a node belong to the same class (pure node).

There are no remaining features to split upon.

The predefined depth limit (maximum depth) of the tree is reached.

The number of instances in a node falls below the minimum required for it to be split.
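
In practice, libraries expose these stopping criteria as hyperparameters. For example, scikit-learn's DecisionTreeClassifier accepts depth and minimum-sample limits (the dataset and the specific values below are illustrative):

```python
# Several stopping criteria map directly to DecisionTreeClassifier
# hyperparameters in scikit-learn (values here are illustrative).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    max_depth=4,           # predefined depth limit
    min_samples_split=10,  # minimum instances a node needs to be split
    min_samples_leaf=5,    # minimum instances allowed in a leaf
    random_state=0,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```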

Pruning:

After the tree is fully grown, it might be too complex and overfit the training data. Pruning helps to
reduce this complexity:
Pre-pruning: Stops the tree growth early based on predefined thresholds (like maximum depth or
minimum samples per leaf).

Post-pruning: Removes branches that contribute little to predictive performance after the tree is
fully grown. Methods include cost-complexity pruning and reduced error pruning.
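
As one illustration of post-pruning, scikit-learn implements cost-complexity pruning via the ccp_alpha parameter; the sketch below grows a full tree, computes candidate alphas, and refits a pruned tree (the dataset and the choice of alpha are illustrative):

```python
# Sketch of post-pruning via cost-complexity pruning in scikit-learn
# (dataset, split, and alpha choice are illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow a full tree, then compute the effective alphas along the pruning path.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit with one candidate alpha: larger alphas prune more aggressively.
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

print("unpruned leaves:", full_tree.get_n_leaves(), "score:", full_tree.score(X_test, y_test))
print("pruned leaves:  ", pruned.get_n_leaves(), "score:", pruned.score(X_test, y_test))
```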

Eg:
