Data Manipulation
• Data Sorting
• Finding and Removing Duplicate Records
• Cleaning data
• Recoding data
• Merging data
Data Sorting
• order( )
• sort( )
• arrange( ) from the dplyr package
R provides several ways to sort data in either ascending or descending order. Data analysts and
data scientists use order(), sort(), and packages such as dplyr, depending on the structure of the
data they are working with.
order() can be used to sort vectors, matrices, and data frames in ascending or descending order:
it returns the indices that put the data in order, and those indices are then used to rearrange the
elements or rows. Sorting a data frame is shown later in this tutorial.
The syntax of order() is
order(..., na.last = TRUE, decreasing = FALSE, method = c("auto", "shell", "radix"))
• ...: one or more vectors of the same length (for example, a vector or columns of a data frame);
later vectors are used to break ties in earlier ones.
• decreasing: logical value; TRUE sorts in descending order, FALSE (the default) sorts in
ascending order.
• na.last: logical value; TRUE (the default) puts the indices of NA values last, FALSE puts them
first, and NA removes them.
• method: the sorting method to be used.
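Two behaviors from the syntax above can be checked with a minimal sketch on small made-up vectors (a, b, and z are illustrative names, not part of the tutorial's data): a second vector breaks ties in the first, and na.last = NA drops the indices of missing values entirely.

```r
# a, b, z: illustrative vectors made up for this sketch
a <- c(2, 1, 2, 3)   # first sort key; contains a tie (two 2s)
b <- c(9, 5, 1, 0)   # second sort key, used only to break ties in a
order(a, b)          # index 2 (a = 1) first, then the tied 2s ordered by b
# [1] 2 3 1 4

z <- c(3, NA, 1)
order(z, na.last = NA)   # the index of the NA (2) is removed from the result
# [1] 3 1
```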
order() in R
An example of order() in action.
The code below creates a vector y containing a list of numbers. order(y) returns the indices of
the elements of y, arranged so that the values they point to are in ascending order.
y = c(4,12,6,7,2,9,5)
order(y)
The above code gives the following output:
5 1 7 3 4 6 2
Here order() lists the positions of the numbers in ascending order of their values. The smallest
number, 2, is at index 5, so 5 comes first; the next smallest, 4, is at index 1, so 1 comes second;
and the same pattern continues for the remaining values.
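Another way to read this output: the first element of order(y) is the position of the smallest value, which which.min() finds directly, and the last element is the position of the largest value. A minimal sketch reusing the vector y from the example (o is an illustrative name):

```r
y <- c(4, 12, 6, 7, 2, 9, 5)
o <- order(y)    # o: illustrative name for the index vector, smallest value first
o[1]             # position of the smallest value, 2
# [1] 5
which.min(y)     # the same position, found directly
# [1] 5
o[length(o)]     # position of the largest value, 12
# [1] 2
```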
y = c(4,12,6,7,2,9,5)
y[order(y)]
The above code gives the following output:
2 4 5 6 7 9 12
Here the index vector returned by order() is used to subset y, so the actual values are printed in
ascending order: order(y) supplies the positions, and y[order(y)] retrieves the values stored at
those positions.
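As a quick check of the explanation above, subsetting a vector by its own order() result gives the same values as sort(), and decreasing = TRUE reverses the direction. A minimal sketch reusing y (idx is an illustrative name):

```r
y <- c(4, 12, 6, 7, 2, 9, 5)
idx <- order(y)                  # idx: illustrative name; positions from smallest to largest value
identical(y[idx], sort(y))       # subsetting by order() sorts the values
# [1] TRUE
y[order(y, decreasing = TRUE)]   # values from largest to smallest
# [1] 12  9  7  6  5  4  2
```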
Sorting a vector using different parameters in order()
Let's look at an example where the data contain missing values, represented by the symbol NA (Not
Available).
order(x,na.last=TRUE)
x <- c(8,2,4,1,-4,NA,46,8,9,5,3)
order(x,na.last = TRUE)
The above code gives the following output:
5 4 2 11 3 10 1 8 9 7 6
Here order() again returns the indices in ascending order of value. Because na.last = TRUE, the
index of the NA value (6) is placed last.
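Indexing x with this result shows the values themselves. A minimal sketch; note the contrast with sort(), whose na.last argument defaults to NA, so missing values are dropped rather than moved to the end:

```r
x <- c(8, 2, 4, 1, -4, NA, 46, 8, 9, 5, 3)
x[order(x, na.last = TRUE)]   # values in ascending order, NA kept at the end
# [1] -4  1  2  3  4  5  8  8  9 46 NA
sort(x)                       # sort() drops NA by default
# [1] -4  1  2  3  4  5  8  8  9 46
```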
order(x,na.last=FALSE)
order(x,na.last=FALSE)
The above code gives the following output:
6 5 4 2 11 3 10 1 8 9 7
Here, as before, order() returns the indices in ascending order of value. Because na.last = FALSE,
the index of the NA value (6) is placed first.
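sort() accepts na.last as well, so with na.last = FALSE it returns the same values, NA first, as indexing x by order(x, na.last = FALSE). A minimal sketch (a1 and a2 are illustrative names):

```r
x <- c(8, 2, 4, 1, -4, NA, 46, 8, 9, 5, 3)
a1 <- x[order(x, na.last = FALSE)]   # a1 (illustrative): reorder via the index vector
a2 <- sort(x, na.last = FALSE)       # a2 (illustrative): sort the values directly, NA first
identical(a1, a2)
# [1] TRUE
```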
order(x,decreasing=TRUE,na.last=TRUE)
order(x,decreasing=TRUE,na.last=TRUE)
The above code gives the following output:
7 9 1 8 10 3 11 2 4 5 6
Here, because decreasing = TRUE, order() returns the indices in descending order of value: the
largest value, 46, is at index 7, and the remaining indices follow in decreasing order of their
values. Because na.last = TRUE, the index of the NA value (6) is placed last.
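For numeric data, negating the values gives the same descending index as decreasing = TRUE; this is the minus-sign trick applied to data frames later in this tutorial. A minimal sketch reusing x (d1 and d2 are illustrative names):

```r
x <- c(8, 2, 4, 1, -4, NA, 46, 8, 9, 5, 3)
d1 <- order(x, decreasing = TRUE, na.last = TRUE)   # descending via the argument
d2 <- order(-x, na.last = TRUE)                      # descending via negation (numeric only)
identical(d1, d2)
# [1] TRUE
d1
# [1]  7  9  1  8 10  3 11  2  4  5  6
```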
order(x,decreasing=FALSE,na.last=FALSE)
order(x,decreasing=FALSE,na.last=FALSE)
The above code gives the following output:
6 5 4 2 11 3 10 1 8 9 7
Because na.last = FALSE, the index of the NA value (6) is placed first. Because decreasing = FALSE,
the remaining indices follow in ascending order of value, starting with index 5, which holds the
smallest value, -4. Since decreasing = FALSE is the default, this is the same result as
order(x, na.last = FALSE).
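Since this call and the decreasing = TRUE, na.last = TRUE call above look like mirror images, it can be tempting to reverse one result with rev() to get the other. A minimal sketch reusing x (asc and desc are illustrative names) shows this only almost works: rev() also reverses the order of tied values, here the two 8s at indices 1 and 8.

```r
x <- c(8, 2, 4, 1, -4, NA, 46, 8, 9, 5, 3)
asc  <- order(x, decreasing = FALSE, na.last = FALSE)  # asc (illustrative): 6 5 4 2 11 3 10 1 8 9 7
desc <- order(x, decreasing = TRUE,  na.last = TRUE)   # desc (illustrative): 7 9 1 8 10 3 11 2 4 5 6
rev(asc)                     # the tied 8s come out as index 8, then index 1
# [1]  7  9  8  1 10  3 11  2  4  5  6
identical(rev(asc), desc)    # so reversing is not the same as decreasing = TRUE
# [1] FALSE
```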
Sorting a data frame using order()
Let's create a data frame with 10 rows (population = 10). The variable gender holds 10 values drawn
with sample() from "male" and "female"; replace = TRUE samples with replacement, so the same value
can appear more than once. Similarly, age is drawn from the integers 25 to 75, and degree from
c("MA", "ME", "BE", "BSCS"), again with replacement.
Task: sort the data in ascending order of age.
Note: the sample data shown may differ when you run the code on your own machine, because sample()
draws new random values each time the code is run.
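If you want the same "random" data on every run, call set.seed() before sampling. A minimal sketch; the seed value 42 and the names g1 and g2 are arbitrary illustrations, and the exact values drawn can still differ between R versions because the default sampling algorithm changed in R 3.6.0:

```r
set.seed(42)   # arbitrary seed value, chosen only for illustration
g1 <- sample(c("male", "female"), 10, replace = TRUE)   # g1: illustrative name
set.seed(42)   # resetting the same seed ...
g2 <- sample(c("male", "female"), 10, replace = TRUE)   # g2: illustrative name
identical(g1, g2)   # ... reproduces exactly the same draw
# [1] TRUE
```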
population = 10
gender=sample(c("male","female"),population,replace=TRUE)
age = sample(25:75, population, replace=TRUE)
degree = sample(c("MA","ME","BE","BSCS"), population, replace=TRUE)
(final.data = data.frame(gender=gender, age=age, degree=degree))
The above code gives the following output, which shows a newly created dataframe.
gender age degree
1 male 40 MA
2 female 57 BSCS
3 male 66 BE
4 female 61 BSCS
5 female 48 MA
6 male 25 MA
7 female 49 BE
8 male 52 ME
9 female 57 MA
10 female 35 MA
Let's sort the data frame in ascending order of age using order().
order(final.data$age)
The above code gives the following output:
6 10 1 5 7 8 2 9 4 3
Age 25 is at index 6, followed by age 35 at index 10, and so on, so the indices are listed in
ascending order of age.
The code below places order(final.data$age) in the row position of [ ], which rearranges whole
rows, so each age is printed together with its gender and degree.
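Because the data above are random, a reproducible check needs fixed values. The minimal sketch below rebuilds the ten rows by typing in the printed data frame by hand (copied from the output above, not generated by sample()) and confirms the indices returned by order():

```r
# The ten rows printed above, typed in by hand so the example is reproducible
final.data <- data.frame(
  gender = c("male", "female", "male", "female", "female",
             "male", "female", "male", "female", "female"),
  age    = c(40, 57, 66, 61, 48, 25, 49, 52, 57, 35),
  degree = c("MA", "BSCS", "BE", "BSCS", "MA", "MA", "BE", "ME", "MA", "MA")
)
order(final.data$age)   # the tied ages 57 (rows 2 and 9) keep their original order
# [1]  6 10  1  5  7  8  2  9  4  3
```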
final.data[order(final.data$age),]
The above code gives the following output:
gender age degree
6 male 25 MA
10 female 35 MA
1 male 40 MA
5 female 48 MA
7 female 49 BE
8 male 52 ME
2 female 57 BSCS
9 female 57 MA
4 female 61 BSCS
3 male 66 BE
The output shows the rows arranged in ascending order of age, with each row's gender and degree
kept alongside; the numbers on the left are the original row names.
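order() accepts several sort keys, and the minus sign reverses a numeric one, so a data frame can be sorted by more than one column. A minimal sketch using the same ten rows typed in by hand (by_gender_age is an illustrative name) sorts by gender first and then by age in descending order within each gender:

```r
# The ten rows printed above, typed in by hand so the result is reproducible
final.data <- data.frame(
  gender = c("male", "female", "male", "female", "female",
             "male", "female", "male", "female", "female"),
  age    = c(40, 57, 66, 61, 48, 25, 49, 52, 57, 35),
  degree = c("MA", "BSCS", "BE", "BSCS", "MA", "MA", "BE", "ME", "MA", "MA")
)
# gender ascending ("female" before "male"), then age descending within each gender
by_gender_age <- final.data[order(final.data$gender, -final.data$age), ]  # illustrative name
by_gender_age
```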
Sorting a vector
x <- c(6, 7, 1, 2, 5, 9, 8)
x
[1] 6 7 1 2 5 9 8
sort(x)
[1] 1 2 5 6 7 8 9
rank(x)
[1] 4 5 1 2 3 7 6
order(x)
[1] 3 4 5 1 2 7 6
x[order(x)]
[1] 1 2 5 6 7 8 9
x[order(-x)]
[1] 9 8 7 6 5 2 1
x[order(rank(x))]
[1] 1 2 5 6 7 8 9
x[order(rank(-x))]
[1] 9 8 7 6 5 2 1
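The outputs above are related: sort() returns the values, order() returns the positions that sort them, and rank() returns each element's position in the sorted result. When there are no ties, rank() and order() are inverse permutations. A minimal sketch reusing x:

```r
x <- c(6, 7, 1, 2, 5, 9, 8)
# order() and rank() are inverse permutations when x has no ties
all(order(order(x)) == rank(x))
# [1] TRUE
all(rank(x)[order(x)] == seq_along(x))
# [1] TRUE
# sort(..., decreasing = TRUE) matches x[order(-x)] shown above
identical(sort(x, decreasing = TRUE), x[order(-x)])
# [1] TRUE
```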
To sort a data frame in R, use the order() function. By default, sorting is ASCENDING. Prepend a
numeric sorting variable with a minus sign to indicate DESCENDING order. Here are some examples
using the built-in mtcars data set.
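mtcars                                    # print the built-in mtcars data set
dim(mtcars)                               # number of rows and columns
head(mtcars)                              # first six rows
mtcars1 <- tail(mtcars)                   # keep the last six rows
attach(mtcars1)                           # columns of mtcars1 can now be used by name
newdata <- mtcars1[order(mpg), ]          # ascending order based on mpg
newdata
newdata <- mtcars1[order(-mpg), ]         # descending order based on mpg
newdata
newdata <- mtcars1[order(hp), ]           # ascending order based on hp
newdata
newdata <- mtcars1[order(gear, carb), ]   # ascending order based on gear, then carb
newdata
newdata <- mtcars1[order(gear, -carb), ]  # ascending gear, then descending carb
newdata
detach(mtcars1)                           # detach the data frame attached above (not mtcars)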
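The minus sign only works for numeric columns; applying it to a character vector is an error. A minimal sketch of two base R alternatives (cars, by_name_desc, and mixed are illustrative names): decreasing = TRUE when every key runs the same way, and xtfrm(), which converts a vector into numbers that sort the same way, when directions are mixed. In mtcars the car names are stored as row names.

```r
mtcars1 <- tail(mtcars)          # last six rows, as in the examples above
cars <- rownames(mtcars1)        # cars (illustrative): character names; -cars would be an error
# Descending by car name: use decreasing = TRUE instead of a minus sign
by_name_desc <- mtcars1[order(cars, decreasing = TRUE), ]
# Mixed directions: xtfrm() gives numbers that sort like the names, so they can be negated
mixed <- mtcars1[order(mtcars1$gear, -xtfrm(cars)), ]
rownames(mixed)
```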
Without attach()
Prefix each column name with the data frame name and the $ operator:
newdata <- mtcars1[order(mtcars1$mpg), ]                 # ascending mpg
newdata <- mtcars1[order(mtcars1$mpg, mtcars1$cyl), ]    # ascending mpg, ties broken by ascending cyl
newdata <- mtcars1[order(mtcars1$mpg, -mtcars1$cyl), ]   # ascending mpg, ties broken by descending cyl
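A third option is with(), which evaluates an expression using the columns of a data frame, so the $ prefixes can be dropped without calling attach() and detach(). A minimal sketch reproducing the last example above:

```r
mtcars1 <- tail(mtcars)
# with() looks up mpg and cyl inside mtcars1; nothing is attached to the search path
newdata <- mtcars1[with(mtcars1, order(mpg, -cyl)), ]
identical(newdata, mtcars1[order(mtcars1$mpg, -mtcars1$cyl), ])
# [1] TRUE
```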