Data Transformation

This one-page cheat sheet summarizes data transformation and manipulation commands in Stata 14.1. It covers selecting and filtering data with drop and keep, replacing values with replace and recode, labeling data, reshaping between wide and long format with reshape, combining datasets by appending and merging, manipulating strings, and saving or exporting data.

Data Transformation with Stata 14.1 Cheat Sheet
For more info see Stata’s reference manual (stata.com)

Select Parts of Data (Subsetting)

SELECT SPECIFIC COLUMNS

drop make
    remove the 'make' variable

keep make price
    opposite of drop; keep only variables 'make' and 'price'

FILTER SPECIFIC ROWS

drop if mpg < 20
drop in 1/4
    drop observations based on a condition (first command) or rows 1-4 (second command)

keep in 1/30
    opposite of drop; keep only rows 1-30

keep if inrange(price, 5000, 10000)
    keep values of price between $5,000 and $10,000 (inclusive)

keep if inlist(make, "Honda Accord", "Honda Civic", "Subaru")
    keep the specified values of make

sample 25
    sample 25% of the observations in the dataset
    (use set seed # command for reproducible sampling)

Replace Parts of Data

CHANGE COLUMN NAMES

rename (rep78 foreign) (repairRecord carType)
    rename one or multiple variables

CHANGE ROW VALUES

replace price = 5000 if price < 5000
    replace all values of price that are less than $5,000 with 5000

recode price (0 / 5000 = 5000)
    change all prices less than 5000 to be $5,000

recode foreign (0 = 2 "US")(1 = 1 "Not US"), gen(foreign2)
    change the values and value labels, then store in a new variable, foreign2

REPLACE MISSING VALUES

mvdecode _all, mv(9999)
    replace the number 9999 with missing value in all variables
    (useful for cleaning survey datasets)

mvencode _all, mv(9999)
    replace missing values with the number 9999 for all variables
    (useful for exporting data)

Label Data

Value labels map string descriptions to numbers. They allow the underlying data to be numeric (making logical tests simpler) while also connecting the values to human-understandable text.

label define myLabel 0 "US" 1 "Not US"
label values foreign myLabel
    define a label and apply it to the values in foreign

label list
    list all labels within the dataset

note: data note here
    place note in dataset

Reshape Data

webuse set https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data
webuse "coffeeMaize.dta"
    load demo dataset

MELT DATA (WIDE → LONG)

reshape long coffee@ maize@, i(country) j(year)
    convert a wide dataset to long
    coffee@ maize@   reshape variables starting with coffee and maize
    i(country)       unique id variable (key)
    j(year)          create new variable which captures the info in the column names

WIDE
    country   coffee2011   coffee2012   maize2011   maize2012
    Malawi
    Rwanda
    Uganda

LONG (TIDY)
    country   year   coffee   maize
    Malawi    2011
    Malawi    2012
    Rwanda    2011
    Rwanda    2012
    Uganda    2011
    Uganda    2012

TIDY DATASETS have each observation in its own row and each variable in its own column. When datasets are tidy, they have a consistent, standard format that is easier to manipulate and analyze.

CAST DATA (LONG → WIDE)

reshape wide coffee maize, i(country) j(year)
    convert a long dataset to wide
    coffee maize     create new variables named coffee2011, maize2012...
    i(country)       what will be the unique id variable (key)
    j(year)          create new variables with the year added to the column name

xpose, clear varname
    transpose rows and columns of data, clearing the data and saving old column names as a new variable called "_varname"

Combine Data

ADDING (APPENDING) NEW DATA

webuse coffeeMaize2.dta, clear
save coffeeMaize2.dta, replace
webuse coffeeMaize.dta, clear
    load demo data

append using "coffeeMaize2.dta", gen(filenum)
    add observations from "coffeeMaize2.dta" to current data and create variable "filenum" to track the origin of each observation

    The two datasets should contain the same variables (columns), for example id, blue, pink in both.

MERGING TWO DATASETS TOGETHER

    The two datasets must contain a common variable (id). For example, merging a dataset with variables id, blue, pink with one containing id, brown gives id, blue, pink, brown, _merge.

ONE-TO-ONE

webuse ind_age.dta, clear
save ind_age.dta, replace
webuse ind_ag.dta, clear

merge 1:1 id using "ind_age.dta"
    one-to-one merge of "ind_age.dta" into the loaded dataset and create variable "_merge" to track the origin

MANY-TO-ONE

webuse hh2.dta, clear
save hh2.dta, replace
webuse ind2.dta, clear

merge m:1 hid using "hh2.dta"
    many-to-one merge of "hh2.dta" into the loaded dataset and create variable "_merge" to track the origin

    _merge code
    1   row only in ind2 (master)
    2   row only in hh2 (using)
    3   row in both (match)

FUZZY MATCHING: COMBINING TWO DATASETS WITHOUT A COMMON ID

reclink
    match records from different data sets using probabilistic matching
    (ssc install reclink)

jarowinkler
    create distance measure for similarity between two strings
    (ssc install jarowinkler)

Manipulate Strings

GET STRING PROPERTIES

display length("This string has 29 characters")
    return the length of the string

charlist make
    display the set of unique characters within a string
    (* user-defined package)

display strpos("Stata", "a")
    return the position in "Stata" where "a" is first found

FIND MATCHING STRINGS

display strmatch("123.89", "1??.?9")
    return true (1) or false (0) if string matches pattern

display substr("Stata", 3, 5)
    return up to 5 characters of the string, starting at character 3 (here "ata")

list make if regexm(make, "[0-9]")
    list observations where make matches the regular expression (here, records that contain a number)

list if regexm(make, "(Cad.|Chev.|Datsun)")
    return all observations where make contains "Cad.", "Chev." or "Datsun"
    (in a regular expression, an unescaped "." matches any single character)

list if inlist(word(make, 1), "Cad.", "Chev.", "Datsun")
    compare the given list against the first word in make; return all observations where the first word of the make variable is one of the listed words

TRANSFORM STRINGS

display regexr("My string", "My", "Your")
    replace string1 ("My") with string2 ("Your")

replace make = subinstr(make, "Cad.", "Cadillac", 1)
    replace first occurrence of "Cad." with "Cadillac" in the make variable

display stritrim(" Too   much    Space")
    replace consecutive spaces with a single space

display trim("   leading / trailing spaces   ")
    remove extra spaces before and after a string

display strlower("STATA should not be ALL-CAPS")
    change string case; see also strupper, strproper

display strtoname("1Var name")
    convert string to Stata-compatible variable name

display real("100")
    convert string to a numeric or missing value

Save & Export Data

compress
    compress data in memory

save "myData.dta", replace
saveold "myData.dta", replace version(12)
    save data in Stata format, replacing the data if a file with same name exists
    (saveold with version(12) writes a Stata 12-compatible file)

export excel "myData.xls", /*
    */ firstrow(variables) replace
    export data as an Excel file (.xls) with the variable names as the first row

export delimited "myData.csv", delimiter(",") replace
    export data as a comma-delimited file (.csv)
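
As a worked illustration of the reshape commands in the Reshape Data section, the sketch below builds a tiny wide dataset by hand, melts it to long format, and casts it back. The variable names follow the cheat sheet's wide/long diagram, but the numeric values are invented placeholders and are not taken from coffeeMaize.dta; treat it as a minimal sketch, not as the demo data itself.

```stata
* Hypothetical values for illustration only (not the contents of coffeeMaize.dta)
clear
input str10 country coffee2011 coffee2012 maize2011 maize2012
"Malawi" 1 2 3 4
"Rwanda" 5 6 7 8
"Uganda" 9 10 11 12
end

* Melt: one row per country-year; the year is taken from the column names
reshape long coffee maize, i(country) j(year)
list, sepby(country)

* Cast: back to one row per country, with the year appended to each stub
reshape wide coffee maize, i(country) j(year)
list
```

After the first reshape the data should hold six rows (three countries by two years) with the columns country, year, coffee, and maize; the second reshape restores one row per country.
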

Tim Essam (tessam@usaid.gov) • Laura Hughes (lhughes@usaid.gov)
Inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets)
geocenter.github.io/StataTraining
Updated March 2016
Follow us @StataRGIS and @flaneuseks
Disclaimer: we are not affiliated with Stata. But we like it.
CC BY 4.0
