
Class 02

The document discusses the origins and evolution of data science. It describes how data science emerged from statistics, shaped by statisticians such as John Tukey and John Chambers, who helped develop methods and software for data analysis. It then discusses how machine learning and neural networks grew in popularity, as statistician Leo Breiman observed, and how this marked a shift from statistical (data) modeling toward algorithmic modeling. It also presents William Cleveland's 2001 action plan for expanding statistics into data science. The document advocates an interdisciplinary approach to data science, combining mathematics, statistics, computer science, and domain-specific fields to extract knowledge and insights from data.

Uploaded by tatis.re.11
Copyright © All Rights Reserved
Origins, evolution, and interdisciplinarity

DATA SCIENCE

INTRO DATA SCIENCE Lucy Jiménez 02-08-2023


WHAT IS IT?
DATA SCIENCE
A multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge, insights, and value from data.
ORIGIN AND EVOLUTION
JOHN W. TUKEY
1915-2000

Mathematician, statistician

Algorithms (FFT), box-and-whisker plot

Professor, Princeton University

Statistical methods for Bell Labs


“WE NEED TO FACE UP TO THE NECESSARILY APPROXIMATE NATURE OF USEFUL RESULTS IN DATA ANALYSIS.”

“WE NEED TO FACE UP TO THE NEED FOR COLLECTING THE RESULTS OF ACTUAL EXPERIENCE WITH SPECIFIC DATA-ANALYTIC TECHNIQUES.”

“WE NEED TO FACE UP TO THE NEED FOR ITERATIVE PROCEDURES IN DATA ANALYSIS.”

“WE NEED TO FACE UP TO THE NEED FOR BOTH INDICATION AND CONCLUSION IN THE SAME ANALYSIS.”

John W. Tukey
JOHN CHAMBERS
Statistician

Creator of the S language and a core member of the R programming language project

ACM Software System Award (1998): “For the S system, which has forever altered how people analyze, visualize, and manipulate data.”
LESSER STATISTICS
“The body of specifically statistical methodology that has evolved within the profession: statistics as defined by texts, journals, and doctoral theses.”

GREATER STATISTICS
“Everything related to learning from data, from the first planning or collection to the last presentation or report.”
LEO BREIMAN
1928 - 2005

Statistician, probability theory

13 years of experience as an independent consultant; Department of Statistics, UC Berkeley

The data modeling culture: 98% of statisticians

The algorithmic modeling culture: 2% of statisticians
“IN THE PAST FIVE OR SIX YEARS, I’VE BECOME CLOSE TO THE PEOPLE IN THE MACHINE LEARNING AND NEURAL NETS AREAS BECAUSE THEY ARE DOING IMPORTANT APPLIED WORK ON BIG, TOUGH PREDICTION PROBLEMS. THEY’RE DATA ORIENTED AND WHAT THEY ARE DOING CORRESPONDS EXACTLY TO WEBSTER’S DEFINITION OF STATISTICS, BUT ALMOST NONE OF THEM ARE STATISTICIANS BY TRAINING.

SO I THINK IF I WERE ADVISING A YOUNG PERSON TODAY, I WOULD HAVE SOME RESERVATIONS ABOUT ADVISING HIM OR HER TO GO INTO STATISTICS, BUT PROBABLY, IN THE END, I WOULD SAY, ‘TAKE STATISTICS, BUT REMEMBER THAT THE GREAT ADVENTURE OF STATISTICS IS IN GATHERING AND USING DATA TO SOLVE INTERESTING AND IMPORTANT REAL WORLD PROBLEMS.’”
Leo Breiman
WILLIAM CLEVELAND
Computer scientist

Professor of statistics and computer science, Purdue University, Indiana

Head of the statistics department at Bell Labs

Data visualization
DATA SCIENCE: AN ACTION PLAN FOR EXPANDING THE
TECHNICAL AREAS OF THE FIELD OF STATISTICS
2001

Multidisciplinary Investigations (25%): Data analysis collaborations in a collection of subject matter areas.

Models and Methods for Data (20%): Statistical models; methods of model building; methods of
estimation and distribution based on probabilistic inference.

Computing with Data (15%): Hardware systems; software systems; computational algorithms.

Pedagogy (15%): Curriculum planning and approaches to teaching for elementary school, secondary
school, college, graduate school, continuing education, and corporate training.

Tool Evaluation (5%): Surveys of tools in use in practice, surveys of perceived needs for new tools, and
studies of the processes for developing new tools.

Theory (20%): Foundations of data science; general approaches to models and methods, to computing
with data, to teaching, and to tool evaluation; mathematical investigations of models and methods, of
computing with data, of teaching, and of evaluation.
INTERDISCIPLINARITY
MATHEMATICS
FOR ML AND DS
Linear Algebra

Calculus

Probability & Statistics


IN-CLASS WORK
