Origins, evolution, and interdisciplinarity
DATA SCIENCE
INTRO DATA SCIENCE Lucy Jiménez 02-08-2023
WHAT IS
DATA SCIENCE?
It is a multidisciplinary field that
uses scientific methods, processes,
algorithms, and systems to
extract knowledge, insights, and
value from data.
ORIGIN AND EVOLUTION
JOHN W. TUKEY
1915-2000
Mathematician, statistician
Algorithms (FFT), box-and-whisker
plot
Professor, Princeton University
Statistical methods for Bell Labs
“WE NEED TO FACE UP TO THE NECESSARILY APPROXIMATE
NATURE OF USEFUL RESULTS IN DATA ANALYSIS.”
“WE NEED TO FACE UP TO THE NEED FOR COLLECTING THE
RESULTS OF ACTUAL EXPERIENCE WITH SPECIFIC DATA-
ANALYTIC TECHNIQUES.”
“WE NEED TO FACE UP TO THE NEED FOR ITERATIVE
PROCEDURES IN DATA ANALYSIS.”
“WE NEED TO FACE UP TO THE NEED FOR BOTH INDICATION AND
CONCLUSION IN THE SAME ANALYSIS.”
John W. Tukey
JOHN CHAMBERS
Statistician
Creator of the S language and a core member
of the R programming language project
ACM Software System Award (1998): “For
the S system, which has forever altered how
people analyze, visualize, and manipulate
data.”
LESSER STATISTICS
“The body of specifically statistical
methodology that has evolved within the
profession: statistics as defined by texts,
journals, and doctoral theses.”
GREATER STATISTICS
“Everything related to learning from data,
from the first planning or collection
to the last presentation or report.”
LEO BREIMAN
1928 - 2005
Statistician, probability
theory
13 years of experience as an
independent consultant; Department of
Statistics, UC Berkeley
The data modeling culture:
98% of statisticians
The algorithmic modeling culture:
2% of statisticians
“IN THE PAST FIVE OR SIX YEARS, I’VE BECOME CLOSE TO THE PEOPLE
IN THE MACHINE LEARNING AND NEURAL NETS AREAS BECAUSE
THEY ARE DOING IMPORTANT APPLIED WORK ON BIG, TOUGH
PREDICTION PROBLEMS. THEY’RE DATA ORIENTED AND WHAT THEY
ARE DOING CORRESPONDS EXACTLY TO WEBSTER’S DEFINITION OF
STATISTICS, BUT ALMOST NONE OF THEM ARE STATISTICIANS BY
TRAINING.
SO I THINK IF I WERE ADVISING A YOUNG PERSON TODAY, I WOULD
HAVE SOME RESERVATIONS ABOUT ADVISING HIM OR HER TO GO
INTO STATISTICS, BUT PROBABLY, IN THE END, I WOULD SAY, ‘TAKE
STATISTICS, BUT REMEMBER THAT THE GREAT ADVENTURE OF
STATISTICS IS IN GATHERING AND USING DATA TO SOLVE
INTERESTING AND IMPORTANT REAL WORLD PROBLEMS.’”
Leo Breiman
WILLIAM CLEVELAND
Computer scientist
Professor of Statistics and Computer Science,
Purdue University, Indiana
Director of the Statistics Department
at Bell Labs
Data visualization
DATA SCIENCE: AN ACTION PLAN FOR EXPANDING THE
TECHNICAL AREAS OF THE FIELD OF STATISTICS
2001
Multidisciplinary Investigations (25%): Data analysis collaborations in a collection of subject matter areas.
Models and Methods for Data (20%): Statistical models; methods of model building; methods of estimation and distribution based on probabilistic inference.
Computing with Data (15%): Hardware systems; software systems; computational algorithms.
Pedagogy (15%): Curriculum planning and approaches to teaching for elementary school, secondary school, college, graduate school, continuing education, and corporate training.
Tool Evaluation (5%): Surveys of tools in use in practice, surveys of perceived needs for new tools, and studies of the processes for developing new tools.
Theory (20%): Foundations of data science; general approaches to models and methods, to computing with data, to teaching, and to tool evaluation; mathematical investigations of models and methods, of computing with data, of teaching, and of evaluation.
INTERDISCIPLINARITY
MATHEMATICS
FOR ML AND DS
Linear Algebra
Calculus
Probability & Statistics
IN-CLASS EXERCISE