Data-Science Content
I'm wondering whether many members here are interested in, or work in, this field.
For example: which books and courses do you prefer, which apps do you prefer, etc.?
With so many of these apps hosted here on SoftArchive, I hope I'm not the only data-science nerd here!
Danfo.js is a nice tool for JavaScript; it seems similar to pandas for Python.
https://danfo.jsdata.org/
Data science has evolved from many disciplines, so a good reference book is helpful:
https://sanet.st/blogs/kingbooks/encyclopedia_of_machine_learning_and_data_mining_second_edition_epub.2348680.html
@stefcab - Thank you for the note about Danfo.js. Although I'm not a JavaScript person, that is a brilliant library for
anyone who uses JS.
Although I code primarily in C#, I prefer Python for data science, because Python is just so damn easy and it's
basically made for it, with its unique structures.
Plus, the actual code (not the libraries, but the "glue" code that we write as data scientists) doesn't need to be
performant, because 99.99999% of the processing time is on the data side. It doesn't need a GUI, multi-threading,
etc. In short, there really isn't any benefit (for me) in using anything else.
Of course, I spent more time in SQL than anything else (when working in data science), but that's just a given, lol!
shikigami2010 - If you are interested in learning, I would download a tutorial on Jupyter Notebook, along with the
install. Basically this is a turnkey setup that will get you everything you need to get started, with the best libraries
and the easiest/GUI-style point-and-click. I can't imagine any easier way to get started in data science (I'm assuming
you already have SQL experience; if not, then SQL would technically be an easier way to start).
I just found the best SQL book for beginners on SoftArchive too. Here's the link:
https://sanet.st/blogs/mecury-books/a_guide_to_sql_mindtap_course_list_th_edition.4067213.html
This is actually a college textbook (which normally would be terrible). However, in this case, it's phenomenal, one of
the easiest ways to get up-to-speed. I got it for a friend that's always wanted to learn SQL and she said it made
sense to her, more than other books I've sent her -- although she only got about a chapter in.
stefcab
Apr 12, 2022
Edited
Hi,
MediaFanatic
Apr 12, 2022
@stefcab - What sort of projects are you building to learn TensorFlow?
For example: classification, image recognition, text/sentiment analysis, audio/video interpretation work, etc.?
If you have not already discovered this: the easiest way to learn is to pick an application that is similar to the type
of work you'll be using it for. Not only is the sample code similar, but it's also much easier to understand,
because the use-cases will make much more sense as you're learning.
stefcab
Apr 14, 2022
First, learning TensorFlow lets me revive my linear algebra: vectors and matrices.
This powerful tool allows for text recognition, syntax, knowing user habits with the UI, and sorting documents.
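For anyone following along, here is a minimal sketch of the linear algebra that sits underneath a single neural-network layer: a matrix-vector product plus a bias, y = Wx + b, followed by a ReLU. It is plain Python, not TensorFlow, and the helper names (`matvec`, `dense_layer`) and the numbers are made up purely for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def dense_layer(W, b, x):
    """Compute relu(Wx + b), the core operation of a dense layer."""
    z = [z_i + b_i for z_i, b_i in zip(matvec(W, x), b)]
    return [max(0.0, z_i) for z_i in z]


# Hypothetical 2x2 weights, bias, and input, chosen only for the example.
W = [[1.0, 2.0], [-1.0, 0.5]]
b = [0.5, -1.0]
x = [2.0, 1.0]

print(matvec(W, x))          # [4.0, -1.5]
print(dense_layer(W, b, x))  # [4.5, 0.0]
```

If the matrix-vector product makes sense by hand like this, the tensor operations in a library should feel much less mysterious.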
MediaFanatic
Apr 14, 2022
stefcab
allows for text recognition, syntax, knowing user habits
Absolutely! But you're going to start with one single project / approach in mind, correct? I just want to be sure of
this: approach it from one form alone first (e.g., "sentiment analysis" would be similar to your "user habits"
point). This will make it much, much easier to get a firm grip when you first start.
Try to find a video that walks you through one single niche project (and remember: it does NOT matter what
language they use, because you're learning the TensorFlow library and the concepts apply across languages). If you can start
with one niche, real-world project, I promise you that you will learn 10x faster and retain far more than if you
learn it academically and universally.
stefcab
Apr 14, 2022
Yes, I am starting with one single, simple project. It is essential to reinforce the basics (matrices, vectors), to fully
understand what you want to do, and, yes, to learn the basics of TensorFlow.
mustafa335
Apr 15, 2022
I'm currently reading this book (the 8th edition has been released but is not available on SaNet yet). Do you think it's a
good choice for learning statistics for data science? If there is a better alternative, I'm interested.
https://sanet.st/blogs/mecury-books/elementary_statistics_picturing_the_world_th_edition_global_edition.3980768.html
FelixGurero
Apr 15, 2022
Edited
Can someone please suggest good beginner Introductory books or book on "Data Science" explaining
Thank you.
MediaFanatic
Apr 15, 2022
@mustafa335 - Yes, that is one of the better books I've seen on the topic.
Normally college textbooks are not ideal for self-teaching; however, modern books produced by either (a) Pearson
or (b) Cengage, are some of the most costly books to author and produce. They typically are the most visual and
easiest to understand when the topic is complex.
They are also extremely well-organized and their topics are well-researched.
In addition, yes, it's good to start with either (a) statistics, (b) databases, or (c) general programming BEFORE
you get into Data Science, because this field relies on all three of those skills. I would say database understanding
more than anything else. That's good news, because database work is some of the easiest to learn. The problem is
that you must learn it exceptionally well. You can't do data science with a beginner's understanding of databases,
unfortunately.
You CAN do it with a beginner's level of statistics or programming. Databases are the one area that requires a little
bit more experience before you can fundamentally understand and work in data science.
@FelixGurero - You asked this above, and I started going through courses/books in hopes of finding something to
help you. Unfortunately, every one that I've tested so far has been relatively bad.
To me, it almost seems like these people don't truly work in data science and are instead just trying to earn
money because there isn't anything great out there. So they read some books or learn some basics, then
make a course.
I will keep looking, and if you have a course you'd like me to review and give you an opinion on, let me know.
That includes anything you find on YouTube, because I've seen more decent videos on YouTube than free videos
here. Unfortunately, I watched those YouTube videos over the past several years and have not kept the links.
I hope that helps; I'm sorry I don't yet have better news for you!
mustafa335
Apr 15, 2022
MediaFanatic
Thank you for the reply and the feedback.
I started by learning Python and have almost finished the book Starting Out with Python, 5th Edition, by Tony Gaddis. It
has 14 chapters, and I think I have finished chapter 10.
I took a break and will return to finish it soon. I'm also waiting for the book "Introduction to Python Programming and
Data Structures, Global Edition" by Daniel Liang, which will be released in May. One book might seem
enough, but Daniel Liang is one of the best authors in the programming-languages field.
I'm also one of those who like the style of college textbooks; doing lots of exercises is the best way to master a
topic.
My main goal right now is to master math for data science. I know two very well-written books on the
topic. One is by Manning Publications:
https://sanet.st/blogs/mecury-books/math_for_programmers_d_graphics_machine_learning_and_simulations_with_python_final_release.3555869.html
The other one is by O'Reilly and is in the early-release stage:
https://sanet.st/blogs/booook/essential_math_for_data_science_fifth_early_release.4078350.html
However, I prefer reading college textbooks, each dedicated to one specific topic.
For Calculus I have chosen Calculus Early Transcendentals by James Stewart
https://sanet.st/blogs/wowbooks/calculus_early_transcendentals_ed.3689358.html
For Linear Algebra I have chosen Linear Algebra and its Applications 6th Edition
https://sanet.st/blogs/mecury-books/linear_algebra_and_its_applications_th_edition_global_edition.3842856.html
For database I have chosen Learning SQL: Generate, Manipulate, and Retrieve Data, 3rd Edition
https://sanet.st/blogs/tarantoga2019/learning_sql_generate_manipulate_and_retrieve_data_rd_edition_pdf.3479800.html
and Database System Concepts, 7th Edition
https://sanet.st/blogs/mecury-books/database_system_concepts_th_edition_true_pdf.3427237.html
Not to forget: for the R language, the choice is always R for Data Science
https://r4ds.had.co.nz/
After I'm done with these books, there is a long list, starting with Data Science from Scratch: First Principles with
Python, 2nd Edition
https://sanet.st/blogs/mecury-books/data_science_from_scratch_first_principles_with_python_nd_edition_true_pdf.3834775.html
Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python
https://sanet.st/blogs/mecury-books/machine_learning_with_pytorch_and_scikit_learn_develop_machine_learning_and_deep_learning_models_with_python.4045883.html
Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python, 2nd Edition
https://sanet.st/blogs/tarantoga2019/practical_statistics_for_data_scientists__essential_concepts_using_r_and_python_nd_edition_pdf.3348516.html
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build
Intelligent Systems
The 2nd Edition is available; the 3rd Edition is in the early-release stage
https://sanet.st/blogs/booook/hands_on_machine_learning_with_scikit_learn_keras_and_tensorflow_rd_edition_second_early_release.4061934.html
Python Data Science Handbook, 2nd Edition
https://sanet.st/blogs/booook/python_data_science_handbook_nd_edition_second_early_release.4076127.html
And there are really lots of excellent books about Machine Learning and Deep learning or Data Science in general, I
That is why my plan right now is to focus on Statistics, then Linear Algebra, and Calculus.
I studied this math at college, but I need a refresher and want to learn it the right way.
After the math, I will return to Python, then R, and databases.
Finally, I will start to look at books on Data Science, Machine Learning, and Deep Learning.
rraghavus
Apr 16, 2022
mustafa335
An addition to your list
- https://sanet.st/blogs/mecury-books/building_data_driven_applications_with_danfo_js_a_practical_guide_to_data_analysis_and_machine_learning_using_javascript.3882839.html
FelixGurero
Apr 16, 2022
MediaFanatic
Yes, you are right. I have browsed through several books and courses, but none of them has a proper introduction
to data science for a beginner.
Many of the recommended books on data science/machine learning are not for beginners.
All books or courses start with either Python or statistics and move into machine learning, but they do not give a
complete picture of Data Science.
My search led to this book. It has some introduction. Please give your opinion on this book.
https://www.amazon.com/Building-Effective-Data-Science-Practice/dp/1484274180
If you could suggest some Data Science YouTube videos, that would be really great.
AlbinoIvan
Apr 16, 2022
Edited
FelixGurero Data science is the use of data to try to understand and solve real-world problems. It's a concept that
has been used by human civilization for thousands of years so it's by no means a new field. With the advent of cloud
computing and cloud storage there has been exponentially more data generated than was previously in existence
and with it the field of 'Data Science' was created. CRISP-DM is a process model that describes six phases of the
data science life cycle.
Drew Conway's Data Science Venn Diagram
Conway's interpretation of Data Science can be reduced to three major skill fields.
It isn't a binary case of either you have each of the skills or you don't. It's impossible for anyone to know everything,
but it is possible to learn the fundamentals of a field and then focus on particular skills.
Machine Learning is primarily the intersection of the fields of Maths/Statistics with Programming/Databases.
I'll focus on the Machine Learning path because Domain Knowledge is specific to the area where data analytics and
decision science is applied.
If you want to brush up on your Maths & Statistics knowledge, then take a look at the various OpenStax textbooks.
Those books are used in high school and university courses in the United States and should give you a good
grounding. You are then ready for Linear Algebra with Gilbert Strang, after which you will know
the basics of matrix theory and linear algebra.
No Bullshit Guide to Linear Algebra helps piece together the Maths, which can be particularly helpful if you choose
to dig into Deep Learning later.
That covers the fundamentals of Maths and Statistics, but you also need to be able to apply them to solving
problems, for which I would recommend the Grokking series of books.
If anyone wants to learn R, the lecturers of the Data Science courses at Johns Hopkins University have published
a number of books on the R language, which form the basis of the Data Science course at the
university.
R books @ Leanpub
R Programming for Data Science
Exploratory Data Analysis with R
Tidyverse Skills for Data Science in R
Data Analysis for the Life Sciences
MediaFanatic
Apr 16, 2022
Edited
I think some people may have missed @mustafa335 's post above. It is an amazing post.
If you haven't read this post already, please review that -- many good resources (just above):
https://sanet.st/community/d/21245-only-for-statsdata-science-nerds/14
Thank you to @AlbinoIvan as well, these are also great resources.
If enough people are interested, we may be able to begin posting weekly (or so) posts, to create a coursework,
because there are so few good options.
We can include a list of resources to work from, by combining all of the books/tutorials that we believe to be best.
Course Outline
..Install a Python IDE (eg PyCharm, etc)
..Python Basics w/ Focus on Arrays, Data Structures
..NumPy Basics; Short intro to SciPy
..Data Frame Basics
..Pandas Basics
..GIS Basics
..PostgreSQL Basics + psycopg2 in Python
..petl for Python (ETL Import/Clean Data Library) Basics
..Download large sample dataset (w/GIS), Import to PostgreSQL
..Jupyter Notebook Basics
..Initial Data-Science project in Jupyter Notebook + Python + PostgreSQL
..Additional Data-Science by incorporating A.I. / M.L. (scikit-learn) + SciPy + PostgreSQL
..GIS + A.I. + Data Science Project
IMHO -- This organization is 100x better than any course I've found on the topic.
If anyone disagrees, please post and let me know. It's the only way we can learn / optimize.
Better yet ... If anyone can find a similar course to this proposed layout, please let me know -- it will save me
hundreds of hours.
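To give a feel for the "import/clean data, then load into the database" steps in the outline above, here is a minimal sketch. It is only an illustration under loud assumptions: it uses Python's built-in `csv` and `sqlite3` modules as stand-ins for an ETL library and for PostgreSQL + psycopg2, and the `cities` table, its columns, and the sample rows are invented.

```python
import csv
import io
import sqlite3

# Invented sample data with typical problems: stray spaces and a missing value.
RAW_CSV = """name,population,lat,lon
Springfield, 116000 ,39.80,-89.64
Shelbyville,,39.41,-88.79
 Capital City,450000,40.00,-90.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: strip whitespace, convert types, drop rows with no population."""
    clean = []
    for row in rows:
        population = row["population"].strip()
        if not population:
            continue  # skip incomplete rows rather than guessing a value
        clean.append(
            (row["name"].strip(), int(population), float(row["lat"]), float(row["lon"]))
        )
    return clean


def load(conn, rows):
    """Load: create the (hypothetical) cities table and insert the clean rows."""
    conn.execute(
        "CREATE TABLE cities (name TEXT, population INTEGER, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT INTO cities VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real database server
load(conn, transform(extract(RAW_CSV)))

total = conn.execute("SELECT COUNT(*), SUM(population) FROM cities").fetchone()
print(total)  # (2, 566000)
```

With a real PostgreSQL server the shape of the work stays the same; only the connection and the placeholder syntax would change.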
FelixGurero - I have not reviewed it yet. I noticed it's available here on SoftArchive:
https://sanet.st/blogs/booook/building_an_effective_data_science_practice_a_framework_to_bootstrap_and_manage_a_successful_data_science_practice.3965558.html
I will review soon and let you know. Thank you!
Urasium
Apr 17, 2022
MediaFanatic
This is really an amazing suggestion and contribution. Many thanks
mustafa335
Apr 17, 2022
MediaFanatic If enough people are interested, we may be able to begin posting weekly (or so) posts, to create a
coursework, because there are so few good options.
I'm one of those who are interested.
5 DAYS LATER
AlbinoIvan
Apr 21, 2022
Edited
I don't think that I've ever seen it posted on the site, but if people are interested in getting into AI and Deep Learning,
then Andrew Ng's courses are a great way to get to grips with the subject. I availed of an opportunity to take the
Deep Learning Specialization on the Coursera platform a few years ago and loved his teaching method. He keeps
everything as simple as possible by setting the complicated aspects aside at first, but not so simple that the student is
no longer challenged to learn. Later, the complicated bit that was previously set aside is no longer so daunting, and it's
easily explained with his insight. I have described it as a Jedi mind trick, the way he makes such difficult topics seem
trivial to tackle without in-depth math knowledge.
MediaFanatic
Apr 22, 2022
Edited
AlbinoIvan - Thank you for the excellent resource for this post.
In addition to being a good resource -- you clarify a problem I've been explaining...
Part of the reason there are not many good courses on "data science" is that they tend to become machine-learning-based,
rather than recognizing that there are many fundamentals that fall more squarely under the topic of
"data science".
If you consider that AI/ML is primarily designed for pattern-matching on insanely large datasets, you realize
that most professional data scientists (an area I have worked in professionally) spend far more time on hands-on
data-science endeavors than on AI/ML.
I think AI/ML is wonderful and it's crucial. However, for those who are learning, it would be a major mistake (IMHO) to
learn AI/ML before learning the real-world knowledge/work required in data science.
...In short, you've provided a great resource and highlighted the exact problem with many of the courses on this
subject. For example, it would be horrible if someone attempted to get work as a data scientist knowing AI/ML, but
not being able to "wrangle" the data properly in order to use either of them. This is why data "wrangling" (ETL), SQL,
NumPy/Pandas, and statistical tools (e.g., regression) are so crucial. They come first and are far more common than the
point where we have enough data to begin leveraging the valuable AI/ML libraries.
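As a small illustration of "wrangle first, then apply a statistical tool", here is a sketch that cleans a few raw records and fits a straight line by ordinary least squares. It is plain Python rather than NumPy/Pandas, and the `fit_line` helper and the sample values are assumptions made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented raw records: values arrive as strings and one measurement is missing.
raw = [("1", "3.1"), ("2", "4.9"), ("3", None), ("4", "9.2"), ("5", "10.8")]

# Wrangling step: drop incomplete records and convert text to numbers.
pairs = [(float(x), float(y)) for x, y in raw if y is not None]
xs, ys = zip(*pairs)

slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

Most of the lines here are cleaning and conversion, not modeling, which tends to mirror the real split of effort.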
EDIT: While going through my books tonight, I found a great "R" book from SA that goes into these real-world
topics in a very logical, effective order: "The Big R Book"
-- https://sanet.st/blogs/boooks/the_big_r_book_from_data_science_to_learning_machines_and_big_data.3522482.html
mustafa335
Apr 22, 2022
MediaFanatic
Thanks for pointing that out, here is the True PDF of the book that you have mentioned.
https://sanet.st/blogs/mecury-books/the_big_r_book_from_data_science_to_learning_machines_and_big_data_true_pdf.3616734.html
mustafa335
Apr 22, 2022
Here are two other awesome books worth reading:
Database System Concepts, 7th Edition
and Mathematics for Machine Learning, 1st Edition
I have also found the courses by Krista King to be the best. She has 9 courses on Udemy, and all are uploaded on SaNet. The
courses cover topics from prealgebra, through probability and statistics, and finish with calculus and
differential equations.
Calculus 3 included Differential Equations, which she removed from Udemy and made into a separate course on her
website, but the older uploaded Calculus 3 course still contains the differential equations part.
Krista King Math Site
Krista King Udemy Courses (copy the name and search for it on SaNet)
All math books written by Ron Larson are worth reading. You can compare his Calculus
book, Stewart's book, and Thomas' book.
Here is an interesting discussion about the best calculus books on Quora; it may help you pick the best Calculus book:
What are the best calculus books?
I'm also interested in hearing from Sanetters what they think the best Calculus books are.
fatproton
Apr 24, 2022
MediaFanatic
There is a nice practical introduction ebook to ML in general with TF.
https://sanet.st/blogs/mecury-books/machine_learning_with_tensorflow_nd_edition_true_epub_mobi.3612968.html
MIZANing
Apr 24, 2022
mustafa335
I've read the table of contents of that book, and all of its topics are discussed very clearly by Brandon Foltz on his
YouTube channel:
https://www.youtube.com/c/BrandonFoltz/playlists
I have watched almost all of his videos at least twice and reproduced the results using Minitab, Statgraphics, and JMP. My
Favorite stat soft is jmp which I personally consider the most powerful statistical tool. I don't know
why @MediaFanatic didn't mention it in his first post.
MIZANing
Apr 24, 2022
FelixGurero
Last year I started to dig deep into statistics, and I found this channel very helpful:
https://www.youtube.com/c/joshstarmer/playlists
I hope you find it useful too.
MIZANing
Apr 24, 2022
mustafa335
Here is a nice book, I haven't read it yet but when I went through it, I found it very informative.
https://sanet.st/blogs/ebookdownload/introduction_to_probability_for_data_science.3955421.html
mustafa335
Apr 24, 2022
MIZANing
It looks interesting, thank you for pointing it out.
What do you think of Database System Concepts 7th Edition?
Is it good for data science?
MediaFanatic
Apr 24, 2022
Edited
MIZANing
My Favorite stat soft is jmp which I personally consider the most powerful statistical tool. I don't know
why @MediaFanatic didn't mention it in his first post.
Because I've never heard of it, lol ... Thank you for the suggestion, I'm going to play with it now! UPDATE: I didn't
realize "jmp" is the name of SAS' app. I haven't used SAS in a long time, since it was under a different name. Looking
forward to this.
I'm reviewing your Statistics book now (Probability for Data Science). UPDATE: Great book for people that learn fast
or have a good background in mathematics. Also good for people that know statistics and want a quick-reference;
well-organized, no-nonsense -- good examples -- short descriptions. It's strange because the pages are divided in
the middle of quotes/charts and I suspect it was originally published in ePub format, possibly?
Thank you as well for the Youtube resource, this sounds like it will be helpful for many. I could rip the course and
have one of the SA uploaders upload, if you think it would be worthwhile to have as an offline resource?
In my opinion, if you cover (a) relational theory, and (b) SQL query language, this is the best time/knowledge
tradeoff. In other words -- all topics you miss will still be tremendously valuable; however, to move forward with
other technologies and techniques, those are probably the first two you must master before moving
forward.
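To make "relational theory plus SQL" a little more concrete, here is a minimal sketch of two related tables and one query that joins and aggregates them. It uses Python's built-in `sqlite3` module so it runs anywhere; the `customers` and `orders` tables and their rows are hypothetical, invented only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational idea: each order points at exactly one customer via a foreign key.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
""")

# SQL idea: a LEFT JOIN keeps customers with no orders; GROUP BY summarizes per customer.
rows = conn.execute("""
SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id
ORDER BY total DESC
""").fetchall()

for name, n_orders, total in rows:
    print(name, n_orders, total)
```

Keys, joins, and grouping are arguably the core of what gets used daily, so a small playground like this is a cheap way to practice them.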
mustafa335
Apr 24, 2022
MediaFanatic In my opinion, if you cover (a) relational theory, and (b) SQL query language, this is the best
time/knowledge tradeoff
Thank you for the tip, I'm currently almost done with the Introduction chapter, then will move to Relational theory,
and afterwards SQL.
MediaFanatic all topics you miss will still be tremendously valuable; however, to move forward with other
technologies and techniques, those are probably the first two you must master before moving forward
That was really helpful, because the book is 1,300+ pages and has an extra 200 pages of online chapters; doing it all at
once would take much longer.
The remaining chapters can be done without rushing.
What do you think of these Calculus books, and do you have a good alternative?
Answer to What are the best calculus books? by John P. Keller
And these ones
Answer to What are the best calculus books? by Leonardo Borges
I have picked the one by Ron Larson, the same author as Elementary Statistics: Picturing the World, but since I haven't
started it yet, I'm still searching and comparing it with other books.
MediaFanatic
Apr 24, 2022
Edited
mustafa335 - If you are asking about Calculus books in the context of data science then I would definitely
recommend against those books.
Analogy: Imagine you are designing the interior of an office building, with hundreds of different aspects. If you wanted
to pick the best chairs for your specific needs, would you read a book on how to manufacture chairs, one that's
thousands of pages long and discusses threads, springs, plastics, etc.? Or would you choose a pamphlet that explains the
benefits and trade-offs of one versus another?
As a developer and data scientist, I am terrible at Calculus. I'm great at Algebra, because I use it frequently.
However, Calc, not so much. Therefore, when I need Calc, I look up what I need, a bit like an encyclopedia.
I am in a rush so I cannot provide the links, but I have verified all of these books are available here; some require
Nitroflare due to their age. However, I think these are far better in the limited context of data science (rather than for a
professional mathematician).
In this case, the Dummies book is probably the best for learning. Visually is similar to Dummies with more
colors/spacing -- very nice and easy; however, Dummies is more thorough and equally easy.
Calculus Concepts (three different volumes) is the best option for your very long, college textbook, reference
encyclopedia. Just ignore the first chapter, which is basically an algebra test to be sure you're ready for calc.
....That point re:Algebra reminds me -- in most cases, pre-calc books might be better than Calc books (as a starting
point), if you're thinking about college-level texts.
FelixGurero
Apr 25, 2022
Edited
MediaFanatic Your course outline is great.
Maybe 'Apache Spark' should be added to the course outline. Thank you.
@AlbinoIvan Thank you very much for the great explanation. Many great suggestions. Thank you.
@MIZANing Thank you for the YouTube statistics channel recommendation.
FelixGurero
Apr 25, 2022
Edited
1. MediaFanatic's idea is good. It would be great if all of us Data Science/Machine Learning enthusiasts could
collaborate and make learning data science/machine learning enjoyable.
2. There are so many courses and books for these topics. Each topic has hundreds of courses and books:
a) Statistics
b) Probability
c) Algebra
d) Calculus
e) Python
f) Data Science
g) Machine Learning
There are many sources for learning: Udemy, Coursera, Udacity, edX, Pluralsight, YouTube, and many more.
On top of that, books.
If we have to go through hundreds of books and courses to learn Data Science/ML, it will take 2 or
3 years, or forever, to learn DS/ML.
3. It was much simpler when we were in school. We had just 1 prescribed/recommended textbook each for Math
& Science every year. We just had to go through that 1 book and we could pass the exam.
4. For tech, I think there is no standardized format or syllabus to learn something & get a job. It is as if we have
to go through many courses and books and years of study to learn something or get a job. And this learning
never ends ....
MIZANing
Apr 25, 2022
Edited
mustafa335 What do you think of Database System Concepts 7th Edition?
Sorry, I'm not qualified to answer that question... yet. I want to learn SQL, but for the moment I don't have any big
"why". I am a quality manager; my background is electro-mechanical engineering, and I have other subjects I would like to
master, such as PLCs, PIC microcontrollers,...
My main subject is quality improvement using statistics (SPC, DOE, MSA,...) and operations research. I started to dig
deep into stats only to know how to effectively use the advanced modeling techniques in JMP. My motto is: "You don't
have to be a mechanic to drive a car". You just need to know a few things and let the software do the
hard work.
AlbinoIvan
Apr 28, 2022
MediaFanatic That looks interesting from the perspective of a catch all course but I'd have some reservations about
it. The statistics portion of the course seems light, for example there is no mention of Type I and Type II errors,
Maximum Likelihood Estimation (MLE) or Bayes. I see that P-Value has a heading which is generally how the Type I
error is reported but I see nothing to suggest that the course conveys an understanding of Type I or Type II errors.
The Machine Learning portion of the course covers Logistic Regression and 'Bayes Algorithm' so it would worry me
that the creator skipped over the underlying maths, i.e. Maximum Likelihood Estimation (MLE) and Bayes Theorem in
the statistics portion.
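For readers who have not met the two ideas named above, here is a minimal sketch of each: Bayes' theorem as plain arithmetic, and Maximum Likelihood Estimation for a coin-flip (Bernoulli) model via a simple grid search. The helper names and all the numbers (1% prior, 95% sensitivity, 5% false-positive rate, 7 successes in 10 trials) are assumptions chosen only for illustration.

```python
import math


def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) from Bayes' theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


def bernoulli_log_likelihood(p, successes, trials):
    """Log-likelihood of seeing `successes` out of `trials` if the success rate is p."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


# Bayes: even with a fairly accurate test, a rare condition gives a modest posterior.
posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161

# MLE: try candidate values of p and keep the one that makes the data most likely.
successes, trials = 7, 10
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, successes, trials))
print(p_hat)  # 0.7
```

The grid search lands on the sample proportion (7/10), which is the known closed-form MLE for this model; logistic regression fits its coefficients using the same likelihood principle.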
AlbinoIvan
Apr 28, 2022
I took a postgraduate college course on Statistics and Data Analysis some time ago, and the statistics component
was a nightmare. In the first 30 minutes of the first lecture of the module, the lecturer covered everything that I had
previously learned as a Computer Science undergraduate; after that I was playing catch-up. There was plenty of
mention of the 'power' and 'robustness' of tests, the stating of 'assumptions', and the transforming of data to fit a
distribution, but there was no explanation of the 'why'. There was an assumption that "this was easy" and "you should remember
from your undergraduate course" (over 20 years before). The lecturer left no room for discussion; to put it another
way, I don't think they wanted to be challenged, and the students were just supposed to follow their prepared script. I
made it through the module and passed the exam (56%), which surprised me. I thrived in the Data Analysis module,
partially because the lecturer was grounded in the real world and had worked in an industry to which I could relate.
They also gave some additional lectures to properly cover the statistics module, because in their view the approach
of the previous lecturer was a disaster and left us with a very poor foundation for the Data Analysis module. Their
explanation for the difference in approaches was that the Statistics lecturer was an academic researcher who
had the luxury of dealing with cleaned data and ignored outliers because it allowed them to fit the data
to the assumptions of their tests. You can't pull that stunt in the real world, where the outcome of the results
impacts decisions that affect people's lives; removing outliers harms the data which is bad in an industry setting,
but for the academic researcher removing the outliers enhances the reproducibility of results.
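A tiny sketch of why the outlier question matters: one extreme reading shifts the mean a lot, while the median barely moves. The readings and the cutoff of 20 are made-up values for illustration, not data from the course.

```python
import statistics

# Invented sensor readings; the last value is the outlier.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 35.0]

# "Cleaning" by dropping everything above an arbitrary cutoff.
trimmed = [r for r in readings if r < 20]

print("all data:", statistics.mean(readings), statistics.median(readings))
print("trimmed: ", statistics.mean(trimmed), statistics.median(trimmed))
```

Dropping the point makes the summary look tidy, but if 35.0 was a real event, the trimmed numbers hide it. Reporting a robust statistic such as the median alongside the full data is one way to avoid throwing information away.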
Here's a book that touches on some of the things that the lecturer had to say on the subject of preparing data and it
also covers elements of what the statistics lecturer was unable or unwilling to explain.
Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting
Your Data
I'll try to dig out some more resources that I came across during my time on the course.
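To illustrate the point about outliers made earlier in this post, here is a minimal sketch in Python (standard library only). The readings are made up, and the 1.5 x IQR rule is just one common convention that I am assuming for flagging; neither comes from the course. The idea is to flag suspicious values for investigation rather than silently drop them, and to see how much the summary changes either way.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside the k * IQR fences."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= v <= high) for v in values]

# Hypothetical measurements; the last one looks suspicious.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
flags = flag_outliers(readings)
kept = [v for v, is_outlier in zip(readings, flags) if not is_outlier]

print("flagged:", [v for v, f in zip(readings, flags) if f])
print("mean with all values:", round(statistics.mean(readings), 2))
print("mean without flagged:", round(statistics.mean(kept), 2))
```

Whether the flagged value should actually be dropped is a judgment call that depends on why it is there, which is exactly the lecturer's point.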
MIZANing
Apr 28, 2022
MediaFanatic
For someone who wants to see the big picture, I think this is a good start.
AlbinoIvan for example there is no mention of Type I and Type II errors
Here is an in-depth explanation of Type I and Type II errors (6 videos, 2 hrs 49 min):
https://www.youtube.com/c/BrandonFoltz/search?query=type%201%20error
and this is my RECAP of the 6 vids, only the first two slides are mine, the others are screenshots.
https://docs.google.com/presentation/d/17i0h1jn5fq210uezl9AK1ZLGxmUnO41w/edit?usp=sharing&ouid=109303909641266894542&rtpof=true&sd=true
Logistic regression and MLR
There are two good courses here on SANET that I took: one for logistic regression (3h 3mn) and one for multiple
regression (4h 47m).
https://sanet.st/blogs/missa/binary_logistic_regression_with_minitab_ssmbb_level_topic.3636760.html
https://sanet.st/blogs/udemy1/mastering_multiple_regression_with_minitab_a_deep_dive.3487784.html
For Model diagnostics here is my RECAP:
https://drive.google.com/file/d/1Zmzpg2A7z08GT0cbYWhqHnGP5bWXWEBP/view?usp=sharing
From this book, which I read in 2017 (page 20):
https://sanet.st/blogs/booknew/response_surface_methodology_process_and_product_optimization_using_designed_experiments_rd_edition.2149141.html
HTH
MIZANing
Apr 28, 2022
AlbinoIvan removing outliers harms the data which is bad in an industry setting
I agree with you 100%, but there are exceptions from time to time.
I came across a case study in which the consultant came up with a plan for the experiments (DOE) and gave it to the
operator for implementation. At the end of the day the operator was tired and filled in the last value without
carrying out the experiment... After removing the value (outlier), they got a good model that solved the company's
problem.
BTW, Thank you @AlbinoIvan for the book, I went through it and it makes me wanna read it.
AlbinoIvan
Apr 28, 2022
MIZANing
At the end of the day the operator was tired and filled in the last value without carrying out the experiment... After
removing the value (outlier), they got a good model that solved the company's problem.
That's not really an outlier, it's an example of MNAR data (Missing Not At Random). The value was missing because
the operator chose not to perform the task and instead introduced an element of bias with the value they chose to
replace the missing value with. Removing the bias improved the model.
MIZANing
Apr 28, 2022
AlbinoIvan it's an example of MNAR data (Missing Not At Random).
I wasn't aware of the MNAR concept.
I think I misstated the issue. When the consultant started analyzing the plan, there was no missing value. As you
know, the first step that everyone should take is to plot the data, and that's when he noticed the outlier. After
the investigation, they (consultant + CEO) figured out what had happened in the workshop.
Based on this clarification, is it still MNAR?
MediaFanatic
Apr 29, 2022
@AlbinoIvan - Thank you for the comment.
The ironic thing is that I felt this course went into a lot more detail in statistics than any other well-rounded video
tutorial series I've seen.
IMHO, statistics is too complex and therefore requires an entirely separate discipline / learning-track, much the
same as writing code. For example, not a single one of these "Data Science" courses goes into any detail on code
(beyond the simplest fundamentals required for their demos), or complex database topics, etc.
This all goes back to the original point: It's impossible to cover everything in data science. Data Science
requires the culmination of far too many disciplines.
If you have found a Data Science course with a better statistics component, will you let us know? It's been many
years since I actively worked on the heavy stats side. More advanced statistics are not common in most real-world
Data Science projects I run across. People might expect them to be; however, once you get into things and realize the
data is incomplete (at best), or the N's are too small to obtain reliable regressions... eventually it comes down to
fairly basic premises.
That's just my 20 years, or so, primarily in financial/marketing/human-capital niches. Therefore, this opinion is just
that: subjective. Others' experiences may be different.
I digress -- if you have any DS courses you prefer, please link here for others. Thank you!
AlbinoIvan
Apr 29, 2022
MIZANing Working backwards ...
The consultant created a plot of the data and discovered the outlier.
Consultant and CEO investigated the cause of the outlier.
The operator had created the outlier.
It wasn't the operator's intention to create the outlier, maybe they were tired and wanted to go home so they
skipped the final task and wrote down a value.
In Statistics and Data Analysis there is a problem posed by missing data. Why the data is missing determines how it
is handled.
MCAR - Missing Completely At Random. This assumes that the data with and without the missing value comes from
the same distribution and the missingness (for want of a better term) is independent of any unobserved data.
MAR - Missing At Random. This assumes that the data with and without the missing value comes from the same
distribution and the missingness is dependent on some elements of the observed data.
MNAR - Missing Not At Random. This assumes that the missing data is due to an outside factor not present in the
data so the missingness is dependent on unobserved data.
My interpretation of your example is that the operator was pressed for time, they wanted to leave work so they
skipped the final task. In doing so there was going to be missing data provided to the consultant and possibly in
order to avoid any questions they simply created a value.
The reason the data would be missing was because the operator was in a hurry - my lecturer explained that human
behavior is the top reason for MNAR data and gave an example of how such a thing cost a large tech company
nearly $500M almost 30 years ago.
What the operator did by replacing the missing data with a value they chose is an example of what you are not
supposed to do with MNAR data - never insert a value. It is coincidence that the value they used was an outlier when
the consultant plotted the data. If the value chosen had fit within the distribution of the data it would not have been
discovered in the plot.
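To make the difference between these mechanisms concrete, here is a rough simulation sketch in Python (standard library only). All numbers are invented for illustration, and I am using the usual textbook reading of MNAR, where the chance of a value going missing depends on the value itself. It compares the mean of the values that remain under MCAR and under MNAR against the mean of the complete data.

```python
import random
import statistics

def observed_means(seed=0, n=20000):
    """Compare the mean of the observed values under two missingness mechanisms."""
    rng = random.Random(seed)
    values = [rng.gauss(50.0, 10.0) for _ in range(n)]

    # MCAR: every value has the same 30% chance of going missing.
    mcar = [v for v in values if rng.random() > 0.3]

    # MNAR: values above 55 go missing 60% of the time, so the
    # missingness depends on the very value we never get to see.
    mnar = [v for v in values if not (v > 55.0 and rng.random() < 0.6)]

    return {
        "full": statistics.mean(values),
        "mcar": statistics.mean(mcar),
        "mnar": statistics.mean(mnar),
    }

means = observed_means()
for label, value in means.items():
    print(f"{label}: {value:.2f}")
```

Under MCAR the observed mean should stay close to the full-data mean, while under MNAR it is pulled downwards, which is why the reason for the missingness determines how it has to be handled.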
tommytoolman
Apr 29, 2022
MediaFanatic Yes I love this idea. I work in this area, although I only came back to it in the last 6 months or so. I
want to make it my career now.
AlbinoIvan
Apr 29, 2022
MediaFanatic I'm by no means an expert myself but in my own experience statistics as a subject is difficult to learn
because there are few resources that explain the intuition without the formulas. The statisticians that create the
resources need you to understand their language in order to join their club. This may not be their intention, in some
cases the high functioning individual just lacks the skill to communicate the subject to mere mortals. I once asked
my statistics lecturer to explain how to approach an issue regarding a regression analysis like I was a 5 year old and
they lost the plot before referring me to the course reading list.
My own background is in software development so the coding and data structures aspects come almost naturally to
me. I began to program when 16 KB was a lot of memory and learned that to get the most performance you
sometimes had to write machine code because the compiled code wasn't always the most efficient. That skill gave
me an edge when I was looking for my first proper job because in my interview I could articulate why one code
fragment was better than another, they both did the same thing but at the level below their path through the
processor was different and that was what determined the 10% difference in performance. That's pretty geeky stuff
but that's who I was, a function of my childhood trying to squeeze performance out of a little box.
Knowing how the underlying architecture works gives a bit of resilience to deal with issues when things don't go to
script. I feel the same is true of Data Science, and I would describe a lot of the material I have seen as scripted,
cookie-cutter material. There's no room for "what if", and areas where things could go wrong are avoided as being
"outside of the scope" of the material. An individual doesn't need to be a maths or statistics genius to gain some
insight into a regression analysis. If what's going on can be explained in simple and understandable words, so that
anybody with a solid grasp of language can follow the concept, then the steps to take in order to tweak the output can
be better informed. This knowledge can be applied to shape the data used as input to the analysis and yield better
results - Andrew Ng makes the same point in his Machine Learning course.
For gaining insight into statistics without relying on maths, a friend of mine has recommended the following book.
It's recommended reading for their college course and they say it helps to avoid embarrassing situations like
"explain it to me like I'm a 5 year old".
Statistics, Fourth Edition (David Freedman, Robert Pisani, Roger Purves)
I haven't seen the book on the site but I'll check with my friend if there is a pdf available - those college students can
be very resourceful.
MIZANing
Apr 29, 2022
AlbinoIvan
I digress...
Based on your reply, I'm going to say that, as it sits, this course remains one of the better courses, particularly for
statistics. If we find something better, that would be great. For a diverse video course on many topics, for the
time being this appears to be the best recommendation we have for people at that stage in their learning
process?
@tommytoolman - Thank you for the introduction. Welcome to our small Data Science group.
MIZANing
Apr 29, 2022
mustafa335
here is the book:
https://drive.google.com/file/d/1TC7KkzciZq0Y0P9_4veoOSpds6CK2UT4/view?usp=sharing
FelixGurero
Apr 29, 2022
@MIZANing Thank you.
I've watched just the first few minutes of this DS course, not the whole thing. It looks interesting.
https://www.youtube.com/watch?v=xvEKQefqQ7A
FelixGurero
Apr 29, 2022
Statistics 4th Edition
https://easyupload.io/j7wimf
FelixGurero
Apr 29, 2022
Glancing through the Statistics and Mathematics books, I am really afraid if I can ever learn Data Science and
MIZANing
Apr 29, 2022
FelixGurero
Continuous effort – not strength or intelligence – is the key to unlocking our potential.
Winston Churchill
AlbinoIvan
Apr 29, 2022
mustafa335 I've asked one of the regular uploaders to add a few statistics books to the site. I'll share the links here
when they're posted.
MediaFanatic
Apr 29, 2022
@FelixGurero - To be honest, I think that data science is easier than professional programming.
I'm not referring to learning programming through one of those "Learn in 24 hours" books or video courses, of
course.
Rather, I'm referring to programming for a living as a consultant. If you work for the government or a small business
-- or another low-pay/low-expectation job, you can get away with being one of those "learned in 24 hours"
programmers. However, making a good living in software development -- I think it's more difficult than data science.
The biggest reason is that in data science, everything is fairly straightforward. You won't spend days tracing
back some error (or just as long trying to find/reproduce some weird error that you swear can't be real).
I'm not saying this to scare people; I'm saying this to reassure you -- that Data Science is a lot to learn, yes -- but --
the more there is to learn, the easier it is to get a job, and the better it pays!
Most importantly, anything that appears tough (statistics, for example) -- you can refer back to the books as a
reference guide.
For me, another trick is working in the middle of the night. When something is very difficult and I'm writing code... I
cannot have ANY distractions. I close my door, use noise-cancelling headphones, and I make sure I've had no
caffeine. I need a very relaxed, concentrated mind.
A few days later you will think "that wasn't so bad ... I'll never remember how to do it in the future .. but it wasn't that
bad" lol.
Another thing to remember about software development: This is not a field for people that cannot concentrate -- or
people who do not like to read and learn constantly. As long as you can concentrate and you like reading/learning --
I assure you, nothing will be too difficult, if you give it time and use your references when required.
AlbinoIvan
Apr 30, 2022
Here are the books that are focused on the intuition of statistics rather than on the mathematics.
Introduction to Statistics: An Intuitive Guide for Analyzing Data and Unlocking Discoveries
Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models
Statistics, 4th Edition by David Freedman, Robert Pisani & Roger Purves
MIZANing
Apr 30, 2022
AlbinoIvan
In fact, JIM FROST has 3 books as far as I know. Here is the third:
https://drive.google.com/file/d/1eSPa-Rpdd692eTQBHkFgIrMN3_lZtigW/view?usp=sharing
and here is the data (for the 3 books):
https://drive.google.com/file/d/1hcikE8raWMTVr7IORSClg6hJBjpWRfse/view?usp=sharing
MediaFanatic
Apr 30, 2022
@MIZANing - Thank you for uploading those.
Looks like there was another good book uploaded to SoftArchive today:
stefcab
Apr 30, 2022
The hardest part isn't thinking it's hard, but the way the books are written. Most of the time, books about
mathematical sciences are too academic, vocabulary and syntax too abstract.
This book is much more concrete, explained in a way that is not at all academic.
https://sanet.st/blogs/mecury-books/math_for_programmers_d_graphics_machine_learning_and_simulations_with_python_true_epub_mobi.3606649.html
MediaFanatic
Apr 30, 2022
stefcab - If you enjoy that book, you may enjoy others that @mustafa335 has posted. I say that because I noticed
he originally suggested this same book, above.
Take a look at some of @mustafa335 's recommendations above, here:
https://sanet.st/community/d/21245-only-for-statsdata-science-nerds/14
This got me thinking
Perhaps we should create a new, second thread. I can attempt to combine and categorize all links we've assembled
to-date. This way new users who stumble upon the post will have a single, organized reading list to start from?
I believe I'm only missing one thing -- @AlbinoIvan - You had mentioned some courses, I believe they are by
Andrew Ng. I'm not sure if they are "DeepLearning.AI" and "Fast.AI" or whether those were separate courses. Can you
provide a bit more detail, with links (even if we don't yet have them here)? This way I can add them to the complete
resource list.
mustafa335
Apr 30, 2022
MediaFanatic
I would add this one as well, please check it out
Artificial Intelligence - A Modern Approach 4th Global Edition
AlbinoIvan
May 1, 2022
MediaFanatic The point I was trying to make regarding how knowledge of assembly language was beneficial was
that it helped because I knew what impact the segment of code written in the text editor was having at the layer
below within the computer. Other developers were writing their code without such knowledge and assuming that
the compiler was taking care of it. That was a false assumption based upon an article that appeared in a publication
like Dr. Dobb's Journal. Nowadays most developers code using frameworks and it's pretty much the same issue: mostly
they know the API that they are coding to, but very few take the time to delve deeper and look at the code in the
framework to see how it works. The principle is simple: if you think of the layers as links in a chain and you know
what affects the performance in the next link then you can tailor/optimize what you pass as input to the next link to
improve its performance - the knowledge of the workings of the next link gives you the reason 'why', it's an
informed decision as opposed to just following a mantra.
In terms of Data Science courses, I don't see the 'why' explained sometimes for simple things. The intuition isn't
imparted to the novice because some knowledge is assumed and that leaves scope for misunderstanding. To give an
example of a why not being explained: "T-test is used for Linear Regression, Z-test is used for Logistic Regression."
Why? It was never explained and I had a college lecturer and a tutor that either couldn't or refused to explain the
reason. "You just need to know that" and they also repeated another mantra "Use Z-test when n is large (> 30)
otherwise T-test" which is a different statement altogether. I eventually reasoned it out for myself.
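The Z-test assumes the test statistic follows a standard Normal Distribution (average of 0, variance of 1), which
requires the standard deviation or variance to be given rather than estimated. In Logistic Regression the outcome is
binary, so its variance is fixed by the fitted probability and there is no separate variance to estimate; the
coefficient estimates are approximately Normal in large samples, so Logistic Regression and the Z-test fit like a glove.
The T-test does not require the standard deviation or variance to be given. This is helpful for Linear Regression
because the data is only a sample from an unknown population, so the variance of the errors has to be estimated from
that sample. The t-distribution accounts for the extra uncertainty of that estimate, so the T-test is appropriate.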
That seems a trivial example but I've come across similar things during my journey - the teacher seems to be doing
you a favor by giving a rule that's a shortcut but in doing so the real knowledge gets lost. I've always been one to
look for the 'why' to understand the problem or indeed a solution to a problem rather than take
assumptions/mantras as fact.
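For anyone who wants to check the T-test versus Z-test reasoning rather than take it as another mantra, here is a small simulation sketch in Python (standard library only). The setup is my own assumption: samples of 5 values drawn from a standard Normal whose true mean really is 0, so every rejection is a false positive. The cut-offs 1.96 and 2.776 are the usual two-sided 5% table values for the Normal and for the t-distribution with 4 degrees of freedom.

```python
import math
import random
import statistics

Z_CRIT = 1.96        # two-sided 5% cut-off from the standard Normal
T_CRIT_DF4 = 2.776   # two-sided 5% cut-off from the t-distribution, 4 degrees of freedom

def false_positive_rates(n=5, trials=20000, seed=1):
    """Share of samples where a true null (mean = 0) is rejected under each cut-off."""
    rng = random.Random(seed)
    z_rejections = 0
    t_rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # The standard deviation is estimated from the sample, not given.
        std_error = statistics.stdev(sample) / math.sqrt(n)
        stat = statistics.mean(sample) / std_error
        z_rejections += abs(stat) > Z_CRIT
        t_rejections += abs(stat) > T_CRIT_DF4
    return z_rejections / trials, t_rejections / trials

z_rate, t_rate = false_positive_rates()
print(f"Normal cut-off: {z_rate:.3f}   t cut-off: {t_rate:.3f}")
```

With the standard deviation estimated from such a small sample, the Normal cut-off should reject noticeably more often than the nominal 5%, while the t cut-off stays close to it. As n grows the two cut-offs converge, which is where the "n > 30" shortcut comes from.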
Andrew Ng's original Machine Learning course is available here on the site: Machine Learning by Andrew Ng. The course is
very old in terms of the later material being outdated but the intuitions he imparts from his own experiences are still
valid. For anyone that wants to learn Machine Learning, I highly recommend working through the coding challenges
of that course because it will challenge you, frustrate the shit out of you at times and make you think that you are
dumb - you're not dumb, there's simply a lot of learning packed into that one course and the instructor makes
mistakes in some of the math equations which doesn't help. It's tough because you are scaling a mountain but you
get to a point where you break through the ceiling and everything becomes clear.
That course was later updated and expanded to become the Deep Learning Specialization consisting of 5 separate
courses - I had the opportunity to take those courses on Coursera and found them much easier than the original
course because I already had an understanding of the material. Some colleagues found the specialization
courses really tough and encountered similar frustration but they hadn't done the Machine Learning course. The
specialization courses were updated last year and have been posted on the site recently; I can't tell you what has
changed as I haven't looked at them. Deep Learning Specialization.
I know people that highly recommend Krista King's courses for getting to grips with math topics. These courses have
previously appeared on the site but of course there are dead links for some of them now.
Become a Probability & Statistics Master
Become a Linear Algebra Master
Become a Calculus 1 Master
Become a Calculus 2 Master
Some familiarity with the statistical mathematics is fundamental to tackling Data Science. The Calculus and Linear
Algebra topics are helpful for plunging into Machine Learning in particular.
A friend who has no previous programming experience recommended the following book as it got them through
the module of a course they were taking. It isn't available on the site but somebody may find it in the wild and be
kind enough to share.
Introduction to Data Science for Social and Policy Research: Collecting and Organizing Data with R and
Python
The same author has produced a book on Data Visualization which is available. I have taken a look and it contains
very practical information.
Data Visualization for Social and Policy Research: A Step-by-Step Approach Using R and Python
MIZANing
May 23, 2022
A month ago I posted this YouTube link:
https://www.youtube.com/c/joshstarmer/playlists
In fact, Josh Starmer has turned his videos into a book and, fortunately, it has been posted here on SANET. It's a
good reference for Machine learning.
https://sanet.st/blogs/mgbook/the_statquest_illustrated_guide_to_machine_learning.4126626.html
MediaFanatic
May 23, 2022
I enjoyed this. It's the quickest summary of how ML actually works, real-world.
It's true what they say, a picture is worth a thousand words -- because descriptions only confuse people that I speak
to. This is the first 10-second intro I've stumbled on, that gets people familiar with the M.L. workflow:
I also enjoyed this example, illustrating the difference between simple "Data Analysts" vs "Scientists". So many
"Analysts" call themselves "Data Scientists" that I enjoyed this clarification:
mustafa335
May 24, 2022
This is the collection of Andrew Ng's Deep Learning Specialization courses
He will be releasing Machine Learning Specialization in Python in June.
Actually, all courses from deeplearning.ai/programs are available on SaNet, just search for them.
restfulwait
May 24, 2022
These discussions are gold. I just started in Data Science and this is exactly what I needed.
MIZANing
May 25, 2022
restfulwait
Especially the last post by @MediaFanatic. Personally, I like to see the big picture before I dive into the details.
(The link refers to post number 85.)
https://sanet.st/community/d/21245-only-for-statsdata-science-nerds/85
My second favourite post is by @AlbinoIvan, it is the explanation of data science for all levels.
(The link refers to post number 17).
https://sanet.st/community/d/21245-only-for-statsdata-science-nerds/17
Sure, there are a lot of GOOD posts, but these two are my favorites, for the reason that I'm a Six Sigma guy who got
addicted to statistics.
noname909
May 26, 2022
I am curious: I am studying a Python course for data science (pandas, seaborn, matplotlib and statistics concepts).
Can I use SQL + Tableau instead? Is SQL performance equal to that of other languages (Python, JavaScript) when it
comes to handling a big amount of data? I studied Google's data analyst course and they mentioned that for tables of
less than 2000 rows a spreadsheet (Excel) would be enough; when the data gets bigger than that, go for SQL or Python.
Also, is R a must if I learn Python and want to be a data analyst?
I just want to know if it is good to become a jack-of-all-trades and master of none, or whether I should focus on one.
AlbinoIvan
May 26, 2022
noname909 If you are looking at it from a career path perspective then you should identify the area where you want
to go and target the technologies/tools common to that area.
The field is so large that it's not possible to become a jack of all trades and it would be a journey into madness to
attempt doing so because as you delve into an area of the field you will no doubt discover another tool that is used
to do the same thing. Now you face a dilemma, should you drop everything to explore that fork in the path or add it
to the ever growing list of things to familiarize yourself with? The knowledge of experts in a field will trump that of
generalists and lead to better productivity and outcomes.
Excel is a powerful tool, you can use it to connect to SQL databases to query data but it has limitations on the
volume of data it can access. If you work on projects that never breach those hard limits of the software then you
could be very happy and productive. I know somebody who works for a company that is in the process of migrating
from Excel to Power BI because the data has outgrown the capacity of Excel. The staff have undertaken some
courses in Power BI and they have already realized that a different mindset will be required for working with the new
product. There are a few who have begun job hunting because they want to remain working with Excel, the thought
of leaving their comfort zone is too much for them. This is anecdotal, but I think it's an important lesson that people
entering the field need to be open to the possibility of change in their tools and workflows because the field is
constantly in motion.
You mentioned learning Python and/or R. Python is a general programming language and is a skill set that is
applicable to more fields than R. If you decide that data science isn't all it's made out to be but you'd like to get into
web development as an example then your Python programming skills are more transferable than R. R was
developed specifically for statistics in academia and that is where its strength lies; ggplot2 is amazing for presenting
data in graphical form and is far simpler than anything on offer in the Python ecosystem.
Be aware that depending on the industry where you work Python or R may not be the language used for analysis.
SAS is widely used for statistical analysis in banking, insurance and healthcare. SPSS is also used in the banking and
insurance industries.
SQL is universal. If you can write SQL queries then you can work across any industry because they all use databases.
The particulars of the vendor database (IBM Db2, MS SQL Server, MySQL, Oracle, PostgreSQL, SAP, Sybase etc.) can
be picked up and improve your technical expertise but the fundamentals of SQL are the same. When it comes to
querying properly structured data, SQL outperforms everything else which is why it has been a much sought after
skill for the past 30 years or so.
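For anyone following along who has not written SQL yet, here is a minimal sketch of the kind of query that carries over between vendors. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in; the sales table, its columns and the rows are hypothetical.

```python
import sqlite3

def totals_by_region(rows):
    """Load (region, amount) rows into an in-memory table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        query = """
            SELECT region, SUM(amount) AS total
            FROM sales
            GROUP BY region
            ORDER BY total DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()

# Hypothetical sample data.
sample_rows = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("south", 20.0), ("east", 50.0)]
for region, total in totals_by_region(sample_rows):
    print(region, total)
```

The SELECT / GROUP BY / ORDER BY part is plain standard SQL, so the same statement should run largely unchanged on the server databases listed above; only the connection code and vendor-specific features differ.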
MIZANing
May 26, 2022
noname909 I just want to know if it is good to become jack-of-all-trades, and master of none, or should I focus on
one.
CEO but you cannot reach that job unless you are highly connected or good at something,
meaning "knowing everything about one thing", that is, complete knowledge (expert), which helps you work your
way up through the company hierarchy.
Data Scientists do not care about OLTP at all: Hadoop, Cassandra, Elasticsearch or maybe Mongo sharding for a
small shop. However, you can use a SQL sample database to populate pandas data frames while playing around.
noname909
May 29, 2022
hakko9 For now I just aim to be a data analyst. Based on one course's explanation, the hierarchy looks like this from
top to bottom, with the top being the hardest: data scientist, data engineer, data analyst.
I happened to see Andrew Ng's machine learning course here, but it seems old:
https://sanet.st/blogs/training4all/machine_learning_by_andrew_ng.3065873.html
Is the code, if there is any, still usable? I assume the concepts and theory should not change much. I have heard many
good recommendations for Andrew Ng on Reddit, so I want to try it.
MediaFanatic
May 29, 2022
noname909 - Do not worry about the title. When I mentioned earlier that many people confuse the titles "Data
Scientists" and "Data Analysts" -- that confusion is just as common among the people hiring as among the people doing
the job.
In other words --
Focus primarily on the requirements, and the "80/20 rule" in the beginning (you can learn 80% of a topic in 20% of
the time -- the last 20% of learning requires 80% of the time). These numbers are not precise, but it's the concept
that's important to understand.
To your question -- yes, those courses are excellent. Don't worry about the age.
The last few years, nothing at the learning-level has changed substantially. In fact, some newer courses may be
worse (because so many people have become "overnight experts" that weren't doing this prior to the hype).
Just don't jump into one of the "servers" until you're comfortable with a simple desktop app, or it might dissuade
you.
I'll explain more in future posts. This should be enough to get you started ;-)
tecnico82
May 30, 2022
Probably in the future artificial intelligence will no longer need big data. I have given this a lot of thought, and it
seems the most logical idea; there will soon be a "substrate" on which dynamic algorithms will always find the most
efficient and safe path. I wonder what name this "substrate" will have?
To be more clear this thing should work in the following way:
there will be industries of "substrates" and you will pay to have the best "substrate"
the dynamic algorithms will instead be the same for everyone
the substrate will have quantum memory units
...
Why quantum memory units? Simply because that is how our brains work.
MediaFanatic
May 30, 2022
tecnico82 - It's a nice thought, but it's based on a lack of understanding of the current technology...
First, the concept you mention already exists somewhat (auto-generated algorithms); there is a scientific program (I
forget the name) that does this. I don't believe it's available at the retail level; it's extremely high-end. It
creates algorithms / solves scientific issues simply by going through the data to create the algorithm. It can take
HUGE amounts of time and data to eventually find the algorithm.
Second, "Machine Learning" is almost precisely what you describe. It develops "pattern matching" (just a more
complex and multi-dimensional form of "algorithms") on its own, by examining large amounts of data against
"proofs" (the outcome you're looking for).
Third, "Artificial Intelligence" is what you describe if you take M.L. and you add "quantum" logic (I use the term
"quantum" in quotes because I'm referring to your intended definition; the actual definition is different than your
usage).
To clarify that last point, you refer to "quantum" as emulating our brains. Technically, the term "quantum" in both
computing and memory refers to the use of qubits, which can hold a superposition of 0 and 1 rather than only the
on/off of binary. This does not emulate the human mind in any way; it's simply a more robust computing
fundamental, in which the number of states the "computer" can represent grows exponentially with every added
qubit.
What I believe you intended to say was "neural computing" ... And again, this is essentially the domain of A.I.
However...
The most fundamental point of your premise is the polar-opposite of how every form of A.I. / M.L. and all
algorithm-generating processes work.
Simply put --
All A.I. requires substantial amounts of data, as does any subset, or algorithm-generating tool. This cannot be
avoided because it's impossible to intuit the conclusions or algorithms of complex ("neuron"-style) computing.
This is not a lack of forward-thinking. It's literally the very fundamental by which neuron-computing must work,
including our own brains. It is impossible for something to locate algorithms or patterns that are accurate without
having enough data to assure the accuracy. The less data, the less likely the results will be accurate. Whether it's
today, or a hundred years from now.
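A minimal sketch of that point, assuming a made-up straight-line relationship (slope 2.0) with random noise. It fits the slope by ordinary least squares on samples of different sizes and reports the average error. The function names, the slope, the noise level, and the sample sizes are all illustrative assumptions, not taken from any real system.

```python
import random
import statistics

TRUE_SLOPE = 2.0  # assumed "real" pattern hidden in the data (illustrative)


def fit_slope(n, rng, noise=1.0):
    """Estimate the slope from n noisy samples using ordinary least squares."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [TRUE_SLOPE * x + rng.gauss(0.0, noise) for x in xs]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def mean_abs_error(n, trials=200, seed=0):
    """Average distance between the estimated and true slope over many trials."""
    rng = random.Random(seed)
    return statistics.fmean(
        abs(fit_slope(n, rng) - TRUE_SLOPE) for _ in range(trials)
    )


# Same experiment with 10, 100, and 1000 samples per fit.
errors = {n: mean_abs_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n={n:>4}  mean absolute slope error = {err:.3f}")
```

In this setup the error shrinks as the sample grows, roughly in proportion to 1/sqrt(n).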
The big difference between today and a hundred years from now is that these algorithms won't need users to
supply the data; they'll be able to go out and fetch it on their own. This is essentially what IBM is building with
Watson, the most ambitious super-computing project on Earth, at present.
Sidenote: Technically Google's neural-net is more ambitious; however, it's 100% hidden from any public knowledge
and therefore it would be irresponsible to make that claim publicly; therefore: IBM Watson.
AlbinoIvan
May 30, 2022
tecnico82 wrote: "Probably in the future artificial intelligence will no longer need big data."
With our existing technology we are many lifetimes away from the possibility of artificial intelligence not requiring
big data.
Neural Networks and Deep Learning are based upon a very outdated idea of how elements of our brain work
(neurons). Scientists have since learned that neurons in the human brain are vastly more complex than previously
thought. A neural network with 8 hidden layers is required to model a single neuron in a human brain as we
understand it today. We also now know that memory isn't stored in a single area of the brain; it's stored across the
entire brain. The average human brain contains roughly 86 billion neurons. That fact in itself should tell us that we
haven't yet scratched the surface of developing a real Artificial Intelligence.
MediaFanatic
May 30, 2022
Edited
AlbinoIvan - Thank you for the excellent citations. Everyone should read those, even if just for fun.
The video you linked is also one of the best intro videos on the concept of human intelligence. Excellent resources.
Thank you!
noname909
May 30, 2022
MediaFanatic, thanks for the very detailed posts. What about R? It is part of Google's Data Analyst course but not of
IBM's, so I don't know whether it is a must or can be replaced by Python libraries.
AlbinoIvan
May 31, 2022
noname909 wrote: "What about R? It is part of Google's Data Analyst course but not of IBM's, so I don't know whether
it is a must or can be replaced by Python libraries."
R is not a requirement; it's used mainly in particular fields, primarily academia and research, where there is a heavy
focus on statistical analysis.
In Python, regression models are primarily created using the scikit-learn library. If you look at any books/courses
you will see this to be the case in the vast majority of cases. There is another library, statsmodels, which is for
"hard core" statistical analysis in the same vein as that provided by R. What's the difference between the two
libraries? scikit-learn is built for speed and prediction, so it favors optimizations and approximations that deliver
performance over statistical detail; statsmodels is built for inference and reports the full statistics (standard errors,
p-values, confidence intervals).
In real-world applications, Python's scikit-learn library is good enough, but academic researchers will want to use
the statsmodels library or R. There are some statistical algorithms for which there is no implementation in Python
but they are available in R. If you find yourself in such a situation then you can switch to using R or alternatively use
the rpy2 library which allows R functions to be called from within the Python environment. This approach will save
you the trauma of having to learn a new language which would be a huge cost in your time just for the sake of using
a functionality in a library unique to that language. The trade-off of course is that you may need to transform the
data into a data structure compatible with that expected by the R function.
scikit-learn : Machine Learning in Python
statsmodels : statistical models, hypothesis tests, and data exploration
rpy2 : R in Python
Medium: Calling R Libraries from Python
Google's product offerings are primarily targeted at academic institutions which is why I think a Data Analyst course
offered by them will feature the R language. IBM is focused on business applications so languages like Java, Python
and Scala are more likely to feature in any course they offer.
tecnico82
May 31, 2022
the experience of our brain is immersive, it collects and does not retain, but then it always finds all that it has
collected; there are many brains, one is deep, the second a little less and so on down to the surface; and there is one
brain higher, the second a little less and so on down to the surface. And the surface is never where you should find it.
4 DAYS LATER
noname909
Jun 4, 2022
Edited
Day  Content
1    Introduction & BI Overview
2    Data Preparation (1)
3    Data Preparation (2)
4    Data Modeling (1)
5    Data Modeling (2)
6    Correction of Technical skills
7    Business Statistics (1)
8    Business Statistics (2)
9    Correction of Math & Statistics
10   Data Analytics Problem Solving (1)
11   Data Analytics Problem Solving (2)
12   Descriptive Analytics
13   Diagnostics Analytics (1)
14   Diagnostics Analytics (2)
15   Correction of Data Exploration
16   Dashboard Design & Find Insights
17   Correction of Dashboard & Insights
18   Data Storytelling & Visualization
19   Correction of Data Storytelling
20   Presentation for Final Test
Above is the schedule of a 20-day BI course, which claims to focus on analysis technique (mindset and critical
thinking) rather than tools (SQL, Tableau, etc.). I am not sure whether these terms and concepts are the core of BI or
whether they are just there to attract students. I took Google's data analysis course, but it is so basic: calculate the
average, the sum, and the max and min value of a category. I haven't seen any insights; to be honest, I expected it to
be more complex. Is there an online course or e-book covering the above concepts and techniques?
https://www.youtube.com/watch?v=qpoRO378qRY