Syllabus - CIS 509 Data Mining II (Fall 2019)
Course Description
We use Big Data applications almost every waking minute of our lives: checking Facebook, buying a
book on Amazon, watching a movie on Netflix, or buying a ticket on Expedia. As we use these
applications, we also generate mounds of Big Data. According to an IBM estimate, 40 zettabytes of data
will have been created by 2020, a 300-fold increase since 2005. This exponential growth poses technical
challenges for the storage, processing, and analysis of data. Organizations need to address these
challenges in order to tap into the potential of Big Data.
This course is designed to provide a hands-on foundation in Big Data technology and its applications
in business. The course will start with a hands-on introduction to key Python packages used in data
mining, including NumPy, Pandas, and scikit-learn. We will then learn about three popular Big Data
applications: recommendation systems, text mining, and social network analysis, writing Python code
for each analysis. The course will then provide an introduction to Big Data. We will learn about Hadoop,
an open-source platform used to store and process Big Data. We will use a Hadoop cluster on AWS and
learn how to analyze data in Spark using PySpark and MLlib.
Note: a hands-on understanding of the content covered in the Python Programming Workshop
is a prerequisite for this course.
1. Critical Thinking
2. Communication
3. Discipline Specific Knowledge
4. Ethical Leadership or Global Leadership
Teaching Philosophy, Course Objectives, and Course Learning Outcomes
This course is designed around a philosophy of pragmatic, immersive learning by doing. It follows a
building-block approach: each lecture builds upon the concepts of the previous lectures. Almost every
lecture will combine a lecture component covering core concepts with hands-on guided exercises that
bring those concepts to life. Exercises, Assignments, and Quizzes will further develop students' ability
to apply the core concepts, as well as their critical-thinking skills.
1. Develop and run analyses using Python and associated Data Mining packages.
2. Build and interpret basic Recommender, Text Mining, and Social Network Analysis models for
data analysis.
3. Develop an appreciation for the opportunities, breadth, and complexity of Big Data technology
and applications.
4. Identify and explain the core components of the Hadoop Big Data Platform including HDFS,
YARN, MapReduce, Pig, Hive, and Spark.
Required readings and videos for each lecture, drawn from the online references below (and from
other current topics as necessary), will be posted on Canvas as the course progresses. Students are
expected to work through all required material called out on Canvas for each lecture.
Python:
• https://docs.python.org/3/tutorial/
• http://networkx.github.io/
• https://docs.scipy.org/doc/numpy/
• https://pandas.pydata.org/pandas-docs/stable/
• https://matplotlib.org/users/index.html
• https://seaborn.pydata.org/
• https://scikit-learn.org/stable/index.html
• https://spark.apache.org/docs/latest/ml-guide.html
Data Mining:
• Python Data Science Handbook by VanderPlas
• A Programmer’s Guide to Data Mining by Zacharski
• Natural Language Processing with Python by Bird, Klein, and Loper
• Social Network Analysis for Startups by Kouznetsov and Tsvetovat
Big Data:
• https://hadoop.apache.org/
• https://aws.amazon.com/
• https://spark.apache.org/docs/latest/ml-guide.html
Guided Exercises and Assignments
Assignments will be given at the end of class and must be completed and submitted no later than
the specified due dates.
Guided Exercises will be started in class with the Instructor’s guidance, and will need to be
completed and submitted no later than the due dates specified.
Two of the five assignments, and all guided exercises, are individual efforts.
Three of the five assignments are group efforts: you will work with your assigned group to submit
these assignments. Teamwork is expected to be shared equally among team members. Grades of
individual team members will be adjusted if they do not contribute equally to the assignment. Any
such adjustment will be determined by the instructor based on written feedback from the other team
members.
Short Tests will be open book / open notes. Question format will be multiple choice.
Quizzes will be closed book, but one sheet (two sides) of handwritten notes will be allowed.
Question formats may include problem solving, multiple choice, short answers and programming.
No make-ups will be provided for the first missed Quiz or Short Test. In case you have to miss
more than one Quiz or Short Test, you must provide a legitimate and documented reason. Students
must provide this documentation as much in advance of their absence as possible, in order to be
assessed for reasonable accommodations.
Classroom Policies:
Cell phone usage (for any form of communication) during class is strictly prohibited. If necessary,
you may step out of class to use your cell phone, but you do so at the risk of missing important
lecture material and announcements.
Laptops may be used only for the purposes of following along during class, doing in-class
exercises, and taking notes.
No material covered in this class may be shared in any form (this includes lectures, handouts,
exercises, assignments, short tests, and quizzes).
Academic Integrity and Ethical Behavior
A student who engages in academic misconduct as outlined in ASU’s academic integrity policy
(http://provost.asu.edu/academicintegrity) while attending a graduate program will receive strict
penalties.
Those penalties ordinarily range from a one-letter-grade reduction in the final course grade to
expulsion from the program and the School of Business. The penalty will be decided by the course faculty member and
the Assistant Dean of Academic Affairs. All allegations of academic misconduct must be reported to
program administrators. Any subsequent act of academic misconduct, regardless of severity, will result
in dismissal from the program and the School of Business.
Required readings for each lecture will be posted on Canvas, as well as mentioned in class, as
we go through the course. Students are expected to read all required material called out on Canvas
for each lecture.
In case you have to miss a class, or delay the submission of an assignment, you must provide a
legitimate and documented reason. Students must provide this documentation as much in advance of
their absence as possible, in order to be assessed for reasonable accommodations.
Religious Accommodations
Accommodations will be made for students observing religious holidays. The calendar of official
religious holidays is linked below. Holidays marked with two asterisks denote observances for which
work is not allowed. For these holidays, students will not be penalized in any way for missing class or
an assignment. Such an absence will not count against them, and they will be granted a make-up
assignment, quiz, etc.
https://eoss.asu.edu/cora/holidays
All requests for accommodation must be submitted by the end of the first week of class.
University-Sanctioned Activities
Accommodations will be made for students who miss class due to university-sanctioned
activities, in accordance with ACD 304-02.
If you are participating in a university-sanctioned activity, please let your instructor know as
early in the course as possible so that accommodations can be made.
Tutoring Support
Arizona State University provides writing assistance through multiple platforms. More
information can be found here: http://studentsuccess.asu.edu/writingcenters
Title IX
Title IX is a federal law that provides that no person be excluded on the basis of sex from
participation in, be denied benefits of, or be subjected to discrimination under any education program
or activity. Both Title IX and university policy make clear that sexual violence and harassment based on
sex are prohibited. An individual who believes they have been subjected to sexual violence or harassed
on the basis of sex can seek support, including counseling and academic support, from the university. If
you or someone you know has been harassed on the basis of sex or sexually assaulted, you can find
information and resources at http://sexualviolenceprevention.asu.edu/faqs/students.
Disability Accommodations
If you need an accommodation for a disability, you must register with the Disability Resource
Center (DRC). Please submit appropriate documentation from the DRC to your instructor as early in the
course as possible so that accommodations can be made.
Copyright Material
No material covered in this class may be shared in any form (this includes lectures, handouts,
exercises, assignments, quizzes, and exams).
Information contained within this syllabus (except grading and absence policies)
is subject to change.