[ General Information | Schedule | Handouts | Resources | Requirements ]
Course description: Data mining is the process of automatic discovery of patterns, models, changes, associations and anomalies in massive databases. This course will provide an introduction to the main topics in data mining and knowledge discovery, including: algebraic and statistical foundations, pattern mining, clustering, and classification. Emphasis will be laid on the algorithmic approach. After taking this course students will be able to
Prerequisites: Discrete math, data structures, and algorithms. Knowledge of linear algebra and of probability and statistics is also needed, though an attempt will be made to review the basic concepts. Assignments will require the use of the Python language, with the NumPy package for numeric computations; an overview of important features will be given to get you started learning it on your own via web tutorials, etc. | Credits: 3.
Instructor: Annie Liu | Email: liu@cs.stonybrook.edu | Office: Computer Science 1433 | Phone: 631-632-8463 | Office hours: Tue 9:30-10AM, 11AM-12PM, 12:40-1PM; Thu 9:30-10AM, 12:30-1PM, 2:20-2:30PM; email for an appointment, or stop by any time I'm around.
Lectures: Tue Thu 1-2:20PM, in Computer Science 2129.
Textbook: Data Mining and Analysis by Mohammed J. Zaki and Wagner Meira, Jr. Cambridge University Press, 2014. Thanks to Prof. Zaki for providing much of the course materials!
Grading: Lecture critiques, in-class exercises, programming assignments, 2 exams, and a project, worth 5%, 10%, 30%, 2 x 20%, and 15%, respectively, of the grade. Extra credit work will be given as appropriate. Partial credit will be given for partial work. Late assignments receive reduced credit: 20% off per day.
Course homepage: http://www.cs.stonybrook.edu/~liu/cse392/
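As a rough illustration of how the grading weights above combine, here is a minimal Python sketch. The names `WEIGHTS`, `late_score`, and `course_grade` are hypothetical and not part of any course tooling. The late-penalty reading (each late day removes 20% of the earned score, with a floor of zero) is an assumption; the policy as stated does not say whether the deduction applies to the earned score or to the maximum score.

```python
# Weights copied from the grading breakdown (fractions of the course grade).
WEIGHTS = {
    "critiques": 0.05,
    "exercises": 0.10,
    "assignments": 0.30,
    "exam1": 0.20,
    "exam2": 0.20,
    "project": 0.15,
}


def late_score(score, days_late, penalty_per_day=0.20):
    """Apply the late penalty to one assignment score.

    Assumption: each late day removes 20% of the earned score,
    and the result never drops below zero.
    """
    return score * max(0.0, 1.0 - penalty_per_day * days_late)


def course_grade(scores):
    """Weighted sum of component scores, each given on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical component scores, for illustration only.
    example = {
        "critiques": 100,
        "exercises": 90,
        "assignments": 85,
        "exam1": 80,
        "exam2": 75,
        "project": 95,
    }
    print(round(course_grade(example), 2))
```

Extra credit is not modeled here, since the breakdown does not specify how it is counted.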
Unit 1 (Aug 26,28): Overview. Ch.1. Assignment 0
No class: Sep 2, Labor Day
Unit 2 (Sep 4,9,11,16,18,23): Foundation. Ch.2,3,4,6,5,7. Assignment 1,2
Exam 1 (Sep 25): In-class exam. You can prepare one hand-written personal "crib sheet".
No class: Sep 30, Oct 2
Unit 3 (Oct 7,9,14,16,21): Classification. Ch.20,21,18,19,22. Assignment 3,4
No class: Oct 23
Unit 4 (Oct 28,30, Nov 4,6,11): Clustering. Ch.13,14,15,16,17. Assignment 5
Exam 2 (Nov 13): In-class exam. You can prepare one hand-written personal "crib sheet".
Unit 5 (Nov 18,20,25, Dec 2,4): Frequent pattern mining. Ch.8,9,10,11,12. Project
No class: Nov 27, Thanksgiving
Project due (Dec 5)
Questionnaire
Lecture Critiques
In-Class Exercises
Assignment 0: Data mining problem, programming in Python
Assignment 1: Numeric data analysis
Assignment 2: High dimensions, kernel method, principal components
Assignment 3: SVM training
Assignment 4: Paired t-test and Bayes classifier
Assignment 5: Expectation-maximization clustering
Exam 1
Exam 2
Project
Interactive Site of This Course, for students in the class
Computer Science Department Windows Computing Facilities
Read all the information on the course homepage. Check the homepage periodically for announcements and other dynamic content.
Attend all lectures and take good notes. This is the most efficient way to learn the course materials, because we will both distill and elaborate textbook materials and discuss other important materials. We will start promptly on time, with quick reviews every time, followed by exercises or quizzes. We will have every student participate in solving problems and presenting solutions in class.
Do all course work. The readings are to help you preview and review the materials discussed in the lectures. The assignments are to provide concrete experiences with the basic concepts and methods covered in the lectures. The exercises and quizzes are to help check that you are keeping up with the lectures and the assignments. The exams will be comprehensive.
Ask questions and get help. Ask questions in class, in office hours, and in the Q&A forum. Talk with your classmates, and share ideas (but nothing written or electronic).
Academic Integrity: All assignments, quizzes, and exams must be done individually, unless specified otherwise; you may discuss ideas with others and look up references, but you must write up your solutions independently and credit all sources that you used. Any plagiarism or other form of cheating that is discovered will have a permanent consequence on your university record.
Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/
Americans with Disabilities Act: If you have a physical, psychological, medical, or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, Room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential.
Critical Incident Management: Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of University Community Standards any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn. Further information about most academic matters can be found in the Undergraduate Bulletin, the Undergraduate Class Schedule, and the Faculty-Employee Handbook.