Machine Learning
Siraj
Tuesday, September 3, 2019 Department of Computer Science, BUITEMS 1
Lecture Content
• Tools for ML
• Which one is better? (Python, R, Weka, etc.)
• scikit-learn, TensorFlow, Keras, PyTorch, etc.
Should beginners prefer Octave/MATLAB over Python when starting out with machine learning?
(Response on Kaggle by Rachael Tatman)
• MATLAB/Octave: If you're doing heavy signal processing, I would recommend MATLAB/Octave. (Andrew Ng's Coursera course is taught in Octave, which may give you the impression that it's a popular language for ML, but almost no one uses it for that.)
• R: If you're doing statistical analysis and visualization, I'd recommend R.
• Python: If you're doing general machine learning or looking for a general-purpose programming language, I'd recommend Python.
• Julia: If you're doing machine learning or statistical computing and really, really care about speed, I'd recommend Julia. The downside is that it's not as popular, so you're more likely to have to implement something yourself rather than use a pre-built package.
• In general, if a new technique is developed by the statistics community, it will be implemented in R first. If it's developed by the machine learning community, it will be implemented in Python (or possibly C) first.
• In your case, it sounds like Python would be the best choice for you. :)
scikit-learn
• Machine Learning in Python
• Simple and efficient tools for data mining and data analysis
• Accessible to everybody, and reusable in various contexts
• Built on NumPy, SciPy, and matplotlib
• Open source, commercially usable (BSD license)
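To give a feel for what working with scikit-learn looks like, here is a minimal sketch. The dataset (Iris), the two classifiers, and the split settings are choices made for this illustration and are not from the lecture. The point is only that different models are trained and evaluated through the same `fit` / `score` calls.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset (assumed here purely for illustration).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for testing; stratify to keep class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two unrelated model families, both used through the same estimator interface.
models = {
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # train on the training split
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

Swapping in another classifier would normally only require changing the entry in `models`; the training and evaluation loop stays the same.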
Benefits and drawbacks of scikit-learn
• Benefits:
    • Consistent interface to machine learning models
    • Provides many tuning parameters but with sensible defaults
    • Exceptional documentation
    • Rich set of functionality for companion tasks
    • Active community for development and support
• Potential drawbacks:
    • Harder (than R) to get started with machine learning
    • Less emphasis (than R) on model interpretability

TensorFlow
• TensorFlow is an end-to-end open source platform for machine learning.
• It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.
Companies Using TF
• Case studies are also available on the TensorFlow website.
Keras
• Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
• Use Keras if you need a deep learning library that:
    • Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
    • Supports both convolutional networks and recurrent networks, as well as combinations of the two.
    • Runs seamlessly on CPU and GPU.
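As a rough illustration of the "fast prototyping" point, the sketch below defines, trains, and queries a tiny network in a few lines. It assumes the Keras API bundled with TensorFlow (`tensorflow.keras`) rather than any other backend, and the synthetic data, layer sizes, and training settings are invented for this example only.

```python
import numpy as np
from tensorflow import keras

# Synthetic binary classification data (assumed for illustration only):
# the label is 1 when the four features sum to a positive number.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# A small fully connected network built by stacking layers.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Choose the optimizer, loss, and metric, then train for a few epochs.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Predicted probabilities of class 1 for the first three samples.
preds = model.predict(X[:3], verbose=0)
print(preds.shape)
```

The same code runs on CPU or GPU without changes; which device is used depends on the installed TensorFlow build and available hardware.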
Weka
• Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.

Features
• machine learning
• clustering
• data mining
• association rules
• preprocessing
• attribute selection
• classification
• experiments
• workflow
• regression
• visualization