Machine Learning for Everyone

Machine Learning is like sex in high school. Everyone is talking about it, a few know what to do, and only your teacher is doing it. If you ever tried to read articles about machine learning on the Internet, most likely you stumbled upon two types of them: thick academic trilogies filled with theorems (I couldn't even get through half of one) or fishy fairy tales about artificial intelligence, data-science magic, and jobs of the future.

I decided to write a post I've been wishing existed for a long time. A simple introduction for those who always wanted to understand machine learning. Only real-world problems, practical solutions, simple language, and no high-level theorems. One and for everyone. Whether you are a programmer or a manager. Let's roll.

[Comic: the two types of articles about machine learning]

The map of the machine learning world

Part 1. Classical Machine Learning

The first methods came from pure statistics in the '50s. They solved formal math tasks: searching for patterns in numbers, evaluating the proximity of data points, and calculating the directions of vectors.

Nowadays, half of the Internet is working on these algorithms. When you see a list of articles to "read next" or your bank blocks your card at a random gas station in the middle of nowhere, it's most likely the work of one of those little guys.

Big tech companies are huge fans of neural networks. Obviously. For them, 2% accuracy is an additional 2 billion in revenue. But when you are small, it doesn't make sense. I heard stories of teams spending a year on a new recommendation algorithm for their e-commerce website, before discovering that 99% of traffic came from search engines. Their algorithms were useless. Most users didn't even open the main page.

Despite the popularity, classical approaches are so natural that you could easily explain them to a toddler. They are like basic arithmetic: we use it every day without even thinking.

[Diagram: classical machine learning splits into supervised learning (classification, regression) and unsupervised learning (clustering, dimensionality reduction)]

1.1 Supervised Learning

Classical machine learning is often divided into two categories: Supervised and Unsupervised Learning.

In the first case, the machine has a "supervisor" or a "teacher" who gives it all the answers, like whether it's a cat in the picture or a dog. The teacher has already divided (labeled) the data into cats and dogs, and the machine uses these examples to learn. One by one. Dog by cat.

Unsupervised learning means the machine is left on its own with a pile of animal photos and a task to find out who's who. The data is not labeled, there's no teacher, and the machine tries to find patterns on its own. We'll talk about these methods below.

Clearly, the machine will learn faster with a teacher, so supervised learning is more commonly used in real-life tasks. There are two types of such tasks: classification, the prediction of an object's category, and regression, the prediction of a specific point on a numeric axis.

Classification

"Splits objects based on one of the attributes known beforehand. Separates socks by color, documents by language, music by genre."

Today it is used for:
- Spam filtering
- Language detection
- A search of similar documents
- Sentiment analysis
- Recognition of handwritten characters and numbers
- Fraud detection

Popular algorithms: Naive Bayes, Decision Tree, Logistic Regression, K-Nearest Neighbours, Support Vector Machine

From here onward you can comment with additional information for these sections. Feel free to write your examples of tasks. Everything here is based on my own subjective experience.
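To make classification tangible, here is a quick sketch of a Naive Bayes spam filter. This example is mine, not the article's: it uses scikit-learn, and the messages and labels are invented for illustration.

```python
# A minimal spam-classification sketch with Naive Bayes (scikit-learn).
# All texts and labels below are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "win a free iphone now",        # spam
    "cheap pills no prescription",  # spam
    "meeting rescheduled to 3pm",   # not spam
    "see you at lunch tomorrow",    # not spam
]
labels = ["spam", "spam", "ham", "ham"]

# The "teacher" part: turn texts into word counts, learn from labeled examples.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X, labels)

# Predict the category of a new, unseen message.
print(model.predict(vectorizer.transform(["free pills now"])))  # expect ['spam']
```

The same fit-then-predict shape applies to all the classifiers listed above; only the math behind the fitting changes.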
Regression

"Basically classification, but instead of a category we forecast a number. Examples: car price by its mileage, traffic by the time of day, demand volume by the growth of the company, and so on."

Regression is perfect when something depends on time.

Everyone who works with finance and analysis loves regression. It's even built into Excel. And it's super smooth inside: the machine simply tries to draw a line that indicates the average correlation. Though, unlike a person with a pen and a whiteboard, the machine does so with mathematical accuracy, calculating the average interval to every dot.

[Illustration: predicting traffic jams by hour, with a straight line for linear regression and a curve for polynomial regression]

When the line is straight, it's linear regression; when it's curved, it's polynomial. These are the two major types of regression. The other ones are more exotic.

Logistic regression is a black sheep in the flock. Don't let it trick you: it's a classification method, not regression.

It's okay to mix up regression and classification, though. Many classifiers turn into regression after some tuning. We can not only define the class of an object but also measure how close it is. Here comes regression.

If you want to get deeper into this, check out this series: Machine Learning for Humans. I really love and recommend it!
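Here is what the "draw a line through the dots" idea looks like in code. A sketch of my own, with invented traffic numbers, using NumPy's polynomial fitting:

```python
# Linear vs. polynomial regression with NumPy; the data is invented.
import numpy as np

hours = np.array([7, 8, 9, 10, 11, 12])       # time of day
traffic = np.array([30, 80, 95, 60, 40, 35])  # cars per minute (made up)

# Linear regression: fit a straight line (a degree-1 polynomial).
linear = np.polyfit(hours, traffic, deg=1)

# Polynomial regression: a degree-2 curve can capture the morning rush hour.
curve = np.polyfit(hours, traffic, deg=2)

# Predict traffic at 9:30 with both models.
print(np.polyval(linear, 9.5))
print(np.polyval(curve, 9.5))
```

Both calls minimize the squared distance from the line to every dot, which is the "mathematical accuracy" mentioned above.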
1.2 Unsupervised Learning

Unsupervised learning was invented a bit later, in the '90s. It is used less often, but sometimes we simply have no choice.

Labeled data is a luxury. But what if I want to create, let's say, a bus classifier? Should I manually take photos of a million fucking buses on the streets and label each of them? No way, that would take a lifetime, and I still have so many unplayed games in my Steam account.

There's a little hope for capitalism in this case. Thanks to social stratification, we have millions of cheap workers and services like Mechanical Turk who are ready to complete your task for $0.05. And that's how things usually get done here.

Or you can try to use unsupervised learning. But I can't remember any good practical application where it's the main algorithm; it's usually useful for exploratory data analysis. A specially trained meatbag with an Oxford degree feeds the machine a ton of garbage and watches it. Are there any clusters? No. Any visible relations? No. Well, continue then. You wanted to work in data science, right?

Clustering

"Divides objects based on unknown features. The machine chooses the best way."

Nowadays it is used:
- For market segmentation (types of customers, loyalty)
- To merge close points on a map
- For image compression
- To analyze and label new data
- To detect abnormal behavior

Popular algorithms: K-means clustering, Mean-Shift, DBSCAN

Clustering is classification with no predefined classes. It's like dividing socks by color when you don't remember all the colors you have. A clustering algorithm tries to find similar (by some features) objects and merge them into a cluster. Objects with lots of similar features are joined into one class. With some algorithms, you can even specify the exact number of clusters you want.

An excellent example of clustering: markers on web maps. When you're looking for all the vegan restaurants around, the clustering engine groups them into blobs with a number. Otherwise, your browser would freeze trying to draw all three million vegan restaurants in that hipster downtown.

Apple Photos and Google Photos use more complex clustering. They look for faces in photos to create albums of your friends. The app doesn't know how many friends you have or what they look like, but it tries to find common facial features. Typical clustering.

Another popular use is image compression. When saving an image to PNG, you can set the palette to, let's say, 32 colors. Clustering will then find all the "reddish" pixels, calculate the "average red", and set it for all of them. Fewer colors, lower file size, profit!

However, you may have problems with colors like cyan. Is it green or blue? Here comes the K-Means algorithm. It randomly places 32 color dots in the palette. Those are the centroids. The remaining points are assigned to the nearest centroid, so we get kinds of galaxies around these 32 colors. Then we move each centroid to the center of its galaxy and repeat until the centroids stop moving. All done. The clusters are defined, stable, and there are exactly 32 of them.

[Comic illustrating the K-Means method with kebab kiosks: put kiosks at random, watch how buyers choose the nearest one, move the kiosks to the centers of their popularity, watch and move again, repeat a million times.]

Searching for centroids is convenient. Though, in real life, clusters are not always circles. Let's imagine you're a geologist. You need to find some similar minerals on a map. In that case, the clusters can be weirdly shaped and even nested. Also, you don't even know how many of them to expect. 10? 100?

K-means does not fit here, but DBSCAN can be helpful. Let's say our dots are people at the town square. Find any three people standing close to each other and ask them to hold hands. Then tell them to start grabbing the hands of those neighbors they can reach. And so on until no one else can take anyone's hand. That's our first cluster. Repeat the process until everyone is clustered. Done. A nice bonus: a person who has no one to hold hands with is an anomaly.

[Interactive animation of the clustering process]

Interested in clustering? Check out this piece: The 5 Clustering Algorithms Data Scientists Need to Know.

Just like classification, clustering can be used to detect anomalies. Does a user behave abnormally after signing up? Let the machine ban him temporarily and create a ticket for support to check it. Maybe it's a bot. We don't even need to know what "normal behavior" is; we just upload all user actions to our model and let the machine decide whether it's a "typical" user or not. This approach doesn't work as well as the classification one, but it never hurts to try.
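The palette-reduction story above maps almost one-to-one onto scikit-learn's K-Means. A sketch of my own, with random pixels standing in for a real image:

```python
# K-Means palette reduction, as described above; the pixels are randomly
# generated stand-ins for a real image.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(10_000, 3))  # fake image: 10k RGB pixels

# Ask for exactly 32 clusters: our 32-color palette.
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(pixels)

# Replace every pixel with its centroid: the "average red" for reddish
# pixels, the "average cyan" for cyan-ish ones, and so on.
palette = kmeans.cluster_centers_.astype(np.uint8)
compressed = palette[kmeans.labels_]
print(len(np.unique(compressed, axis=0)))  # at most 32 distinct colors left
```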
Dimensionality Reduction (Generalization)

"Assembles specific features into higher-level ones."

Nowadays it is used for:
- Recommender systems (+)
- Beautiful visualizations
- Topic modeling and similar document search
- Fake image analysis
- Risk management

Popular algorithms: Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA, pLSA, GLSA), t-SNE (for visualization)

Association Rule Learning

This includes all the methods to analyze shopping carts, automate marketing strategies, and other event-related tasks. When you have a sequence of something and want to find patterns in it, try these things.

Say, a customer takes a six-pack of beer and goes to the checkout. Should we place peanuts on the way? How often do people buy them together? Yes, it probably works for beer and peanuts, but what other sequences can we predict? Can a small change in the arrangement of goods lead to a significant increase in profits?

The same goes for e-commerce. The task there is even more interesting: what is the customer going to buy next time?

No idea why rule learning seems to be the least elaborated category of machine learning. Classical methods are based on a head-on look through all the bought goods using trees or sets. The algorithms can only search for patterns but cannot generalize or reproduce them on new examples.

In the real world, every big retailer builds its own proprietary solution, so no revolutions here for you. The highest level of tech here is recommender systems. Though I may not be aware of a breakthrough in the area. Let me know in the comments if you have something to share.

[Comic: "That meatbag bought a sofa! Probably he loves sofas! Recommend him MORE SOFAS!!"]
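The head-on approach mentioned above really is just counting. A toy sketch of my own (the carts are invented, and production systems use smarter algorithms such as Apriori):

```python
# Brute-force association mining: count how often item pairs share a cart.
# Carts are invented; real systems use Apriori-style algorithms instead.
from collections import Counter
from itertools import combinations

carts = [
    {"beer", "peanuts", "chips"},
    {"beer", "peanuts"},
    {"bread", "milk", "beer"},
    {"bread", "milk"},
]

pair_counts = Counter()
for cart in carts:
    for pair in combinations(sorted(cart), 2):
        pair_counts[pair] += 1

# Support = the share of carts containing the pair.
for pair, count in pair_counts.most_common(3):
    print(pair, count / len(carts))
# ('beer', 'peanuts') comes out at 0.5: place the peanuts near the checkout.
```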
Part 2. Reinforcement Learning

[Diagram: an agent acts on an environment and receives observations and rewards]

Finally, we get to something that looks like real artificial intelligence. In lots of articles, reinforcement learning is placed somewhere in between supervised and unsupervised learning. They have nothing in common! Is this because of the name?

Reinforcement learning is used in cases when your problem is not related to data at all, but you have an environment to live in. Like a video game world or a city for a self-driving car.

[Video: MarI/O, a machine-learning program that plays Mario]

Knowledge of all the road rules in the world will not teach an autopilot how to drive on the roads. Regardless of how much data we collect, we still can't foresee all the possible situations. This is why its goal is to minimize error, not to predict all the moves.

Surviving in an environment is the core idea of reinforcement learning. Throw a poor little robot into real life, punish it for errors, and reward it for right deeds. The same way we teach our kids, right?

A more effective way is to build a virtual city and let the self-driving car learn all its tricks there first. That's exactly how we train autopilots right now. Create a virtual city based on a real map, populate it with pedestrians, and let the car learn to kill as few people as possible. When the robot is reasonably confident in this artificial GTA, it's freed to test on real streets. Fun!

There are two different approaches: Model-Based and Model-Free. Model-Based means that the car needs to memorize a map or its parts. That's a pretty outdated approach, since it's impossible for the poor self-driving car to memorize the whole planet. In Model-Free learning, the car doesn't memorize every movement but tries to generalize situations and act rationally while obtaining a maximum reward.

[Comic: how machines behave in case of fire, comparing classical programming, machine learning, and reinforcement learning]

Remember the news about an AI beating a top player at the game of Go? Shortly before that, it had been proved that the number of combinations in this game is greater than the number of atoms in the universe. This means the machine could not remember all the combinations and thereby win at Go (as it did at chess). At each turn, it simply chose the best move for each situation, and it did well enough to outplay a human meatbag.

This approach is the core concept behind Q-learning and its derivatives (SARSA and DQN). The 'Q' in the name stands for "Quality": the robot learns to perform the most "qualitative" action in each situation, and all the situations are memorized as a simple Markovian process.

[Diagram: a Markov process in which an agent learns to maximize its rewards across states]

Such a machine can test billions of situations in a virtual environment, remembering which solutions led to greater reward. But how can it distinguish previously seen situations from completely new ones? If a self-driving car is at a road crossing and the traffic light turns green, does that mean it can go now? What if there's an ambulance rushing through a street nearby?

The answer today is "no one knows". There's no easy answer. Researchers are constantly searching for one, but meanwhile they have only found workarounds. Some would hardcode all the situations manually, which lets them solve exceptional cases like the trolley problem. Others would go deep and let neural networks figure it out. This led to the evolution of Q-learning called Deep Q-Network (DQN). But it's not a silver bullet either.

Reinforcement learning would look like real artificial intelligence to an average person, because it makes you think: wow, this machine is making decisions in real-life situations! The topic is hyped right now; it's advancing at an incredible pace and intersecting with neural networks to clean your floor more accurately. Amazing world of technologies!

Off-topic. When I was a student, genetic algorithms (the link has a cool visualization) were really popular. This is about throwing a bunch of robots into a single environment and making them try to reach the goal until they die. Then we pick the best ones, cross them, mutate some genes, and rerun the simulation. After a few milliard years, we will get an intelligent creature. Probably. Evolution at its finest.

Genetic algorithms are considered part of reinforcement learning, and they have the most important feature, proved by decades of practice: no one gives a shit about them. Humanity still couldn't come up with a task where they would be more effective than other methods. But they are great for student experiments, and they let people get their university supervisors excited about "artificial intelligence" without too much labour. And YouTube would love it as well.
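To close out this part, here is what the Q-learning loop described above boils down to. A toy sketch of my own: a five-cell corridor where the only reward sits at the right end. The environment and all parameters are invented for illustration.

```python
# Tabular Q-learning on a toy 5-cell corridor; reaching cell 4 pays +10.
# Everything here (environment, rewards, hyperparameters) is invented.
import random

n_states, actions = 5, [-1, +1]  # actions: step left or step right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != 4:
        # Epsilon-greedy: mostly take the best-known action, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        reward = 10 if s_next == 4 else 0
        # Core update: nudge Q toward (reward + discounted best future value).
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

print(max(actions, key=lambda act: Q[(0, act)]))  # learned first move: +1 (right)
```

The Q-table is the "memorized quality of each action in each situation"; DQN simply swaps the table for a neural network.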
Part 3. Ensemble Methods

"A bunch of stupid trees learning to correct the errors of each other."

Nowadays it is used for:
- Everything that fits classical algorithm approaches (but works better)
- Search systems (*)
- Computer vision
- Object detection

Popular algorithms: Random Forest, Gradient Boosting

It's time for modern, grown-up methods. Ensembles and neural networks are the two main fighters paving our path to the singularity. Today they produce the most accurate results and are widely used in production. However, neural networks got all the hype, while words like "boosting" or "bagging" are scarce hipsters on TechCrunch.

Despite all the effectiveness, the idea behind them is overly simple. If you take a bunch of inefficient algorithms and force them to correct each other's mistakes, the overall quality of the system will be higher than even the best individual algorithm.

You'll get even better results if you take the most unstable algorithms, ones that predict completely different results from small noise in the input data, like Regression and Decision Trees. These algorithms are so sensitive that even a single outlier in the input data can make the models go mad. In fact, this is what we need.

We can use any algorithm we know to create an ensemble. Just throw in a bunch of classifiers, spice it up with regression, and don't forget to measure accuracy. From my experience: don't even try Bayes or kNN here. Although "dumb", they are really stable. That's boring and predictable. Like your ex.

Instead, there are three battle-tested methods to create ensembles.

Stacking. The output of several parallel models is passed as input to the last one, which makes the final decision. Like that girl who asks her girlfriends whether to meet with you in order to make the final decision herself.

[Diagram: stacking, where the predictions of different algorithms are fed into a final decision-making model]

The emphasis here is on the word "different". Mixing the same algorithms on the same data would make no sense. The choice of algorithms is completely up to you. However, for the final decision-making model, regression is usually a good choice.

Based on my experience, stacking is less popular in practice, because the two other methods usually give better accuracy.

Bagging, aka Bootstrap AGGregatING. Use the same algorithm, but train it on different subsets of the original data. At the end, just average the answers.

Data in random subsets may repeat. For example, from a set like "1-2-3" we can get subsets like "2-2-3", "1-2-2", "3-1-2" and so on. We use these new datasets to teach the same algorithm several times and then predict the final answer via simple majority voting.

[Diagram: bagging, where the same tree algorithm is trained on random subsets and the answers are averaged]

The most famous example of bagging is the Random Forest algorithm, which is simply bagging on decision trees (illustrated above). When you open your phone's camera app and see it drawing boxes around people's faces, it's probably the result of Random Forest's work. Neural networks would be too slow to run in real time, yet bagging is ideal, given that it can calculate trees on all the shaders of a video card or on these new fancy ML processors.

In some tasks, the ability of Random Forest to run in parallel is more important than a small loss in accuracy to boosting, for example. Especially in real-time processing. There is always a trade-off.

Boosting. Algorithms are trained one by one, sequentially. Each subsequent one pays most of its attention to the data points that were mispredicted by the previous one. Repeat until you are happy.

Same as in bagging, we use subsets of our data, but this time they are not randomly generated. Now, in each subsample we take a part of the data the previous algorithm failed to process. Thus, we make a new algorithm learn to fix the errors of the previous one.

[Diagram: boosting, where each new tree concentrates on the previous tree's mistakes]

The main advantage here is a precision of classification that all the cool kids can envy. The cons were already called out: it doesn't parallelize. But it's still faster than neural networks. It's like a race between a dump truck and a racecar. The truck can do more, but if you want to go fast, take the car.

If you want a real example of boosting, open Facebook or Google and start typing in a search query. Can you hear an army of trees roaring and smashing together to sort results by relevancy? That's because they are using boosting.

Nowadays there are three popular tools for boosting; you can read a comparative report in CatBoost vs. LightGBM vs. XGBoost.
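Bagging and boosting are both one-liners in scikit-learn, which makes the trade-off easy to try yourself. A sketch of my own on a synthetic dataset:

```python
# Bagging (Random Forest) vs. boosting (Gradient Boosting) on synthetic data.
# Purely illustrative; accuracies will vary with the dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: independent trees on random subsets, trained in parallel (n_jobs=-1).
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X_tr, y_tr)

# Boosting: trees built sequentially, each correcting its predecessor's errors.
boosted = GradientBoostingClassifier(n_estimators=100, random_state=0)
boosted.fit(X_tr, y_tr)

print("random forest    :", forest.score(X_te, y_te))
print("gradient boosting:", boosted.score(X_te, y_te))
```

Note how the forest can use every core at once (n_jobs=-1) while the boosted trees must be built one after another: that's the parallelism trade-off described above.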
Part 4. Neural Networks and Deep Learning

[Diagram: a neural network]

The End: when is the war with the machines coming?

The main problem here is that the question "when will the machines become smarter than us and enslave everyone?" is initially wrong. There are too many hidden conditions in it.

We say "become smarter than us" as if there were a certain unified scale of intelligence, the top of which is a human, with dogs a bit lower and stupid pigeons hanging around at the very bottom. That's wrong.

If this were the case, every human would have to beat animals at everything, but that's not true. The average squirrel can remember a thousand hidden places with nuts; I can't even remember where my keys are. So is intelligence a set of different skills rather than a single measurable value? Or is remembering the locations of stashed nuts not included in intelligence?

An even more interesting question for me: why do we believe that the possibilities of the human brain are limited? There are many popular graphs on the Internet where technological progress is drawn as an exponential curve and human possibilities are drawn as a constant. But is that so?

Ok, multiply 1680 by 950 right now in your mind. I know you won't even try, lazy bastards. But give you a calculator and you'll do it in two seconds. Does this mean that the calculator just expanded the capabilities of your brain?

If yes, can I continue to expand them with other machines? Like, use notes in my phone to avoid remembering a shitload of data? Oh, it seems like I'm doing that right now. I'm expanding the capabilities of my brain with machines.

Think about it. Thanks for reading.
