diff --git a/22__pandas-how-to-filter-results-of-value_counts.patch b/22__pandas-how-to-filter-results-of-value_counts.patch new file mode 100644 index 0000000..e69de29 diff --git a/README.md b/README.md new file mode 100644 index 0000000..32615a6 --- /dev/null +++ b/README.md @@ -0,0 +1,101 @@ +# python +Jupyter notebooks and datasets for the interesting pandas/python/data science video series. + +# Contribution + +Feel free to contribute or suggest new ideas. To get in touch, send an [email](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python). + +You can find nice guides about contributing on GitHub: +* [Contributing to projects](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) +* [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/) + +# Who is this repo for? + +For people who are interested in data science, data analysis and finding interesting insights in data. This repository is related to the sites: +* [DataScientYst.com - Data Science Tutorials, Exercises, Guides, Videos with Python and Pandas](https://datascientyst.com/) +* [SoftHints.com - Python, Pandas, Linux, SQL Tutorials and Guides](https://softhints.com/) + +where you can find more interesting articles. + +A new website dedicated to Pandas and Data Science has been started: https://datascientyst.com/. It is better organized and covers topics in many areas. + + +The YouTube channel is: + +* [SoftHints Youtube](https://www.youtube.com/@softhints/) +* [Popular Videos](https://www.youtube.com/@softhints/videos) + +# Latest Videos + +## Pandas + +0. [Pandas Tutorial : How to split columns of dataframe](https://www.youtube.com/watch?v=cCoGsFVPVh0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +1. [Pandas Tutorial : How to split dataframe by string or date](https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +2. 
[Easily extract tables from websites with pandas and python](https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +3. [Easily extract information from excel with Python and Pandas](https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +4. [Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2](https://www.youtube.com/watch?v=702lkQbZx50&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +5. [Pandas is column part of another column in the same row of dataframe](https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +6. [Load multiple CSV files into a single Dataframe](https://www.youtube.com/watch?v=30ndwJm1I5c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +7. [Analyze top youtube channels 2019 with pandas - PewDiePie I](https://www.youtube.com/watch?v=mG9OnH9R5yM&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +8. [dataframe column transformations ( str, int, category, concat)](https://www.youtube.com/watch?v=5pbRivDYzko&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +9. [Pandas DataFrame generate n-level hierarchical JSON](https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +10. [Pandas How add new column existing DataFrame](https://www.youtube.com/watch?v=UvCO5gKQqtE&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +11. [Python Pandas find and drop duplicate data](https://www.youtube.com/watch?v=4ixLp8aFomw&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +12. [Map the headers to a column with pandas?](https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +13. [Pandas count values in a column of type list](https://www.youtube.com/watch?v=lx7KFd6BPcg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +14. [How to Optimize and Speed Up Pandas](https://www.youtube.com/watch?v=nW5ltiwV-6Y&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +15. 
[Pandas count and percentage by value for a column](https://www.youtube.com/watch?v=P5pxJkv71BU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +16. [Pandas use a list of values to select rows from a column](https://www.youtube.com/watch?v=jlSbo5wmTPQ&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) + + +## python + +0. [python string split by separator](https://www.youtube.com/watch?v=iBsg75W2Vig&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +1. [python random number generation examples](https://www.youtube.com/watch?v=WDTnZgSreL4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +2. [bilingual programming education in java and python](https://www.youtube.com/watch?v=eEHBjP06WSI&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +3. [biggest programmer salaries 2018](https://www.youtube.com/watch?v=X2bUUkWC7dE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +4. [python extract text from image or pdf](https://www.youtube.com/watch?v=PK-GvWWQ03g&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +5. [Python read validate and import CSV JSON file to MySQL](https://www.youtube.com/watch?v=WbW0rHCX2UU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +6. [python regex match date](https://www.youtube.com/watch?v=o8Je7hPgsdU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +7. [python regex cheat sheet with examples](https://www.youtube.com/watch?v=o_CSmob64uU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +8. [python string methods tutorial](https://www.youtube.com/watch?v=7yuPVq9DtV0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +9. [python shuffle list](https://www.youtube.com/watch?v=WFRBxz6AeZI&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +10. [Easy install of Python and PyCharm on Windows](https://www.youtube.com/watch?v=cDOlBRzHRI0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +11. [learn python for beginners complete tutorial 2018](https://www.youtube.com/watch?v=hnc3bGtYQsQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +12. [think python chaper 2](https://www.youtube.com/watch?v=A6EIl677ntQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +13. 
[Python/Java bad and good code comments examples](https://www.youtube.com/watch?v=SRCToEkq7to&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +14. [intellij pycharm surround string quote](https://www.youtube.com/watch?v=AgRHEGB8Urs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +15. [Top Five Most Annoying Programming Mistakes For Beginners with Python](https://www.youtube.com/watch?v=JToPoYip-C4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +16. [No Python Interpreter Configured For The Module - PyCharm/IntelliJ](https://www.youtube.com/watch?v=mkKDI6y2kyE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +17. [python split string into list examples](https://www.youtube.com/watch?v=T8EfomTlcfA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +18. [How to migrate/update virtualenv from Python 3.5 to 3.6](https://www.youtube.com/watch?v=cFTB5EJUxzw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +19. [Python String Remove Last n Characters](https://www.youtube.com/watch?v=hZHfdOKFlAw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +20. [Python Pandas 7 examples of filters and lambda apply](https://www.youtube.com/watch?v=7nYkJctgSSA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +21. [The simplest way to run python headless test with Chrome on Ubuntu](https://www.youtube.com/watch?v=BdppFIT_lIs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +22. [Python 3 Simple Examples get current folder and go to parent](https://www.youtube.com/watch?v=tQ_9a6UhUQs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +23. [python join/merge list two and more lists](https://www.youtube.com/watch?v=-zcJ4uB7XUo&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +24. [Easy way to convert dictionary to SQL insert with Python](https://www.youtube.com/watch?v=hUXGQwTSfMs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +25. [Python 3 detect and prevent TypeError-s](https://www.youtube.com/watch?v=DJd0JYaVkqA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +26. 
[The right way to declare multiple variables in Python](https://www.youtube.com/watch?v=8OoLg39nNlo&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +27. [Python uninstall a module installed with pip install and virtual envirornment](https://www.youtube.com/watch?v=03ahRfkfwME&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +28. [python performance profiling in pycharm](https://www.youtube.com/watch?v=EZ-im7m8630&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +29. [Python Cumulative Sum per Group with Pandas](https://www.youtube.com/watch?v=1tCbvYv_ibw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +30. [PyCharm - Breakpoints, Favorites, TODOs simple examples](https://www.youtube.com/watch?v=_fNZLrz97kg&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +31. [Python 3 simple ways to list files and folders](https://www.youtube.com/watch?v=oJdubyyJNIQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +32. [Python 3 elegant way to find most/less common element in a list](https://www.youtube.com/watch?v=P4LonC3puS4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +33. [clock angle problem final](https://www.youtube.com/watch?v=eIRhXharV7k&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +34. [Python 3 List Comprehension Tutorial for beginners](https://www.youtube.com/watch?v=DmSephyJNtQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +35. [python 3 how to remove white spaces](https://www.youtube.com/watch?v=0k0fvqikaoE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +36. [Pandas Tutorial : How to split dataframe by string or date](https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +37. [improve your programming skills with fun](https://www.youtube.com/watch?v=uoAV7651Op0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +38. [pandas dataframe search for string in all columns filter regex](https://www.youtube.com/watch?v=vbHFIALhSWE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +39. 
[Pandas is column part of another column in the same row of dataframe](https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +40. [Easily extract tables from websites with pandas and python](https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +41. [Easily extract information from excel with Python and Pandas](https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +42. [Python asterisk argument or What is the usage of * asterisk in Python](https://www.youtube.com/watch?v=JBm8iptLnuA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +43. [Easy Image validation with Python - valid image, blank or pattern](https://www.youtube.com/watch?v=HMB4zrP_-HY&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +44. [Pandas DataFrame generate n-level hierarchical JSON](https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +45. [Python group or sort list of lists by common element](https://www.youtube.com/watch?v=zVQJQxpedm8&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +46. [Think Python: Chapter 3 Functions 3.2](https://www.youtube.com/watch?v=Ol3Dwucax9U&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +47. [Questions and Answers 1 Improve OCR and tabula range](https://www.youtube.com/watch?v=nrF_Rgh88no&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +48. 
[Map the headers to a column with pandas?](https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) diff --git a/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb b/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb index 853b283..217df97 100644 --- a/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb +++ b/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb @@ -610,7 +610,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.8" } }, "nbformat": 4, diff --git a/notebooks/Books/Think Python/Chapter_5__Conditionals_and_recursion.ipynb b/notebooks/Books/Think Python/Chapter_5__Conditionals_and_recursion.ipynb new file mode 100644 index 0000000..73f5439 --- /dev/null +++ b/notebooks/Books/Think Python/Chapter_5__Conditionals_and_recursion.ipynb @@ -0,0 +1,1682 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 5 Conditionals and recursion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* Modulus operator\n", + "* Boolean expressions\n", + "* Logical operators\n", + "* Conditional and Alternative execution\n", + "* Chained and Nested conditionals\n", + "* Recursion and Infinite recursion\n", + "* Keyboard input\n", + "* Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.1 Floor division and modulus" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The main topic of this chapter is the if statement, which\n", + "executes different code depending on the state of the program.\n", + "But first I want to introduce two new operators: floor division\n", + "and modulus." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The floor division operator, //, divides\n", + "two numbers and rounds down to an integer. 
For example, suppose the\n", + "run time of a movie is 105 minutes. You might want to know how\n", + "long that is in hours. Conventional division\n", + "returns a floating-point number:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.75" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "minutes = 105\n", + "minutes / 60" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But we don’t normally write hours with decimal points. Floor\n", + "division returns the integer number of hours, rounding down:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "minutes = 105\n", + "hours = minutes // 60\n", + "hours" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get the remainder, you could subtract off one hour in minutes:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "45" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "remainder = minutes - hours * 60\n", + "remainder" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An alternative is to use the modulus operator, %, which\n", + "divides two numbers and returns the remainder." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "45" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "remainder = minutes % 60\n", + "remainder" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The modulus operator is more useful than it seems. For\n", + "example, you can check whether one number is divisible by another—if\n", + "x % y is zero, then x is divisible by y.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Also, you can extract the right-most digit\n", + "or digits from a number. For example, x % 10 yields the\n", + "right-most digit of x (in base 10). Similarly x % 100\n", + "yields the last two digits." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you are using Python 2, division works differently. The\n", + "division operator, /, performs floor division if both\n", + "operands are integers, and floating-point division if either\n", + "operand is a float.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.2 Boolean expressions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A boolean expression is an expression that is either true\n", + "or false. 
The following examples use the \n", + "operator ==, which compares two operands and produces\n", + "True if they are equal and False otherwise:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "5 == 5" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "5 == 6" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "True and False are special\n", + "values that belong to the type bool; they are not strings:\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "bool" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(True)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "bool" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(False)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type('True') ## Question?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type('true')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The == operator is one of the relational operators; the\n", + "others are:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x != y # x is not equal to y\n", + "x > y # x is greater than y\n", + "x < y # x is less than y\n", + "x >= y # x is greater than or equal to y\n", + "x <= y # x is less than or equal to y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Although these operations are probably familiar to you, the Python\n", + "symbols are different from the mathematical symbols. A common error\n", + "is to use a single equal sign (=) instead of a double equal sign\n", + "(==). Remember that = is an assignment operator and\n", + "== is a relational operator. There is no such thing as\n", + "=< or =>.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.3 Logical operators" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are three logical operators: and, or, and not. The semantics (meaning) of these operators is\n", + "similar to their meaning in English. 
For example,\n", + "x > 0 and x < 10 is true only if x is greater than 0\n", + "and less than 10.\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = 5\n", + "x > 0 and x < 10" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = 15\n", + "x > 0 and x < 10" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "n%2 == 0 or n%3 == 0 is true if either or both of the\n", + "conditions is true, that is, if the number is divisible by 2 or\n", + "3." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "False True False\n", + "False False True\n", + "True True True\n", + "False False False\n" + ] + } + ], + "source": [ + "for n in [4,9,6, 7]:\n", + " print(n%2 == 0 and n%3 == 0, n%2 == 0, n%3 == 0 )" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True True False\n", + "True False True\n", + "True True True\n", + "False False False\n" + ] + } + ], + "source": [ + "for n in [4,9,6,7]:\n", + " print(n%2 == 0 or n%3 == 0, n%2 == 0, n%3 == 0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, the not operator negates a boolean\n", + "expression, so not (x > y) is true if x > y is false,\n", + "that is, if x is less than or equal to y." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "not True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Strictly speaking, the operands of the logical operators should be\n", + "boolean expressions, but Python is not very strict.\n", + "Any nonzero number is interpreted as True:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "42 and True" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "0 and True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This flexibility can be useful, but there are some subtleties to\n", + "it that might be confusing. You might want to avoid it (unless\n", + "you know what you are doing)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Bonus: Boolean algebra and Truth table\n", + "\n", + "* https://en.wikipedia.org/wiki/Boolean_algebra\n", + "* https://en.wikipedia.org/wiki/Truth_table" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.4 Conditional execution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "In order to write useful programs, we almost always need the ability\n", + "to check conditions and change the behavior of the program\n", + "accordingly. Conditional statements give us this ability. 
The\n", + "simplest form is the if statement:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x is positive\n" + ] + } + ], + "source": [ + "x = 42\n", + "if x > 0:\n", + " print('x is positive')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1 is positive\n", + "4 is positive\n" + ] + } + ], + "source": [ + "for x in [1, -2, 4]: ## Question?\n", + " if x > 0:\n", + " print(f'{x} is positive')" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "'>' not supported between instances of 'str' and 'int'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m'5'\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: '>' not supported between instances of 'str' and 'int'" + ] + } + ], + "source": [ + "'5' > 0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The boolean expression after if is\n", + "called the condition. If it is true, the indented\n", + "statement runs. If not, nothing happens.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "if statements have the same structure as function definitions:\n", + "a header followed by an indented body. Statements like this are\n", + "called compound statements." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is no limit on the number of statements that can appear in\n", + "the body, but there has to be at least one.\n", + "Occasionally, it is useful to have a body with no statements (usually\n", + "as a place keeper for code you haven’t written yet). In that\n", + "case, you can use the pass statement, which does nothing.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = -42\n", + "if x < 0:\n", + " pass # TODO: need to handle negative values!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.5 Alternative execution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A second form of the if statement is “alternative execution”,\n", + "in which there are two possibilities and the condition determines\n", + "which one runs. The syntax looks like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x is even\n" + ] + } + ], + "source": [ + "if x % 2 == 0:\n", + " print('x is even')\n", + "else:\n", + " print('x is odd')" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1 is odd\n", + "-2 is even\n", + "4 is even\n" + ] + } + ], + "source": [ + "# f-strings or string interpolation\n", + "\n", + "for x in [1, -2, 4]:\n", + " if x % 2 == 0:\n", + " print(f'{x} is even')\n", + " else:\n", + " print(f'{x} is odd')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the remainder when x is divided by 2 is 0, then we know that\n", + "x is even, and the program displays an appropriate message.
If\n", + "the condition is false, the second set of statements runs.\n", + "Since the condition must be true or false, exactly one of the\n", + "alternatives will run. The alternatives are called branches, because they are branches in the flow of execution.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.6 Chained conditionals" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sometimes there are more than two possibilities and we need more than\n", + "two branches. One way to express a computation like that is a chained conditional:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x is less than y\n" + ] + } + ], + "source": [ + "y = 42\n", + "if x < y and 1:\n", + " print('x is less than y')\n", + "elif x > y:\n", + " print('x is greater than y')\n", + "else:\n", + " print('x and y are equal')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "elif is an abbreviation of “else if”. Again, exactly one\n", + "branch will run. There is no limit on the number of elif statements. If there is an else clause, it has to be\n", + "at the end, but there doesn’t have to be one.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if choice == 'a':\n", + " draw_a()\n", + "elif choice == 'b':\n", + " draw_b()\n", + "elif choice == 'c':\n", + " draw_c()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Each condition is checked in order. If the first is false,\n", + "the next is checked, and so on. If one of them is\n", + "true, the corresponding branch runs and the statement\n", + "ends. Even if more than one condition is true, only the\n", + "first true branch runs. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "100\n" + ] + } + ], + "source": [ + "if x < 100: ## Question: What will be the output?\n", + " print('100')\n", + "elif x < 101:\n", + " print('101')\n", + "elif x < 102:\n", + " print('102')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.7 Nested conditionals" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One conditional can also be nested within another. We could have\n", + "written the example in the previous section like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if x == y:\n", + " print('x and y are equal')\n", + "else:\n", + " if x < y:\n", + " print('x is less than y')\n", + " else:\n", + " print('x is greater than y')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The outer conditional contains two branches. The\n", + "first branch contains a simple statement. The second branch\n", + "contains another if statement, which has two branches of its\n", + "own. Those two branches are both simple statements,\n", + "although they could have been conditional statements as well." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Although the indentation of the statements makes the structure\n", + "apparent, nested conditionals become difficult to read very\n", + "quickly. It is a good idea to avoid them when you can." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Logical operators often provide a way to simplify nested conditional\n", + "statements. 
For example, we can rewrite the following code using a\n", + "single conditional:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 0 < x:\n", + " if x < 10:\n", + " print('x is a positive single-digit number.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The print statement runs only if we make it past both\n", + "conditionals, so we can get the same effect with the and operator:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 0 < x and x < 10:\n", + " print('x is a positive single-digit number.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this kind of condition, Python provides a more concise option:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 0 < x < 10:\n", + " print('x is a positive single-digit number.') ## Question?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.8 Recursion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is legal for one function to call another;\n", + "it is also legal for a function to call itself. It may not be obvious\n", + "why that is a good thing, but it turns out to be one of the most\n", + "magical things a program can do.\n", + "For example, look at the following function:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "def countdown(n):\n", + " if n <= 0:\n", + " print('Blastoff!')\n", + " else:\n", + " print(n)\n", + " countdown(n-1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If n is 0 or negative, it outputs the word, “Blastoff!”\n", + "Otherwise, it outputs n and then calls a function named countdown—itself—passing n-1 as an argument." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happens if we call this function like this?" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n", + "2\n", + "1\n", + "Blastoff!\n" + ] + } + ], + "source": [ + "countdown(3) ## Question?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The execution of countdown begins with n=3, and since\n", + "n is greater than 0, it outputs the value 3, and then calls itself..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The countdown that got n=3 returns." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And then you’re back in `__main__`. So, the\n", + "total output is the one shown above: 3, 2, 1, Blastoff!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A function that calls itself is recursive; the process of\n", + "executing it is called recursion.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As another example, we can write a function that prints a\n", + "string n times." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "def print_n(s, n):\n", + " if n <= 0:\n", + " return\n", + " print(s)\n", + " print_n(s, n-1)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "s\n", + "s\n" + ] + } + ], + "source": [ + "print_n('s', 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If n <= 0 the return statement exits the function. 
The\n", + "flow of execution immediately returns to the caller, and the remaining\n", + "lines of the function don’t run.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The rest of the function is similar to countdown: it displays\n", + "s and then calls itself to display s n−1 additional\n", + "times. So the number of lines of output is 1 + (n - 1), which\n", + "adds up to n." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For simple examples like this, it is probably easier to use a for loop. But we will see examples later that are hard to write\n", + "with a for loop and easy to write with recursion, so it is\n", + "good to start early.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.9 Stack diagrams for recursive functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In Section 3.9, we used a stack diagram to represent\n", + "the state of a program during a function call. The same kind of\n", + "diagram can help interpret a recursive function." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Every time a function gets called, Python creates a\n", + "frame to contain the function’s local variables and parameters.\n", + "For a recursive function, there might be more than one frame on the\n", + "stack at the same time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 5.1 shows a stack diagram for countdown called with\n", + "n = 3." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As usual, the top of the stack is the frame for `__main__`.\n", + "It is empty because we did not create any variables in \n", + "`__main__` or pass any arguments to it.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The four countdown frames have different values for the\n", + "parameter n. 
The bottom of the stack, where n=0, is\n", + "called the base case. It does not make a recursive call, so\n", + "there are no more frames." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, draw a stack diagram for print_n called with\n", + "s = 'Hello' and n=2.\n", + "Then write a function called do_n that takes a function\n", + "object and a number, n, as arguments, and that calls\n", + "the given function n times." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.10 Infinite recursion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If a recursion never reaches a base case, it goes on making\n", + "recursive calls forever, and the program never terminates. This is\n", + "known as infinite recursion, and it is generally not\n", + "a good idea. Here is a minimal program with an infinite recursion:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "def recurse():\n", + " recurse()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In most programming environments, a program with infinite recursion\n", + "does not really run forever. 
Python reports an error\n", + "message when the maximum recursion depth is reached:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "ename": "RecursionError", + "evalue": "maximum recursion depth exceeded", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mRecursionError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m## Question?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mrecurse\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "... last 1 frames repeated, from the frame below ...\n", + "\u001b[0;32m\u001b[0m in \u001b[0;36mrecurse\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mRecursionError\u001b[0m: maximum recursion depth exceeded" + ] + } + ], + "source": [ + "recurse() ## Question?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This traceback is a little bigger than the one we saw in the\n", + "previous chapter. 
When the error occurs, there are 1000\n", + "recurse frames on the stack!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you encounter an infinite recursion by accident, review\n", + "your function to confirm that there is a base case that does not\n", + "make a recursive call. And if there is a base case, check whether\n", + "you are guaranteed to reach it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.11 Keyboard input" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The programs we have written so far accept no input from the user.\n", + "They just do the same thing every time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Python provides a built-in function called input that\n", + "stops the program and\n", + "waits for the user to type something. When the user presses Return or Enter, the program resumes and input\n", + "returns what the user typed as a string. In Python 2, the same\n", + "function is called raw_input.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x\n" + ] + } + ], + "source": [ + "text = input()" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'x'" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Before getting input from the user, it is a good idea to print a\n", + "prompt telling the user what to type. 
input can take a\n", + "prompt as an argument:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "What...is your name?\n", + "x\n" + ] + } + ], + "source": [ + "name = input('What...is your name?\\n')" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'x'" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "name" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The sequence \\n at the end of the prompt represents a newline, which is a special character that causes a line break.\n", + "That’s why the user’s input appears below the prompt. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you expect the user to type an integer, you can try to convert\n", + "the return value to int:" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "What...is the airspeed velocity of an unladen swallow?\n", + "100\n" + ] + } + ], + "source": [ + "prompt = 'What...is the airspeed velocity of an unladen swallow?\\n'\n", + "speed = input(prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'100'" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "speed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "But if the user types something other than a string of digits,\n", + "you get an error:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "speed = input(prompt)\n", + "# Prompt shown: What...is the airspeed velocity of an unladen swallow?\n", + 
"# User types: What do you mean, an African or a European swallow?\n", + "int(speed)\n", + "# Raises ValueError: invalid literal for int() with base 10" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "We will see how to handle this kind of error later.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(speed) ## Question?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.12 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When a syntax or runtime error occurs, the error message contains\n", + "a lot of information, but it can be overwhelming. The most\n", + "useful parts are usually:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Syntax errors are usually easy to find, but there are a few\n", + "gotchas. Whitespace errors can be tricky because spaces and\n", + "tabs are invisible and we are used to ignoring them.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [], + "source": [ + "x = 5 ## Question?\n", + " y = 6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In this example, the problem is that the second line is indented by\n", + "one space. But the error message points to y, which is\n", + "misleading. In general, error messages indicate where the problem was\n", + "discovered, but the actual error might be earlier in the code,\n", + "sometimes on a previous line.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The same is true of runtime errors. 
Suppose you are trying\n", + "to compute a signal-to-noise ratio in decibels. The formula\n", + "is $\\mathrm{SNR}_{\\mathrm{db}} = 10 \\log_{10}(P_{\\mathrm{signal}} / P_{\\mathrm{noise}})$. In Python,\n", + "you might write something like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "math domain error", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mnoise_power\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m10\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mratio\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msignal_power\u001b[0m \u001b[0;34m//\u001b[0m \u001b[0mnoise_power\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0mdecibels\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m10\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlog10\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mratio\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdecibels\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mValueError\u001b[0m: math domain error" + ] + } + ], + "source": [ + "import math\n", + "signal_power = 9\n", + "noise_power = 10\n", + "ratio = signal_power // noise_power\n", + "decibels = 10 * math.log10(ratio)\n", + "print(decibels)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "When you run this program, you get an exception:\n", + "\n" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "\n", + "The error message indicates 
line 5, but there is nothing\n", + "wrong with that line. To find the real error, it might be\n", + "useful to print the value of ratio, which turns out to\n", + "be 0. The problem is in line 4, which uses floor division\n", + "instead of floating-point division.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should take the time to read error messages carefully, but don’t\n", + "assume that everything they say is correct." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.13 Glossary" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb b/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb new file mode 100644 index 0000000..b47322b --- /dev/null +++ b/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb @@ -0,0 +1,1699 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 6  Fruitful functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "* Return values\n", + "* Incremental development\n", + "* Composition\n", + "* Boolean functions\n", + "* More recursion\n", + "* Leap of faith" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.1 Return values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Many of the Python functions we have used, such as the math\n", + "functions, produce return values. 
But the functions we’ve written\n", + "are all void: they have an effect, like printing a value\n", + "or moving a turtle, but they don’t have a return value. In\n", + "this chapter you will learn to write fruitful functions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def print_str(s):\n", + " print(s)\n", + "print_str(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def print(s):\n", + " print(s)\n", + "print(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "del print" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def double_int(i):\n", + " return i * 2\n", + "double_int(2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = print_str(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = double_int(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(x, y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Calling the function generates a return\n", + "value, which we usually assign to a variable or use as part of an\n", + "expression." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import math\n", + "radius, radians = 2.0, 0.5 # hypothetical example values\n", + "e = math.exp(1.0)\n", + "height = radius * math.sin(radians)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "math.exp(1.0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import math \n", + "math.sin(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The functions we have written so far are void. Speaking casually,\n", + "they have no return value; more precisely,\n", + "their return value is None." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def print_me(s):\n", + " print(s)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print_me('x')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = print_me('x')\n", + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this chapter, we are (finally) going to write fruitful functions.\n", + "The first example is area, which returns the area of a circle\n", + "with the given radius:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def area(radius):\n", + " a = math.pi * radius**2\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "We have seen the return statement before, but in a fruitful\n", + "function the return statement includes\n", + "an expression. 
This statement means: “Return immediately from\n", + "this function and use the following expression as a return value.”\n", + "The expression can be arbitrarily complicated, so we could\n", + "have written this function more concisely:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def area(radius):\n", + " return math.pi * radius**2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "On the other hand, temporary variables like a can make\n", + "debugging easier.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sometimes it is useful to have multiple return statements, one in each\n", + "branch of a conditional:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def absolute_value(x):\n", + " if x < 0:\n", + " return -x\n", + " else:\n", + " return x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Since these return statements are in an alternative conditional,\n", + "only one runs." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As soon as a return statement runs, the function\n", + "terminates without executing any subsequent statements.\n", + "Code that appears after a return statement, or any other place\n", + "the flow of execution can never reach, is called dead code.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def area_x(radius):\n", + " return 0\n", + " print('x')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "area_x(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In a fruitful function, it is a good idea to ensure\n", + "that every possible path through the program hits a\n", + "return statement. 
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def absolute_value(x):\n", + " if x < 0:\n", + " return -x\n", + " if x > 0:\n", + " return x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This function is incorrect because if x happens to be 0,\n", + "neither condition is true, and the function ends without hitting a\n", + "return statement. If the flow of execution gets to the end\n", + "of a function, the return value is None, which is not\n", + "the absolute value of 0.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(absolute_value(0)) # prints None" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "By the way, Python provides a built-in function called \n", + "abs that computes absolute values.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an **exercise**, write a compare function that\n", + "takes two values, x and y, and returns 1 if x > y,\n", + "0 if x == y, and -1 if x < y.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Bonus**: You can return more than one value from a function by packing the values into a tuple, list, etc." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def area_y(radius):\n", + " return 0, 1, 3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "area_y(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x,y,z = area_y(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.2 Incremental development" + ] +
}, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: Have a clear \n", + "* use case\n", + "* specifications\n", + "* test results/case:\n", + "\n", + "> these values so that the horizontal distance is 3 and the\n", + "vertical distance is 4; that way, the result is 5, the hypotenuse \n", + "of a 3-4-5 triangle." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you write larger functions, you might find yourself\n", + "spending more time debugging." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To deal with increasingly complex programs,\n", + "you might want to try a process called\n", + "incremental development. The goal of incremental development\n", + "is to avoid long debugging sessions by adding and testing only\n", + "a small amount of code at a time.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an example, suppose you want to find the distance between two\n", + "points, given by the coordinates $(x_1, y_1)$ and $(x_2, y_2)$.\n", + "By the Pythagorean theorem, the distance is:\n", + "\n", + "$$\\text{distance} = \\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first step is to consider what a distance function should\n", + "look like in Python. In other words, what are the inputs (parameters)\n", + "and what is the output (return value)?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this case, the inputs are two points, which you can represent\n", + "using four numbers. The return value is the distance represented by\n", + "a floating-point value."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Immediately you can write an outline of the function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def distance(x1, y1, x2, y2):\n", + " return 0.0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Obviously, this version doesn’t compute distances; it always returns\n", + "zero. But it is syntactically correct, and it runs, which means that\n", + "you can test it before you make it more complicated." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To test the new function, call it with sample arguments:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "distance(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "I chose these values so that the horizontal distance is 3 and the\n", + "vertical distance is 4; that way, the result is 5, the hypotenuse \n", + "of a 3-4-5 triangle. When testing a function, it is\n", + "useful to know the right answer.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "At this point we have confirmed that the function is syntactically\n", + "correct, and we can start adding code to the body.\n", + "A reasonable next step is to find the differences\n", + "x2 − x1 and y2 − y1. The next version stores those values in\n", + "temporary variables and prints them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def distance(x1, y1, x2, y2):\n", + " dx = x2 - x1\n", + " dy = y2 - y1\n", + " print('dx is', dx)\n", + " print('dy is', dy)\n", + " return 0.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "distance(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the function is working, it should display dx is 3 and \n", + "dy is 4. If so, we know that the function is getting the right\n", + "arguments and performing the first computation correctly. If not,\n", + "there are only a few lines to check." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we compute the sum of squares of dx and dy:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def distance(x1, y1, x2, y2):\n", + " dx = x2 - x1\n", + " dy = y2 - y1\n", + " dsquared = dx**2 + dy**2\n", + " print('dsquared is: ', dsquared)\n", + " return 0.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "distance(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Again, you would run the program at this stage and check the output\n", + "(which should be 25).\n", + "Finally, you can use math.sqrt to compute and return the result:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def distance(x1, y1, x2, y2):\n", + " dx = x2 - x1\n", + " dy = y2 - y1\n", + " dsquared = dx**2 + dy**2\n", + " result = math.sqrt(dsquared)\n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "distance(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "\n", + "If that works correctly, you are done. Otherwise, you might\n", + "want to print the value of result before the return\n", + "statement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The final version of the function doesn’t display anything when it\n", + "runs; it only returns a value. The print statements we wrote\n", + "are useful for debugging, but once you get the function working, you\n", + "should remove them. Code like that is called scaffolding\n", + "because it is helpful for building the program but is not part of the\n", + "final product.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you start out, you should add only a line or two of code at a\n", + "time. As you gain more experience, you might find yourself writing\n", + "and debugging bigger chunks. Either way, incremental development\n", + "can save you a lot of debugging time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The key aspects of the process are:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, use incremental development to write a function\n", + "called hypotenuse that returns the length of the hypotenuse of a\n", + "right triangle given the lengths of the other two legs as arguments.\n", + "Record each stage of the development process as you go.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.3 Composition" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you should expect by now, you can call one function from within\n", + "another. As an example, we’ll write a function that takes two points,\n", + "the center of the circle and a point on the perimeter, and computes\n", + "the area of the circle." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Assume that the center point is stored in the variables xc and\n", + "yc, and the perimeter point is in xp and yp. The\n", + "first step is to find the radius of the circle, which is the distance\n", + "between the two points. We just wrote a function, distance, that does that:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "radius = distance(1, 2, 4, 6)\n", + "radius" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The next step is to find the area of a circle with that radius;\n", + "we just wrote that, too:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result = area(radius)\n", + "result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Encapsulating these steps in a function, we get:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def circle_area(xc, yc, xp, yp): # 1, 2, 4, 6\n", + " radius = distance(xc, yc, xp, yp) # 5\n", + " result = area(radius) # 78.539\n", + " return result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The temporary variables radius and result are useful for\n", + "development and debugging, but once the program is working, we can\n", + "make it more concise by composing the function calls:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def circle_area(xc, yc, xp, yp):\n", + " return area(distance(xc, yc, xp, yp))\n", + "circle_area(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.4 Boolean functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Functions can return booleans, which is often convenient for 
hiding\n", + "complicated tests inside functions.\n", + "For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_divisible(x, y):\n", + "    if x % y == 0:\n", + "        return True\n", + "    else:\n", + "        return False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "It is common to give boolean functions names that sound like yes/no\n", + "questions; is_divisible returns either True or False\n", + "to indicate whether x is divisible by y." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is an example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "is_divisible(6, 4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "is_divisible(6, 3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "is_divisible(0, 0)  # raises ZeroDivisionError because y is 0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result of the == operator is a boolean, so we can write the\n", + "function more concisely by returning it directly:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_divisible(x, y):\n", + "    return x % y == 0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Boolean functions are often used in conditional statements:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x, y = 6, 3  # example values, so that x and y are defined in this notebook\n", + "if is_divisible(x, y):\n", + "    print('x is divisible by y')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "It might be tempting to write something like:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if
is_divisible(x, y) == True:\n", + " print('x is divisible by y')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "But the extra comparison is unnecessary." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, write a function is_between(x, y, z) that\n", + "returns True if x ≤ y ≤ z or False otherwise." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.5 More recursion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have only covered a small subset of Python, but you might\n", + "be interested to know that this subset is a complete\n", + "programming language, which means that anything that can be\n", + "computed can be expressed in this language. Any program ever written\n", + "could be rewritten using only the language features you have learned\n", + "so far (actually, you would need a few commands to control devices\n", + "like the mouse, disks, etc., but that’s all)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Proving that claim is a nontrivial exercise first accomplished by Alan\n", + "Turing, one of the first computer scientists (some would argue that he\n", + "was a mathematician, but a lot of early computer scientists started as\n", + "mathematicians). Accordingly, it is known as the Turing Thesis.\n", + "For a more complete (and accurate) discussion of the Turing Thesis,\n", + "I recommend Michael Sipser’s book Introduction to the\n", + "Theory of Computation." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To give you an idea of what you can do with the tools you have learned\n", + "so far, we’ll evaluate a few recursively defined mathematical\n", + "functions. A recursive definition is similar to a circular\n", + "definition, in the sense that the definition contains a reference to\n", + "the thing being defined. 
A truly circular definition is not very\n", + "useful:\n", + "\n", + "> vorpal: An adjective used to describe something that is vorpal." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you saw that definition in the dictionary, you might be annoyed. On\n", + "the other hand, if you looked up the definition of the factorial\n", + "function, denoted with the symbol !, you might get something like\n", + "this:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```\n", + "0! = 1\n", + "n! = n (n−1)!\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This definition says that the factorial of 0 is 1, and the factorial\n", + "of any other value, n, is n multiplied by the factorial of n−1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So 3! is 3 times 2!, which is 2 times 1!, which is 1 times\n", + "0!. Putting it all together, 3! equals 3 times 2 times 1 times 1,\n", + "which is 6." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you can write a recursive definition of something, you can\n", + "write a Python program to evaluate it. The first step is to decide\n", + "what the parameters should be.
In this case it should be clear\n", + "that factorial takes an integer:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + "    pass  # placeholder so this stage runs; the body is added in the next steps" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the argument happens to be 0, all we have to do is return 1:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + "    if n == 0:\n", + "        return 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Otherwise, and this is the interesting part, we have to make a\n", + "recursive call to find the factorial of n−1 and then multiply it by\n", + "n:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + "    if n == 0:\n", + "        return 1\n", + "    else:\n", + "        recurse = factorial(n-1)\n", + "        result = n * recurse\n", + "        return result" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "720" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "factorial(6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The flow of execution for this program is similar to the flow of countdown in Section 5.8. If we call factorial\n", + "with the value 3:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Since 3 is not 0, we take the second branch and calculate the factorial of n-1...\n", + "  - Since 2 is not 0, we take the second branch and calculate the factorial of n-1...\n", + "    - Since 1 is not 0, we take the second branch and calculate the factorial of n-1...\n", + "      - Since 0 equals 0, we take the first branch and return 1 without making any more recursive calls.\n", + "    - The return value, 1, is multiplied by n, which is 1, and the result is returned.\n", + "  - The return value, 1, is multiplied by n, which is 2, and the result is returned."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The return value (2) is multiplied by n, which is 3, and the result, 6,\n", + "becomes the return value of the function call that started the whole\n", + "process.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 6.1 shows what the stack diagram looks like for\n", + "this sequence of function calls." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The return values are shown being passed back up the stack. In each\n", + "frame, the return value is the value of result, which is the\n", + "product of n and recurse.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the last frame, the local\n", + "variables recurse and result do not exist, because\n", + "the branch that creates them does not run." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.6 Leap of faith\n", + "\n", + "#### flow of execution vs Leap of faith" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Following the flow of execution is one way to read programs, but\n", + "it can quickly become overwhelming. An\n", + "alternative is what I call the “leap of faith”. When you come to a\n", + "function call, instead of following the flow of execution, you assume that the function works correctly and returns the right\n", + "result." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In fact, you are already practicing this leap of faith when you use\n", + "built-in functions. When you call math.cos or math.exp,\n", + "you don’t examine the bodies of those functions. You just\n", + "assume that they work because the people who wrote the built-in\n", + "functions were good programmers." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def circle_area(xc, yc, xp, yp): # 1, 2, 4, 6\n", + " radius = distance(xc, yc, xp, yp) # 5\n", + " result = area(radius) # 78.539\n", + " return result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The same is true when you call one of your own functions. For\n", + "example, in Section 6.4, we wrote a function called \n", + "is_divisible that determines whether one number is divisible by\n", + "another. Once we have convinced ourselves that this function is\n", + "correct—by examining the code and testing—we can use the function\n", + "without looking at the body again.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_divisible(x, y):\n", + " if x % y == 0:\n", + " return True\n", + " else:\n", + " return False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The same is true of recursive programs. When you get to the recursive\n", + "call, instead of following the flow of execution, you should assume\n", + "that the recursive call works (returns the correct result) and then ask\n", + "yourself, “Assuming that I can find the factorial of n−1, can I\n", + "compute the factorial of n?” It is clear that you\n", + "can, by multiplying by n." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Of course, it’s a bit strange to assume that the function works\n", + "correctly when you haven’t finished writing it, but that’s why\n", + "it’s called a leap of faith!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.7 One more example" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After factorial, the most common example of a recursively\n", + "defined mathematical function is fibonacci, which has the\n", + "following definition (see\n", + "http://en.wikipedia.org/wiki/Fibonacci_number):\n", + "\n", + "```\n", + "fibonacci(0) = 0\n", + "fibonacci(1) = 1\n", + "fibonacci(n) = fibonacci(n−1) + fibonacci(n−2)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Translated into Python, it looks like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "def fibonacci(n):\n", + "    if n == 0:\n", + "        return 0\n", + "    elif n == 1:\n", + "        return 1\n", + "    else:\n", + "        return fibonacci(n-1) + fibonacci(n-2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you try to follow the flow of execution here, even for fairly\n", + "small values of n, your head explodes.
But according to the\n", + "leap of faith, if you assume that the two recursive calls\n", + "work correctly, then it is clear that you get\n", + "the right result by adding them together.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0, 1, 1, 2, 3, 5, 8, 13, 21, 34, " + ] + } + ], + "source": [ + "for i in range(0,10):\n", + " print(fibonacci(i), end=', ')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "21" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fibonacci(8.0)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "ename": "RecursionError", + "evalue": "maximum recursion depth exceeded in comparison", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mRecursionError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m8.5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mfibonacci\u001b[0;34m(n)\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m 
\u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "... last 1 frames repeated, from the frame below ...\n", + "\u001b[0;32m\u001b[0m in \u001b[0;36mfibonacci\u001b[0;34m(n)\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mRecursionError\u001b[0m: maximum recursion depth exceeded in comparison" + ] + } + ], + "source": [ + "fibonacci(8.5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.8 Checking types" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happens if we call factorial and give it 1.5 as an argument?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "factorial(1.5)  # raises RecursionError (maximum recursion depth exceeded)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "It looks like an infinite recursion. How can that be? The function\n", + "has a base case—when n == 0.
But if n is not an integer,\n", + "we can miss the base case and recurse forever." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the first recursive call, the value of n is 0.5.\n", + "In the next, it is -0.5. From there, it gets smaller\n", + "(more negative), but it will never be 0." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have two choices. We can try to generalize the factorial\n", + "function to work with floating-point numbers, or we can make factorial check the type of its argument. Generalizing factorial to real numbers leads to the gamma function, which is a\n", + "little beyond the scope of this book. So we’ll go for the second.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use the built-in function isinstance to verify the type\n", + "of the argument. While we’re at it, we can also make sure the\n", + "argument is non-negative:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + "    if not isinstance(n, int):\n", + "        print('Factorial is only defined for integers.')\n", + "        return None\n", + "    elif n < 0:\n", + "        print('Factorial is not defined for negative integers.')\n", + "        return None\n", + "    elif n == 0:\n", + "        return 1\n", + "    else:\n", + "        return n * factorial(n-1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first base case handles nonintegers; the\n", + "second handles negative integers.
In both cases, the program prints\n", + "an error message and returns None to indicate that something\n", + "went wrong:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Factorial is only defined for integers.\n", + "None\n" + ] + } + ], + "source": [ + "print(factorial('fred'))" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Factorial is not defined for negative integers.\n", + "None\n" + ] + } + ], + "source": [ + "print(factorial(-2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If we get past both checks, we know that n is a non-negative integer, so we can prove that the recursion terminates.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This program demonstrates a pattern sometimes called a guardian.\n", + "The first two conditionals act as guardians, protecting the code that\n", + "follows from values that might cause an error. The guardians make it\n", + "possible to prove the correctness of the code." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In Section 11.4 we will see a more flexible alternative to printing\n", + "an error message: raising an exception." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.9 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Breaking a large program into smaller functions creates natural\n", + "checkpoints for debugging. 
If a function is not\n", + "working, there are three possibilities to consider:\n", + "\n", + "- There is something wrong with the arguments the function is getting; a precondition is violated.\n", + "- There is something wrong with the function; a postcondition is violated.\n", + "- There is something wrong with the return value or the way it is being used." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To rule out the first possibility, you can add a print statement\n", + "at the beginning of the function and display the values of the\n", + "parameters (and maybe their types). Or you can write code\n", + "that checks the preconditions explicitly." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the parameters look good, add a print statement before each\n", + "return statement and display the return value. If\n", + "possible, check the result by hand. Consider calling the\n", + "function with values that make it easy to check the result\n", + "(as in Section 6.2)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the function seems to be working, look at the function call\n", + "to make sure the return value is being used correctly (or used\n", + "at all!).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Adding print statements at the beginning and end of a function\n", + "can help make the flow of execution more visible.\n", + "For example, here is a version of factorial with\n", + "print statements:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + "    space = ' ' * (4 * n)\n", + "    print(space, 'factorial', n)\n", + "    if n == 0:\n", + "        print(space, 'returning 1')\n", + "        return 1\n", + "    else:\n", + "        recurse = factorial(n-1)\n", + "        result = n * recurse\n", + "        print(space, 'returning', result)\n", + "        return result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "space is a string of space characters that controls the\n", + "indentation of the output.
Here is the result of factorial(4):" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "                 factorial 4\n", + "             factorial 3\n", + "         factorial 2\n", + "     factorial 1\n", + " factorial 0\n", + " returning 1\n", + "     returning 1\n", + "         returning 2\n", + "             returning 6\n", + "                 returning 24\n" + ] + }, + { + "data": { + "text/plain": [ + "24" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "factorial(4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you are confused about the flow of execution, this kind of\n", + "output can be helpful. It takes some time to develop effective\n", + "scaffolding, but a little bit of scaffolding can save a lot of debugging." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.10 Glossary" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb new file mode 100644 index 0000000..3e4ea07 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb @@ -0,0 +1,1693 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 10  Lists\n", + "\n", + "http://greenteapress.com/thinkpython2/html/thinkpython2011.html\n", + "\n", + "* A list is a sequence\n", + "* Lists are mutable\n", + "* Traversing a list\n", + "* List operations\n", + "* List
slices\n", + "* List methods\n", + "* Map, filter and reduce\n", + "* Deleting elements\n", + "* Lists and strings\n", + "* Objects and values\n", + "* Aliasing\n", + "* List arguments\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.1 A list is a sequence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This chapter presents one of Python’s most useful built-in types, lists.\n", + "You will also learn more about objects and what can happen when you have\n", + "more than one name for the same object." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Like a string, a list is a sequence of values. In a string, the\n", + "values are characters; in a list, they can be any type. The values in\n", + "a list are called elements or sometimes items.\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are several ways to create a new list; the simplest is to\n", + "enclose the elements in square brackets ([ and ]):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[10, 20, 30, 40]\n", + "['crunchy frog', 'ram bladder', 'lark vomit']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first example is a list of four integers. The second is a list of\n", + "three strings. The elements of a list don’t have to be the same type.\n", + "The following list contains a string, a float, an integer, and\n", + "(lo!) 
another list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "['spam', 2.0, 5, [10, 20]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A list within another list is nested.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A list that contains no elements is\n", + "called an empty list; you can create one with empty\n", + "brackets, [].\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you might expect, you can assign list values to variables:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cheeses = ['Cheddar', 'Edam', 'Gouda']\n", + "numbers = [42, 123]\n", + "empty = []\n", + "print(cheeses, numbers, empty)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.2 Lists are mutable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The syntax for accessing the elements of a list is the same as for\n", + "accessing the characters of a string—the bracket operator. The\n", + "expression inside the brackets specifies the index. Remember that the\n", + "indices start at 0:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cheeses[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Unlike strings, lists are mutable. 
When the bracket operator appears\n", + "on the left side of an assignment, it identifies the element of the\n", + "list that will be assigned.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "numbers = [42, 123]\n", + "numbers[1] = 5\n", + "numbers" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "numbers[4] = 5  # raises IndexError: list assignment index out of range" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The one-eth element of numbers, which\n", + "used to be 123, is now 5." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 10.1 shows\n", + "the state diagram for cheeses, numbers and empty:\n", + "\n", + "![](http://greenteapress.com/thinkpython2/html/thinkpython2011.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lists are represented by boxes with the word “list” outside\n", + "and the elements of the list inside. cheeses refers to\n", + "a list with three elements indexed 0, 1 and 2.\n", + "numbers contains two elements; the diagram shows that the\n", + "value of the second element has been reassigned from 123 to 5.\n", + "empty refers to a list with no elements." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "List indices work the same way as string indices:\n", + "\n", + "- Any integer expression can be used as an index.\n", + "- If you try to read or write an element that does not exist, you get an IndexError.\n", + "- If an index has a negative value, it counts backward from the end of the list." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The in operator also works on lists."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cheeses = ['Cheddar', 'Edam', 'Gouda']\n", + "'Edam' in cheeses" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'Brie' in cheeses" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.3 Traversing a list" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The most common way to traverse the elements of a list is\n", + "with a for loop. The syntax is the same as for strings:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for cheese in cheeses:\n", + "    print(cheese)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This works well if you only need to read the elements of the\n", + "list. But if you want to write or update the elements, you\n", + "need the indices. A common way to do that is to combine\n", + "the built-in functions range and len:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(len(numbers)):\n", + "    numbers[i] = numbers[i] * 2\n", + "numbers" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# enumerate yields (index, element) pairs, an alternative to range(len(...))\n", + "for i, e in enumerate(numbers):\n", + "    print(i, e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The range(len(numbers)) loop traverses the list and updates each element. len\n", + "returns the number of elements in the list. range produces\n", + "the indices from 0 to n−1 (in Python 3 it returns a range object, not a list), where n is the length of\n", + "the list. Each time through the loop, i gets the index\n", + "of the next element.
The assignment statement in the body uses\n", + "i to read the old value of the element and to assign the\n", + "new value." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A for loop over an empty list never runs the body:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for x in []:\n", + "    print('This never happens.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Although a list can contain another list, the nested\n", + "list still counts as a single element. The length of this list is\n", + "four:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "len(['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: flatten a list of lists with a list comprehension\n", + "\n", + "https://docs.python.org/3.0/tutorial/datastructures.html?highlight=list%20comprehension" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_list = [['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[element for sublist in my_list for element in sublist]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_list = ['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[element for element in my_list if not isinstance(element, list)]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[element for sublist in my_list if isinstance(sublist, list) for element in sublist]" + ] + }, + {
+ "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.4 List operations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The + operator concatenates lists:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [1, 2, 3]\n", + "b = [4, 5, 6]\n", + "c = a + b\n", + "c" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The * operator repeats a list a given number of times:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[0] * 4" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[1, 2, 3] * 3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first example repeats [0] four times. The second example\n", + "repeats the list [1, 2, 3] three times." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.5 List slices" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The slice operator also works on lists:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c', 'd', 'e', 'f']\n", + "t[1:3]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[:4]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[3:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you omit the first index, the slice starts at the beginning.\n", + "If you omit the second, the slice goes to the end. 
So if you\n", + "omit both, the slice is a copy of the whole list.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Since lists are mutable, it is often useful to make a copy\n", + "before performing operations that modify lists.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A slice operator on the left side of an assignment\n", + "can update multiple elements:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c', 'd', 'e', 'f']\n", + "t[1:3] = ['x', 'y']\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: can you reverse a list with slicing?\n", + "\n", + "['f', 'e', 'd', 'y', 'x', 'a']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[::-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.6 List methods" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Python provides methods that operate on lists. 
For example,\n", + "append adds a new element to the end of a list:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c']\n", + "t.append('d')\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "extend takes a list as an argument and appends all of\n", + "the elements:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t1 = ['a', 'b', 'c']\n", + "t2 = ['d', 'e']\n", + "t1.extend(t2)\n", + "t1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This example leaves t2 unmodified." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "sort arranges the elements of the list from low to high:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['d', 'c', 'e', 'b', 'a']\n", + "t.sort()\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Most list methods are void; they modify the list and return None.\n", + "If you accidentally write t = t.sort(), you will be disappointed\n", + "with the result.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.7 Map, filter and reduce" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To add up all the numbers in a list, you can use a loop like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def add_all(t):\n", + "    total = 0\n", + "    for x in t:\n", + "        total += x\n", + "    return total" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "total is initialized to 0. Each time through the loop,\n", + "x gets one element from the list. 
The += operator\n", + "provides a short way to update a variable. This\n", + "augmented assignment statement,\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "total += x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "is equivalent to" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "total = total + x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As the loop runs, total accumulates the sum of the\n", + "elements; a variable used this way is sometimes called an\n", + "accumulator.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Adding up the elements of a list is such a common operation\n", + "that Python provides it as a built-in function, sum:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = [1, 2, 3]\n", + "sum(t)\n", + "# returns 6" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "**An operation like this that combines a sequence of elements into\n", + "a single value is sometimes called reduce.**\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sometimes you want to traverse one list while building\n", + "another. For example, the following function takes a list of strings\n", + "and returns a new list that contains capitalized strings:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def capitalize_all(t):\n", + "    res = []\n", + "    for s in t:\n", + "        res.append(s.capitalize())\n", + "    return res" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "res is initialized with an empty list; each time through\n", + "the loop, we append the next element. 
So res is another\n", + "kind of accumulator.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**An operation like capitalize_all is sometimes called a map because it “maps” a function (in this case the method capitalize) onto each of the elements in a sequence.**\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another common operation is to select some of the elements from\n", + "a list and return a sublist. For example, the following\n", + "function takes a list of strings and returns a list that contains\n", + "only the uppercase strings:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def only_upper(t):\n", + "    res = []\n", + "    for s in t:\n", + "        if s.isupper():\n", + "            res.append(s)\n", + "    return res" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "isupper is a string method that returns True if all cased\n", + "characters in the string are uppercase and there is at least one." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**An operation like only_upper is called a filter because\n", + "it selects some of the elements and filters out the others.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Most common list operations can be expressed as a combination\n", + "of map, filter and reduce." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.8 Deleting elements" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are several ways to delete elements from a list. 
If you\n", + "know the index of the element you want, you can use\n", + "pop:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c']\n", + "x = t.pop(1)\n", + "t" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "pop modifies the list and returns the element that was removed.\n", + "If you don’t provide an index, it deletes and returns the\n", + "last element." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you don’t need the removed value, you can use the del\n", + "operator:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c']\n", + "del t[1]\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you know the element you want to remove (but not the index), you\n", + "can use remove:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'b', 'c']\n", + "t.remove('b')\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The return value from remove is None.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To remove more than one element, you can use del with\n", + "a slice index:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c', 'd', 'e', 'f']\n", + "del t[1:5]\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As usual, the slice selects all the elements up to but not\n", + "including the second index." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.9 Lists and strings" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A string is a sequence of characters and a list is a sequence\n", + "of values, but a list of characters is not the same as a\n", + "string. To convert from a string to a list of characters,\n", + "you can use list:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'spam'\n", + "t = list(s)\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Because list is the name of a built-in function, you should\n", + "avoid using it as a variable name. I also avoid l because\n", + "it looks too much like 1. So that’s why I use t." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The list function breaks a string into individual letters. If\n", + "you want to break a string into words, you can use the split\n", + "method:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'pining for the fjords'\n", + "t = s.split()\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "An optional argument called a delimiter specifies which\n", + "characters to use as word boundaries.\n", + "The following example\n", + "uses a hyphen as a delimiter:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'spam-spam-spam'\n", + "delimiter = '-'\n", + "t = s.split(delimiter)\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "join is the inverse of split. It\n", + "takes a list of strings and\n", + "concatenates the elements. 
join is a string method,\n", + "so you have to invoke it on the delimiter and pass the\n", + "list as a parameter:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['pining', 'for', 'the', 'fjords']\n", + "delimiter = ' '\n", + "s = delimiter.join(t)\n", + "s" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In this case the delimiter is a space character, so\n", + "join puts a space between words. To concatenate\n", + "strings without spaces, you can use the empty string,\n", + "'', as a delimiter.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.10 Objects and values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we run these assignment statements:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 'banana'\n", + "b = 'banana'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "We know that a and b both refer to a\n", + "string, but we don’t\n", + "know whether they refer to the same string.\n", + "There are two possible states, shown in Figure 10.2.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In one case, a and b refer to two different objects that\n", + "have the same value. In the second case, they refer to the same\n", + "object.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To check whether two variables refer to the same object, you can\n", + "use the is operator." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 'banana'\n", + "b = 'banana'\n", + "a is b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In this example, Python created only one string object, and both a and b refer to it. This sharing is a CPython optimization for identical string constants, not a language guarantee, so compare string values with == rather than is. 
But when you create two lists, you get\n", + "two objects:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [1, 2, 3]\n", + "b = [1, 2, 3]\n", + "a is b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "So the state diagram looks like Figure 10.3.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this case we would say that the two lists are equivalent,\n", + "because they have the same elements, but not identical, because\n", + "they are not the same object. If two objects are identical, they are\n", + "also equivalent, but if they are equivalent, they are not necessarily\n", + "identical.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Until now, we have been using “object” and “value”\n", + "interchangeably, but it is more precise to say that an object has a\n", + "value. If you evaluate [1, 2, 3], you get a list\n", + "object whose value is a sequence of integers. If another\n", + "list has the same elements, we say it has the same value, but\n", + "it is not the same object.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.11 Aliasing" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If a refers to an object and you assign b = a,\n", + "then both variables refer to the same object:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [1, 2, 3]\n", + "b = a\n", + "b is a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The state diagram looks like Figure 10.4.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The association of a variable with an object is called a reference. 
In this example, there are two references to the same\n", + "object.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An object with more than one reference has more\n", + "than one name, so we say that the object is aliased.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the aliased object is mutable, changes made with one alias affect\n", + "the other:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b[0] = 42\n", + "a" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Although this behavior can be useful, it is error-prone. In general,\n", + "it is safer to avoid aliasing when you are working with mutable\n", + "objects.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For immutable objects like strings, aliasing is not as much of a\n", + "problem. In this example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 'banana'\n", + "b = 'banana'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "It almost never makes a difference whether a and b refer\n", + "to the same string or not." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.12 List arguments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you pass a list to a function, the function gets a reference to\n", + "the list. If the function modifies the list, the caller sees\n", + "the change. 
For example, delete_head removes the first element\n", + "from a list:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "def delete_head(t):\n", + " del t[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Here’s how it is used:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['b', 'c']" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters = ['a', 'b', 'c']\n", + "delete_head(letters)\n", + "letters" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The parameter t and the variable letters are\n", + "aliases for the same object. The stack diagram looks like\n", + "Figure 10.5.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since the list is shared by two frames, I drew\n", + "it between them." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is important to distinguish between operations that\n", + "modify lists and operations that create new lists. For\n", + "example, the append method modifies a list, but the\n", + "+ operator creates a new list.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s an example using append:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t1 = [1, 2]\n", + "t2 = t1.append(3)\n", + "t1" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "t2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The return value from append is None." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s an example using the + operator:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t3 = t1 + [4]\n", + "t1" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3, 4]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result of the operator is a new list, and the original list is\n", + "unchanged." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This difference is important when you write functions that\n", + "are supposed to modify lists. For example, this function\n", + "does not delete the head of a list:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def bad_delete_head(t):\n", + " t = t[1:] # WRONG!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The slice operator creates a new list and the assignment\n", + "makes t refer to it, but that doesn’t affect the caller.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t4 = [1, 2, 3]\n", + "bad_delete_head(t4)\n", + "t4" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "At the beginning of bad_delete_head, t and t4\n", + "refer to the same list. 
At the end, t refers to a new list,\n", + "but t4 still refers to the original, unmodified list." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An alternative is to write a function that creates and\n", + "returns a new list. For\n", + "example, tail returns all but the first\n", + "element of a list:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "def tail(t):\n", + "    return t[1:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This function leaves the original list unmodified.\n", + "Here’s how it is used:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['b', 'c']" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters = ['a', 'b', 'c']\n", + "rest = tail(letters)\n", + "rest\n", + "# ['b', 'c']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_11__Dictionaries.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_11__Dictionaries.ipynb new file mode 100644 index 0000000..da79412 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_11__Dictionaries.ipynb @@ -0,0 +1,2168 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 11  Dictionaries" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "http://greenteapress.com/thinkpython2/html/thinkpython2012.html\n", + "\n", + "* 11.1  A dictionary is a mapping\n", + "* 11.2  Dictionary as a collection of counters\n", + "* 11.3  Looping and dictionaries\n", + "* 11.4  Reverse lookup\n", + "* 11.5  Dictionaries and lists\n", + "* 11.6  Memos\n", + "* 11.7  Global variables\n", + "* 11.8  Debugging\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Python: List vs Tuple vs Dictionary vs Set](https://blog.softhints.com/python-list-vs-tuple-vs-dictionary-vs-set/)\n", + "\n", + "![](https://blog.softhints.com/content/images/size/w2000/2020/04/python_dict_vs_list_vs_tuple_vs_set.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.1 A dictionary is a mapping" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This chapter presents another built-in type called a dictionary.\n", + "Dictionaries are one of Python’s best features; they are the\n", + "building blocks of many efficient and elegant algorithms." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "A dictionary is like a list, but more general. In a list,\n", + "the indices have to be integers; in a dictionary they can\n", + "be (almost) any type." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A dictionary contains a collection of indices, which are called keys, and a collection of values. Each key is associated with a\n", + "single value. **The association of a key and a value is called a key-value pair** or sometimes an item. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In mathematical language, a dictionary represents a mapping\n", + "from keys to values, so you can also say that each key\n", + "“maps to” a value.\n", + "As an example, we’ll build a dictionary that maps from English\n", + "to Spanish words, so the keys and the values are all strings." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The function dict creates a new dictionary with no items.\n", + "Because dict is the name of a built-in function, you\n", + "should avoid using it as a variable name.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{}" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp = dict()\n", + "eng2sp" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{}" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp = {}\n", + "eng2sp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The squiggly-brackets, {}, represent an empty dictionary.\n", + "To add items to the dictionary, you can use square brackets:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "eng2sp['one'] = 'uno'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This line creates an item that maps from the key\n", + "'one' to the value 'uno'. 
If we print the\n", + "dictionary again, we see a key-value pair with a colon\n", + "between the key and value:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'one': 'uno'}" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "eng2sp['one'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'one': '1'}" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This output format is also an input format. For example,\n", + "you can create a new dictionary with three items:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "eng2sp = {'one': 'uno',\n", + "          'two': 'dos',\n", + "          'three': 'tres'\n", + "          }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Here is what you get if you print eng2sp:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'one': 'uno', 'two': 'dos', 'three': 'tres'}" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Since Python 3.7, dictionaries keep their items in insertion\n", + "order, so the output matches the order in which the items were\n", + "added. In older versions the order was not guaranteed, and you\n", + "might have seen the key-value pairs in a different order." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But that’s not a problem because\n", + "the elements of a dictionary are never indexed with integer indices.\n", + "Instead, you use the keys to look up the corresponding values:" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'dos'" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp['two']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The key 'two' always maps to the value 'dos' so the order\n", + "of the items doesn’t matter." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the key isn’t in the dictionary, you get an exception:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "ename": "KeyError", + "evalue": "'four'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0meng2sp\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'four'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m: 'four'" + ] + } + ], + "source": [ + "eng2sp['four']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The len function works on dictionaries; it returns the\n", + "number of key-value pairs:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(eng2sp)" + ] + }, + { 
+ "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The in operator works on dictionaries, too; it tells you whether\n", + "something appears as a key in the dictionary (appearing\n", + "as a value is not good enough).\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'one' in eng2sp" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'uno' in eng2sp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "To see whether something appears as a value in a dictionary, you\n", + "can use the method values, which returns a collection of\n", + "values, and then use the in operator:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "vals = eng2sp.values()\n", + "'uno' in vals" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "dict_values(['uno', 'dos', 'tres'])\n", + "dict_keys(['one', 'two', 'three'])\n", + "dict_items([('one', 'uno'), ('two', 'dos'), ('three', 'tres')])\n" + ] + } + ], + "source": [ + "print(eng2sp.values())\n", + "print(eng2sp.keys())\n", + "print(eng2sp.items())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The in operator uses different algorithms for lists and\n", + "dictionaries. 
For lists, it searches the elements of the list in\n", + "order, as in Section 8.6. As the list gets longer, the search\n", + "time gets longer in direct proportion." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Python dictionaries use a data structure\n", + "called a hashtable that has a remarkable property: the\n", + "in operator takes about the same amount of time no matter how\n", + "many items are in the dictionary. I explain how that’s possible\n", + "in Section B.4, but the explanation might not make\n", + "sense until you’ve read a few more chapters." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Bonus**:\n", + "\n", + "* [Hash function](https://en.wikipedia.org/wiki/Hash_function)\n", + "* [Hash table](https://en.wikipedia.org/wiki/Hash_table)\n", + "* [Collision (computer science)](https://en.wikipedia.org/wiki/Collision_(computer_science))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.2 Dictionary as a collection of counters" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Suppose you are given a string and you want to count how many\n", + "times each letter appears. There are several ways you could do it:\n", + "\n", + "1. You could create 26 variables, one for each letter of the alphabet. Then you could traverse the string and, for each character, increment the corresponding counter, probably using a chained conditional.\n", + "\n", + "2. You could create a list with 26 elements. Then you could convert each character to a number (using the built-in function ord), use the number as an index into the list, and increment the appropriate counter.\n", + "\n", + "3. You could create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that you would increment the value of an existing item." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each of these options performs the same computation, but each\n", + "of them implements that computation in a different way.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An implementation is a way of performing a computation;\n", + "some implementations are better than others. For example,\n", + "an advantage of the dictionary implementation is that we don’t\n", + "have to know ahead of time which letters appear in the string\n", + "and we only have to make room for the letters that do appear." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is what the code might look like:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "def histogram(s):\n", + "    d = dict()\n", + "    for c in s:\n", + "        if c not in d:\n", + "            d[c] = 1\n", + "        else:\n", + "            d[c] += 1\n", + "    return d" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The name of the function is histogram, which is a statistical\n", + "term for a collection of counters (or frequencies).\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first line of the\n", + "function creates an empty dictionary. The for loop traverses\n", + "the string. Each time through the loop, if the character c is\n", + "not in the dictionary, we create a new item with key c and the\n", + "initial value 1 (since we have seen this letter once). 
If c is\n", + "already in the dictionary we increment d[c].\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s how it works:" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h = histogram('brontosaurus')\n", + "h" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The histogram indicates that the letters 'a' and 'b'\n", + "appear once; 'o' appears twice, and so on." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "Dictionaries have a method called get that takes a key\n", + "and a default value. If the key appears in the dictionary,\n", + "get returns the corresponding value; otherwise it returns\n", + "the default value. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'a': 1}" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h = histogram('a')\n", + "h" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h.get('a', 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h.get('c', 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "ename": "KeyError", + "evalue": "'c'", + "output_type": "error", + "traceback": [ 
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mh\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'c'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m: 'c'" + ] + } + ], + "source": [ + "h['c']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As an exercise, use get to write histogram more concisely. You\n", + "should be able to eliminate the if statement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "def histogram(s):\n", + "    d = dict()\n", + "    for c in s:\n", + "        d[c] = d.get(c, 0) + 1\n", + "    return d" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h = histogram('brontosaurus')\n", + "h" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.3 Looping and dictionaries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you use a dictionary in a for statement, it traverses\n", + "the keys of the dictionary. 
For example, print_hist\n", + "prints each key and the corresponding value:" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [], + "source": [ + "def print_hist(h):\n", + "    for c in h:\n", + "        print(c, h[c])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Here’s what the output looks like:" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "p 1\n", + "a 1\n", + "r 2\n", + "o 1\n", + "t 1\n" + ] + } + ], + "source": [ + "h = histogram('parrot')\n", + "print_hist(h)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The keys come out in insertion order (since Python 3.7, dictionaries\n", + "remember the order in which keys were added), not in sorted order. To traverse the keys\n", + "in sorted order, you can use the built-in function sorted:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a 1\n", + "o 1\n", + "p 1\n", + "r 2\n", + "t 1\n" + ] + } + ], + "source": [ + "for key in sorted(h):\n", + "    print(key, h[key])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Bonus: Getting all keys and values" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "p 1\n", + "a 1\n", + "r 2\n", + "o 1\n", + "t 1\n" + ] + } + ], + "source": [ + "for k, v in h.items():\n", + "    print(k, v)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.4 Reverse lookup" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Given a dictionary d and a key k, it is easy to\n", + "find the corresponding value v = d[k]. This operation\n", + "is called a lookup."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But what if you have v and you want to find k?\n", + "You have two problems: first, there might be more than one\n", + "key that maps to the value v. Depending on the application,\n", + "you might be able to pick one, or you might have to make\n", + "a list that contains all of them. Second, there is no\n", + "simple syntax to do a reverse lookup; you have to search." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is a function that takes a value and returns the first\n", + "key that maps to that value:" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [], + "source": [ + "def reverse_lookup(d, v):\n", + "    for k in d:\n", + "        if d[k] == v:\n", + "            return k\n", + "    raise LookupError()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This function is yet another example of the search pattern, but it\n", + "uses a feature we haven’t seen before, raise. The \n", + "raise statement causes an exception; in this case it causes a\n", + "LookupError, which is a built-in exception used to indicate\n", + "that a lookup operation failed." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we get to the end of the loop, that means v\n", + "doesn’t appear in the dictionary as a value, so we raise an\n", + "exception."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is an example of a successful reverse lookup:" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'r'" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h = histogram('parrot')\n", + "key = reverse_lookup(h, 2)\n", + "key" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "histogram('parrot')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "And an unsuccessful one:" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "ename": "LookupError", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mLookupError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mreverse_lookup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mreverse_lookup\u001b[0;34m(d, v)\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0md\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mreturn\u001b[0m 
\u001b[0mk\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mLookupError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mLookupError\u001b[0m: " + ] + } + ], + "source": [ + "key = reverse_lookup(h, 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The effect when you raise an exception is the same as when\n", + "Python raises one: it prints a traceback and an error message.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you raise an exception, you can provide a detailed error message as an optional argument. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "ename": "LookupError", + "evalue": "value does not appear in the dictionary", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mLookupError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mk\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mLookupError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'value does not appear in the dictionary'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mreverse_lookup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mreverse_lookup\u001b[0;34m(d, v)\u001b[0m\n\u001b[1;32m 3\u001b[0m 
\u001b[0;32mif\u001b[0m \u001b[0md\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mk\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mLookupError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'value does not appear in the dictionary'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mreverse_lookup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mLookupError\u001b[0m: value does not appear in the dictionary" + ] + } + ], + "source": [ + "def reverse_lookup(d, v):\n", + "    for k in d:\n", + "        if d[k] == v:\n", + "            return k\n", + "    raise LookupError('value does not appear in the dictionary')\n", + "key = reverse_lookup(h, 3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "raise LookupError('value does not appear in the dictionary')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A reverse lookup is much slower than a forward lookup; if you\n", + "have to do it often, or if the dictionary gets big, the performance\n", + "of your program will suffer." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.5 Dictionaries and lists" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lists can appear as values in a dictionary. 
For example, if you\n", + "are given a dictionary that maps from letters to frequencies, you\n", + "might want to invert it; that is, create a dictionary that maps\n", + "from frequencies to letters. Since there might be several letters\n", + "with the same frequency, each value in the inverted dictionary\n", + "should be a list of letters.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is a function that inverts a dictionary:" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [], + "source": [ + "def invert_dict(d):\n", + "    inverse = dict()\n", + "    for key in d:\n", + "        val = d[key]\n", + "        if val not in inverse:\n", + "            inverse[val] = [key]\n", + "        else:\n", + "            inverse[val].append(key)\n", + "    return inverse" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Each time through the loop, key gets a key from d and \n", + "val gets the corresponding value. If val is not in inverse, that means we haven’t seen it before, so we create a new\n", + "item and initialize it with a singleton (a list that contains a\n", + "single element). Otherwise we have seen this value before, so we\n", + "append the corresponding key to the list. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is an example:" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hist = histogram('parrot')\n", + "hist" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{1: ['p', 'a', 'o', 't'], 2: ['r']}" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "inverse = invert_dict(hist)\n", + "inverse" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 11.1 is a state diagram showing hist and inverse.\n", + "A dictionary is represented as a box with the type dict above it\n", + "and the key-value pairs inside. If the values are integers, floats or\n", + "strings, I draw them inside the box, but I usually draw lists\n", + "outside the box, just to keep the diagram simple.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lists can be values in a dictionary, as this example shows, but they\n", + "cannot be keys. 
Here’s what happens if you try:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "unhashable type: 'list'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mt\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0md\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0md\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'oops'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'" + ] + } + ], + "source": [ + "t = [1, 2, 3]\n", + "d = dict()\n", + "d[t] = 'oops'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "I mentioned earlier that a dictionary is implemented using\n", + "a hashtable and that means that the keys have to be hashable.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A hash is a function that takes a value (of any kind)\n", + "and returns an integer. Dictionaries use these integers,\n", + "called hash values, to store and look up key-value pairs.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This system works fine if the keys are immutable. But if the\n", + "keys are mutable, like lists, bad things happen. 
For example,\n", + "when you create a key-value pair, Python hashes the key and \n", + "stores it in the corresponding location. If you modify the\n", + "key and then hash it again, it would go to a different location.\n", + "In that case you might have two entries for the same key,\n", + "or you might not be able to find a key. Either way, the\n", + "dictionary wouldn’t work correctly." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "That’s why keys have to be hashable, and why mutable types like\n", + "lists aren’t. The simplest way to get around this limitation is to\n", + "use tuples, which we will see in the next chapter." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since dictionaries are mutable, they can’t be used as keys,\n", + "but they can be used as values." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.6 Memos" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you played with the fibonacci function from\n", + "Section 6.7, you might have noticed that the bigger\n", + "the argument you provide, the longer the function takes to run.\n", + "Furthermore, the run time increases quickly.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To understand why, consider Figure 11.2, which shows\n", + "the call graph for fibonacci with n=4:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A call graph shows a set of function frames, with lines connecting each\n", + "frame to the frames of the functions it calls. At the top of the\n", + "graph, fibonacci with n=4 calls fibonacci with n=3 and n=2. In turn, fibonacci with n=3 calls\n", + "fibonacci with n=2 and n=1. And so on.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Count how many times fibonacci(0) and fibonacci(1) are\n", + "called. 
This is an inefficient solution to the problem, and it gets\n", + "worse as the argument gets bigger.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One solution is to keep track of values that have already been\n", + "computed by storing them in a dictionary. A previously computed value\n", + "that is stored for later use is called a memo. Here is a\n", + "“memoized” version of fibonacci:" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "known = {0:0, 1:1}\n", + "\n", + "def fibonacci(n):\n", + "    if n in known:\n", + "        return known[n]\n", + "\n", + "    res = fibonacci(n-1) + fibonacci(n-2)\n", + "    known[n] = res\n", + "    return res" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "known is a dictionary that keeps track of the Fibonacci\n", + "numbers we already know. It starts with\n", + "two items: 0 maps to 0 and 1 maps to 1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Whenever fibonacci is called, it checks known.\n", + "If the result is already there, it can return\n", + "immediately. Otherwise it has to \n", + "compute the new value, add it to the dictionary, and return it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you run this version of fibonacci and compare it with\n", + "the original, you will find that it is much faster." + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The slowest run took 214.19 times longer than the fastest. 
This could mean that an intermediate result is being cached.\n", + "10000000 loops, best of 3: 109 ns per loop\n" + ] + } + ], + "source": [ + "%timeit fibonacci(20)" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "# the original, non-memoized fibonacci from Section 6.7, for comparison\n", + "def fibonacci(n):\n", + "    if n == 0:\n", + "        return 0\n", + "    if n == 1:\n", + "        return 1\n", + "    return fibonacci(n-1) + fibonacci(n-2)" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "100 loops, best of 3: 3.59 ms per loop\n" + ] + } + ], + "source": [ + "%timeit fibonacci(20)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Bonus \n", + "\n", + "* [Dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming)\n", + "* [19. Dynamic Programming I: Fibonacci, Shortest Paths](https://www.youtube.com/watch?v=OQ5jsbhAv_M)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.7 Global variables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the previous example, known is created outside the function,\n", + "so it belongs to the special frame called __main__.\n", + "Variables in __main__ are sometimes called global\n", + "because they can be accessed from any function. Unlike local\n", + "variables, which disappear when their function ends, global variables\n", + "persist from one function call to the next.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is common to use global variables for flags; that is, \n", + "boolean variables that indicate (“flag”) whether a condition\n", + "is true. 
For example, some programs use\n", + "a flag named verbose to control the level of detail in the\n", + "output:" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Running example1\n" + ] + } + ], + "source": [ + "verbose = True\n", + "\n", + "def example1():\n", + "    if verbose:\n", + "        print('Running example1')\n", + "\n", + "example1()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you try to reassign a global variable, you might be surprised.\n", + "The following example is supposed to keep track of whether the\n", + "function has been called:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "been_called = False\n", + "\n", + "def example2():\n", + "    been_called = True  # WRONG\n", + "\n", + "example2()\n", + "been_called" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "But if you run it you will see that the value of been_called\n", + "doesn’t change. The problem is that example2 creates a new local\n", + "variable named been_called. 
The local variable goes away when\n", + "the function ends, and has no effect on the global variable.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To reassign a global variable inside a function you have to\n", + "declare the global variable before you use it:" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "been_called = False\n", + "\n", + "def example2():\n", + " global been_called \n", + " been_called = True\n", + " \n", + "example2()\n", + " \n", + "been_called" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Pause the video and find why the `been_called` is False?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The global statement tells the interpreter\n", + "something like, “In this function, when I say been_called, I\n", + "mean the global variable; don’t create a local one.”\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s an example that tries to update a global variable:" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": {}, + "outputs": [ + { + "ename": "UnboundLocalError", + "evalue": "local variable 'count' referenced before assignment", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mUnboundLocalError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mcount\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcount\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;31m# WRONG\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mexample3\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mexample3\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mexample3\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mcount\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcount\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;31m# WRONG\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mexample3\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mUnboundLocalError\u001b[0m: local variable 'count' referenced before assignment" + ] + } + ], + "source": [ + "count = 0\n", + "\n", + "def example3():\n", + " count = count + 1 # WRONG\n", + " \n", + "example3()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Python assumes that count is local, and under that assumption\n", + "you are reading it before writing it. 
The solution, again,\n", + "is to declare count global.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": {}, + "outputs": [], + "source": [ + "def example3():\n", + " global count\n", + " count += 1\n", + "example3()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If a global variable refers to a mutable value, you can modify\n", + "the value without declaring the variable:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{0: 0, 1: 1, 2: 1}" + ] + }, + "execution_count": 79, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "known = {0:0, 1:1}\n", + "\n", + "def example4():\n", + " known[2] = 1\n", + "example4()\n", + "known" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "So you can add, remove and replace elements of a global list or\n", + "dictionary, but if you want to reassign the variable, you\n", + "have to declare it:" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": {}, + "outputs": [], + "source": [ + "def example5():\n", + " global known\n", + " known = dict()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Global variables can be useful, but if you have a lot of them,\n", + "and you modify them frequently, they can make programs\n", + "hard to debug." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.8 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you work with bigger datasets it can become unwieldy to\n", + "debug by printing and checking the output by hand. Here are some\n", + "suggestions for debugging large datasets:\n", + "\n", + "**1. Scale down the input:**\n", + "If possible, reduce the size of the dataset. 
For example if the program reads a text file, start with just the first 10 lines, or with the smallest example you can find. You can either edit the files themselves, or (better) modify the program so it reads only the first n lines.\n", + "If there is an error, you can reduce n to the smallest value that manifests the error, and then increase it gradually as you find and correct errors.\n", + "\n", + "**2. Check summaries and types:**\n", + "Instead of printing and checking the entire dataset, consider printing summaries of the data: for example, the number of items in a dictionary or the total of a list of numbers.\n", + "A common cause of runtime errors is a value that is not the right type. For debugging this kind of error, it is often enough to print the type of a value.\n", + "\n", + "**3. Write self-checks:**\n", + "Sometimes you can write code to check for errors automatically. For example, if you are computing the average of a list of numbers, you could check that the result is not greater than the largest element in the list or less than the smallest. This is called a “sanity check” because it detects results that are “insane”.\n", + "Another kind of check compares the results of two different computations to see if they are consistent. This is called a “consistency check”.\n", + "\n", + "**4. Format the output:**\n", + "Formatting debugging output can make it easier to spot an error. We saw an example in Section 6.9. Another tool you might find useful is the pprint module, which provides a pprint function that displays built-in types in a more human-readable format (pprint stands for “pretty print”)." 
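+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A short sketch of tips 3 and 4, assuming the `histogram` function from Section 11.2 is already defined. The helper `mean` is hypothetical and performs a sanity check; comparing `histogram` with `collections.Counter` is a consistency check; and `pprint` formats the result (it sorts dictionary keys by default)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from collections import Counter\n", + "from pprint import pprint\n", + "\n", + "def mean(t):\n", + "    result = sum(t) / len(t)\n", + "    # sanity check: mathematically, an average cannot fall outside [min(t), max(t)]\n", + "    assert min(t) <= result <= max(t), 'insane average'\n", + "    return result\n", + "\n", + "print(mean([3, 1, 4, 1, 5]))\n", + "\n", + "# consistency check: two different ways of counting letters should agree\n", + "hist_check = histogram('parrot')\n", + "assert hist_check == dict(Counter('parrot'))\n", + "\n", + "pprint(hist_check)"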
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Again, time you spend building scaffolding can reduce\n", + "the time you spend debugging.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5043, 28)" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "2 Color Sam Mendes 602.0 148.0 \n", + "3 Color Christopher Nolan 813.0 164.0 \n", + "4 NaN Doug Walker NaN NaN \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "2 0.0 161.0 Rory Kinnear \n", + "3 22000.0 23000.0 Christian Bale \n", + "4 131.0 NaN Rob Walker \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "2 11000.0 200074175.0 Action|Adventure|Thriller ... \n", + "3 27000.0 448130642.0 Action|Thriller ... \n", + "4 131.0 NaN Documentary ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "0 3054.0 English USA PG-13 237000000.0 \n", + "1 1238.0 English USA PG-13 300000000.0 \n", + "2 994.0 English UK PG-13 245000000.0 \n", + "3 2701.0 English USA PG-13 250000000.0 \n", + "4 NaN NaN NaN NaN NaN \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "2 2015.0 393.0 6.8 2.35 \n", + "3 2012.0 23000.0 8.5 2.35 \n", + "4 NaN 12.0 7.1 NaN \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "2 85000 \n", + "3 164000 \n", + "4 0 \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3390669.0" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(df['director_facebook_likes'].fillna(0))" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 inf\n", + "1 0.300178\n", + "2 inf\n", + "3 0.007455\n", + "4 NaN\n", + " ... \n", + "5038 43.500000\n", + "5039 NaN\n", + "5040 inf\n", + "5041 inf\n", + "5042 5.625000\n", + "Length: 5043, dtype: float64" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['duration'] / df['director_facebook_likes']" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0 907\n", + "NaN 104\n", + "3.0 70\n", + "6.0 66\n", + "7.0 64\n", + " ... \n", + "104.0 1\n", + "224.0 1\n", + "220.0 1\n", + "522.0 1\n", + "764.0 1\n", + "Name: director_facebook_likes, Length: 436, dtype: int64" + ] + }, + "execution_count": 84, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['director_facebook_likes'].value_counts(dropna=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.9 Glossary" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_12__Tuples.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_12__Tuples.ipynb new file mode 100644 index 0000000..fc2e3da --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_12__Tuples.ipynb @@ -0,0 +1,1497 @@ +{ + "cells": [ + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "# Chapter 12  Tuples\n", + "\n", + "\n", + "* 12.1  Tuples are immutable\n", + "* 12.2  Tuple assignment\n", + "* 12.3  Tuples as return values\n", + "* 12.4  Variable-length argument tuples\n", + "* 12.5  Lists and tuples\n", + "* 12.6  Dictionaries and tuples\n", + "* 12.7  Sequences of sequences" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.1 Tuples are immutable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This chapter presents one more built-in type, the tuple, and then\n", + "shows how lists, dictionaries, and tuples work together.\n", + "I also present a useful feature for variable-length argument lists,\n", + "the gather and scatter operators." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One note: there is no consensus on how to pronounce “tuple”. Some people say **“tuh-ple”**, which rhymes with “supple”. But in the context of programming, most people say **“too-ple”**, which rhymes with “quadruple”." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " A tuple is a sequence of values. The values can be any type, and\n", + "they are indexed by integers, so in that respect tuples are a lot\n", + "like lists. The important difference is that tuples are immutable.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Syntactically, a tuple is a comma-separated list of values:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = 'a', 'b', 'c', 'd', 'e'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Although it is not necessary, it is common to enclose tuples in\n", + "parentheses:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ('a', 'b', 'c', 'd', 'e')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "To create a tuple with a single element, you have to include a final\n", + "comma:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t1 = 'a',\n", + "type(t1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A value in parentheses is not a tuple:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t2 = ('a')\n", + "type(t2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Another way to create a tuple is the built-in function tuple.\n", + "With no argument, it creates an empty tuple:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = tuple()\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the argument is a sequence (string, list or tuple), the result\n", + "is a tuple with the elements of the sequence:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = tuple('lupins')\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Because tuple is the name of a built-in 
function, you should\n", + "avoid using it as a variable name." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Most list operators also work on tuples. The bracket operator\n", + "indexes an element:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ('a', 'b', 'c', 'd', 'e')\n", + "t[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "And the slice operator selects a range of elements.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[1:3]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " But if you try to modify one of the elements of the tuple, you get\n", + "an error:\n", + "
\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[0] = 'A'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Because tuples are immutable, you can’t modify the elements. But you\n", + "can replace one tuple with another:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ('A',) + t[1:]\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This statement makes a new tuple and then makes t refer to it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The relational operators work with tuples and other sequences;\n", + "Python starts by comparing the first element from each\n", + "sequence. If they are equal, it goes on to the next elements,\n", + "and so on, until it finds elements that differ. Subsequent\n", + "elements are not considered (even if they are really big).\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(0, 1, 2) < (0, 3, 4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(0, 1, 2000000) < (0, 3, 4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.2 Tuple assignment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is often useful to swap the values of two variables.\n", + "With conventional assignments, you have to use a temporary\n", + "variable. 
For example, to swap a and b:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 4\n", + "b = 3\n", + "print(f'a: {a}, b: {b}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "temp = a\n", + "a = b\n", + "b = temp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(f'a: {a}, b: {b}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "

Bonus: Tower of Hanoi

\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This solution is cumbersome; tuple assignment is more elegant:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a, b = b, a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The left side is a tuple of variables; the right side is a tuple of\n", + "expressions. Each value is assigned to its respective variable. \n", + "All the expressions on the right side are evaluated before any\n", + "of the assignments." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
\n", + " The number of variables on the left and the number of\n", + "values on the right have to be the same:\n", + "
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a, b = 1, 2, 3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "More generally, the right side can be any kind of sequence\n", + "(string, list or tuple). For example, to split an email address\n", + "into a user name and a domain, you could write:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "addr = 'monty@python.org'\n", + "uname, domain = addr.split('@')\n", + "uname, domain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = ['Everest', 8849, 27.9881, 86.9250]\n", + "name, height, latitude, longitude = data\n", + "\n", + "print(name, height, latitude, longitude)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The return value from split is a list with two elements;\n", + "the first element is assigned to uname, the second to\n", + "domain." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "uname #'monty'\n", + "domain #'python.org'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.3 Tuples as return values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Strictly speaking, a function can only return one value, but\n", + "if the value is a tuple, the effect is the same as returning\n", + "multiple values. For example, if you want to divide two integers\n", + "and compute the quotient and remainder, it is inefficient to\n", + "compute x//y and then x%y. 
It is better to compute\n", + "them both at the same time.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The built-in function divmod takes two arguments and\n", + "returns a tuple of two values, the quotient and remainder.\n", + "You can store the result as a tuple:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = divmod(7, 3)\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Or use tuple assignment to store the elements separately:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "quot, rem = divmod(7, 3)\n", + "quot" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rem" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Here is an example of a function that returns a tuple:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def min_max(t):\n", + " return min(t), max(t)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "max and min are built-in functions that find\n", + "the largest and smallest elements of a sequence. min_max\n", + "computes both and returns a tuple of two values.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.4 Variable-length argument tuples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Functions can take a variable number of arguments. A parameter\n", + "name that begins with * gathers arguments into\n", + "a tuple. 
For example, printall\n", + "takes any number of arguments and prints them:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def printall(*args):\n", + " print(args)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The gather parameter can have any name you like, but args is\n", + "conventional. Here’s how the function works:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "printall(1, 2.0, '3','x')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "
\n", + " The complement of gather is scatter. If you have a\n", + "sequence of values and you want to pass it to a function\n", + "as multiple arguments, you can use the * operator.\n", + "For example, divmod takes exactly two arguments; it\n", + "doesn’t work with a tuple:\n", + "
\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = (7, 3)\n", + "divmod(t)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " But if you scatter the tuple, it works:\n", + "
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "divmod(*t)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Many of the built-in functions use\n", + "variable-length argument tuples. For example, max\n", + "and min can take any number of arguments:\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "max(1, 2, 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "But sum does not.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sum(1, 2, 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As an exercise, write a function called sum_all that takes any number\n", + "of arguments and returns their sum." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.5 Lists and tuples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "zip is a built-in function that takes two or more sequences and\n", + "interleaves them. The name of the function refers to\n", + "a zipper, which interleaves two rows of teeth." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This example zips a string and a list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'abc'\n", + "t = [0, 1, 2]\n", + "zip(s, t)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result is a zip object that knows how to iterate through\n", + "the pairs. 
The most common use of zip is in a for loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for pair in zip(s, t):\n", + " print(pair)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A zip object is a kind of iterator, which is any object\n", + "that iterates through a sequence. Iterators are similar to lists in some\n", + "ways, but unlike lists, you can’t use an index to select an element from\n", + "an iterator.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you want to use list operators and methods, you can\n", + "use a zip object to make a list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "zip(s, t)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(zip(s, t))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result is a list of tuples; in this example, each tuple contains\n", + "a character from the string and the corresponding element from\n", + "the list.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the sequences are not the same length, the result has the\n", + "length of the shorter one." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(zip('Anne', 'Elk'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "You can use tuple assignment in a for loop to traverse a list of\n", + "tuples:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = [('a', 0), ('b', 1), ('c', 2)]\n", + "for letter, number in t:\n", + " print(number, letter)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " Each time through the loop, Python selects the next tuple in\n", + "the list and assigns the elements to letter and \n", + "number. The output of this loop is:\n", + "
\n", + "\n", + "0 a\n", + "\n", + "1 b\n", + "\n", + "2 c" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you combine zip, for and tuple assignment, you get a\n", + "useful idiom for traversing two (or more) sequences at the same\n", + "time. For example, has_match takes two sequences, t1 and\n", + "t2, and returns True if there is an index i\n", + "such that t1[i] == t2[i]:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def has_match(t1, t2):\n", + "    for x, y in zip(t1, t2):\n", + "        if x == y:\n", + "            return True\n", + "    return False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you need to traverse the elements of a sequence and their\n", + "indices, you can use the built-in function enumerate:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for index, element in enumerate('abc'):\n", + "    print(index, element)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result from enumerate is an enumerate object, which\n", + "iterates a sequence of pairs; each pair contains an index (starting\n", + "from 0) and an element from the given sequence.\n", + "In this example, the output is:\n", + "\n", + "0 a\n", + "\n", + "1 b\n", + "\n", + "2 c" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.6 Dictionaries and tuples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Dictionaries have a method called items that returns a sequence of\n", + "tuples, where each tuple is a key-value pair.\n", + "
\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = {'a':0, 'b':1, 'c':2}\n", + "t = d.items()\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result is a dict_items object, a view that\n", + "iterates over the key-value pairs. You can use it in a for loop\n", + "like this:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for key, value in d.items():\n", + " print(key, value)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Since Python 3.7, dictionaries preserve insertion order, so the items\n", + "appear in the order in which the keys were added." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "
\n", + "Going in the other direction, you can use a list of tuples to\n", + "initialize a new dictionary: \n", + "
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = [('a', 0), ('c', 2), ('b', 1)]\n", + "d = dict(t)\n", + "d" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Combining dict with zip yields a concise way\n", + "to create a dictionary:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = dict(zip('abc', range(3)))\n", + "d" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The dictionary method update also takes a list of tuples\n", + "and adds them, as key-value pairs, to an existing dictionary.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is common to use tuples as keys in dictionaries (primarily because\n", + "you can’t use lists). For example, a telephone directory might map\n", + "from last-name, first-name pairs to telephone numbers. Assuming\n", + "that we have defined last, first and number, we\n", + "could write:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "directory = dict()\n", + "last, first, number = 'Cleese', 'John', '555-0100'  # hypothetical sample entry\n", + "directory[last, first] = number" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The expression in brackets is a tuple. We could use tuple\n", + "assignment to traverse this dictionary.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for last, first in directory:\n", + " print(first, last, directory[last,first])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This loop traverses the keys in directory, which are tuples. It\n", + "assigns the elements of each tuple to last and first, then\n", + "prints the name and corresponding telephone number."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are two ways to represent tuples in a state diagram. The more\n", + "detailed version shows the indices and elements just as they appear in\n", + "a list. For example, the tuple ('Cleese', 'John') would appear\n", + "as in Figure 12.1.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But in a larger diagram you might want to leave out the\n", + "details. For example, a diagram of the telephone directory might\n", + "appear as in Figure 12.2." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here the tuples are shown using Python syntax as a graphical\n", + "shorthand. The telephone number in the diagram is the complaints line\n", + "for the BBC, so please don’t call it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.7 Sequences of sequences" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I have focused on lists of tuples, but almost all of the examples in\n", + "this chapter also work with lists of lists, tuples of tuples, and\n", + "tuples of lists. To avoid enumerating the possible combinations, it\n", + "is sometimes easier to talk about sequences of sequences." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In many contexts, the different kinds of sequences (strings, lists and\n", + "tuples) can be used interchangeably. So how should you choose one\n", + "over the others?\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "To start with the obvious, strings are more limited than other\n", + "sequences because the elements have to be characters. They are\n", + "also immutable. If you need the ability to change the characters\n", + "in a string (as opposed to creating a new string), you might\n", + "want to use a list of characters instead.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
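\n", + "Lists are more common than tuples, mostly because they are mutable.\n", + "But there are a few cases where you might prefer tuples:\n", + "\n", + "1. In some contexts, like a return statement, it is syntactically simpler to create a tuple than a list.\n", + "2. If you want to use a sequence as a dictionary key, you have to use an immutable type like a tuple or string.\n", + "3. If you are passing a sequence as an argument to a function, using tuples reduces the potential for unexpected behavior due to aliasing.\n", + "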
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Because tuples are immutable, they don’t provide methods like sort and reverse, which modify existing lists. But Python\n", + "provides the built-in function sorted, which takes any sequence\n", + "and returns a new list with the same elements in sorted order, and\n", + "reversed, which takes a sequence and returns an iterator that\n", + "traverses the sequence in reverse order.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.8 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lists, dictionaries and tuples are examples of data\n", + "structures; in this chapter we are starting to see compound data\n", + "structures, like lists of tuples, or dictionaries that contain tuples\n", + "as keys and lists as values. Compound data structures are useful, but\n", + "they are prone to what I call shape errors; that is, errors\n", + "caused when a data structure has the wrong type, size, or structure.\n", + "For example, if you are expecting a list with one integer and I\n", + "give you a plain old integer (not in a list), it won’t work.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To help debug these kinds of errors, I have written a module\n", + "called structshape that provides a function, also called\n", + "structshape, that takes any kind of data structure as\n", + "an argument and returns a string that summarizes its shape.\n", + "You can download it from http://thinkpython2.com/code/structshape.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s the result for a simple list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from structshape import structshape\n", + "t = [1, 2, 3]\n", + "structshape(t)  # expected: 'list of 3 int'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A fancier program might write “list of 3 ints”, but it\n", + "was easier not to deal with plurals. 
Here’s a list of lists:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t2 = [[1,2], [3,4], [5,6]]\n", + "structshape(t2)\n", + "'list of 3 list of 2 int'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the elements of the list are not the same type,\n", + "structshape groups them, in order, by type:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t3 = [1, 2, 3, 4.0, '5', '6', [7], [8], 9]\n", + "structshape(t3)\n", + "'list of (3 int, float, 2 str, 2 list of int, int)'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Here’s a list of tuples:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'abc'\n", + "lt = list(zip(t, s))\n", + "structshape(lt)\n", + "'list of 3 tuple of (int, str)'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "And here’s a dictionary with 3 items that map integers to strings." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = dict(lt) \n", + "structshape(d)\n", + "'dict of 3 int->str'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you are having trouble keeping track of your data structures,\n", + "structshape can help." 
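+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you cannot download the module, the next cell is a much smaller, hypothetical stand-in. `simple_shape` is our own illustrative name, not part of `structshape`, and it is only a sketch: it reports the container type, its length and the names of its element types (joined with `|`), but unlike `structshape` it does not count elements per type or look inside nested structures." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def simple_shape(ds):\n", + "    # hypothetical, one-level-deep stand-in for structshape\n", + "    if isinstance(ds, dict):\n", + "        # summarise the key and value types of a dictionary\n", + "        keys = sorted({type(k).__name__ for k in ds})\n", + "        vals = sorted({type(v).__name__ for v in ds.values()})\n", + "        return 'dict of {} {}->{}'.format(len(ds), '|'.join(keys), '|'.join(vals))\n", + "    if isinstance(ds, (list, tuple, set)):\n", + "        # summarise the element types of a sequence or set\n", + "        kinds = sorted({type(x).__name__ for x in ds})\n", + "        return '{} of {} {}'.format(type(ds).__name__, len(ds), '|'.join(kinds))\n", + "    return type(ds).__name__  # anything else: just its type name\n", + "\n", + "print(simple_shape([1, 2, 3]))                 # list of 3 int\n", + "print(simple_shape({1: 'a', 2: 'b', 3: 'c'}))  # dict of 3 int->str\n", + "print(simple_shape([1, 2.0, '3']))             # list of 3 float|int|str"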
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.9 Glossary" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ipytracer\n", + "from IPython.display import display\n", + "\n", + "def bubble_sort(unsorted_list):\n", + "    x = ipytracer.ChartTracer(unsorted_list)\n", + "    display(x)\n", + "    length = len(x)-1\n", + "    for i in range(length):\n", + "        for j in range(length-i):\n", + "            if x[j] > x[j+1]:\n", + "                x[j], x[j+1] = x[j+1], x[j]\n", + "    return x.tolist()\n", + "\n", + "bubble_sort([6,4,7,9])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ipytracer\n", + "from IPython.display import display\n", + "\n", + "def bubble_sort(unsorted_list):\n", + "    x = ipytracer.List1DTracer(unsorted_list)\n", + "    display(x)\n", + "    length = len(x)-1\n", + "    for i in range(length):\n", + "        for j in range(length-i):\n", + "            if x[j] > x[j+1]:\n", + "                x[j], x[j+1] = x[j+1], x[j]\n", + "                print(unsorted_list)\n", + "    return x.tolist()\n", + "\n", + "bubble_sort([6,4,7,9])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ipytracer\n", + "from IPython.display import display\n", + "import re\n", + "\n", + "\n", + "def quick_sort(arr):\n", + "    # note: despite its name, this is a natural (alphanumeric) sort built on sorted(), not quicksort\n", + "    input_list = ipytracer.ChartTracer(arr)\n", + "    display(input_list)\n", + "\n", + "    def alphanum_key(key):\n", + "        return [int(s) if s.isdigit() else s.lower() for s in re.split(\"([0-9]+)\", key)]\n", + "\n", + "    return sorted(input_list, key=alphanum_key)\n", + "\n", + "\n", + "quick_sort(['6','4','7','9','3','5','1','8'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ipytracer\n", + "from IPython.display import display\n", + "\n", + "def merge_sort(collectionx: list) -> list:\n", + "    # List1DTracer demo: overwrites items to show the tracer updating (not a real merge sort)\n", + "    collectionx = ipytracer.List1DTracer(collectionx)\n", + "    display(collectionx)\n", + "\n", + "    for i in range(0, 8):\n", + "        collectionx[i] = i\n", + "        collectionx[i-1] = i-1\n", + "        collectionx[i-2] = i*2\n", + "    return collectionx.tolist()\n", + "\n", + "\n", + "merge_sort([6,4,7,9,3,5,1,8,2])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def merge_sort(collection: list) -> list:\n", + "\n", + "\n", + "    def merge(left: list, right: list) -> list:\n", + "        \"\"\"merge left and right\n", + "        :param left: left collection\n", + "        :param right: right collection\n", + "        :return: merge result\n", + "        \"\"\"\n", + "\n", + "        def _merge():\n", + "            while left and right:\n", + "                yield (left if left[0] <= right[0] else right).pop(0)\n", + "            yield from left\n", + "            yield from right\n", + "\n", + "        return list(_merge())\n", + "\n", + "    if len(collection) <= 1:\n", + "        return collection\n", + "    mid = len(collection) // 2\n", + "    display(ipytracer.List1DTracer(collection))\n", + "    left = merge_sort(collection[:mid])\n", + "    right = merge_sort(collection[mid:])\n", + "    x = merge(left, right)\n", + "    display(x)\n", + "    # return the merged list: a second merge(left, right) call would see left/right already consumed by pop(0)\n", + "    return x\n", + "\n", + "merge_sort([6,4,7,9,3,5,1,8,2])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def shell_sort(collection):\n", + "    collection = ipytracer.List1DTracer(collection)\n", + "    display(collection)\n", + "    gaps = [701, 301, 132, 57, 23, 10, 4, 1]\n", + "\n", + "    for gap in gaps:\n", + "        for i in range(gap, len(collection)):\n", + "            j = i\n", + "            while j >= gap and collection[j] < collection[j - gap]:\n", + "                collection[j], collection[j - gap] = collection[j - gap], collection[j]\n", + "                j -= gap\n", + "    return collection\n", + "\n", + "shell_sort([6,4,7,9,3,5,1,8,2])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_7__Iteration.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_7__Iteration.ipynb new file mode 100644 index 0000000..82a8cd7 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_7__Iteration.ipynb @@ -0,0 +1,1198 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 7  Iteration" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* 7.1  Reassignment\n", + "* 7.2  Updating variables\n", + "* 7.3  The while statement\n", + "* 7.4  break\n", + "* 7.5  Square roots\n", + "* 7.6  Algorithms\n", + "* 7.7  Debugging - Demo break the problem in half\n", + "\n", + "\n", + "This chapter is about iteration, which is the ability to run\n", + "a block of statements repeatedly. We saw a kind of iteration,\n", + "using recursion, in Section 5.8.\n", + "We saw another kind, using a for loop,\n", + "in Section 4.2. In this chapter we’ll see yet another\n", + "kind, using a while statement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.1 Reassignment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But first I want to say a little more about variable assignment." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you may have discovered, it is legal to make more than one\n", + "assignment to the same variable. A new assignment makes an existing\n", + "variable refer to a new value (and stop referring to the old value)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = 5\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "7" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = 7\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first time we display \n", + "x, its value is 5; the second time, its\n", + "value is 7." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 7.1 shows what reassignment looks\n", + "like in a state diagram. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "At this point I want to address a common source of\n", + "confusion.\n", + "**Because Python uses the equal sign (=) for assignment, it is\n", + "tempting to interpret a statement like a = b as a\n", + "mathematical\n", + "proposition of equality**; that is, the claim that a and\n", + "b are equal. But this interpretation is wrong.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, equality is a symmetric relationship and assignment is not. For\n", + "example, in mathematics, if `a=7` then `7=a`. But in Python, the\n", + "statement `a = 7` is legal and `7 = a` is not." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Also, in mathematics, a proposition of equality is either true or\n", + "false for all time. 
If `a=b` now, then a will always equal b.\n", + "In Python, an assignment statement can make two variables equal, but\n", + "they don’t have to stay that way:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "a = 5" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "b = a # a and b are now equal" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "a = 3 # are a and b equal ?" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The third line changes the value of a but does not change the\n", + "value of b, so they are no longer equal. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Reassigning variables is often useful, but you should use it\n", + "with caution. If the values of variables change frequently, it can\n", + "make the code difficult to read and debug." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: Python Constant\n", + "\n", + "https://docs.python.org/3/library/typing.html#typing.Final" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from typing import Final  # typing.Final needs Python 3.8+; this notebook's kernel is 3.6.9\n", + "\n", + "MAX_SIZE: Final = 9000\n", + "MAX_SIZE += 1  # Error reported by type checker (Python itself still runs this line)\n", + "\n", + "class Connection:\n", + "    TIMEOUT: Final[int] = 10\n", + "\n", + "class FastConnector(Connection):\n", + "    TIMEOUT = 1  # Error reported by type checker" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.2 Updating variables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A common kind of reassignment is an update,\n", + "where the new value of the variable depends on the old." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "x = x + 1" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This means “get the current value of x, add one, and then\n", + "update x with the new value.”" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you try to update a variable that doesn’t exist, you get an\n", + "error, because Python evaluates the right side before it assigns\n", + "a value to x:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "ename": "NameError", + "evalue": "name 'x' is not defined", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdel\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mNameError\u001b[0m: name 'x' is not defined" + ] + } + ], + "source": [ + "del x\n", + "x = x + 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Before you can update a variable, you have to initialize\n", + "it, usually with a simple assignment:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "x = 0\n", + "x = x + 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Updating a variable by adding 1 is called an increment;\n", + "subtracting 1 is called a decrement.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: Why are there no ++ and -- operators in Python?\n", + "\n", + "https://stackoverflow.com/questions/3654830/why-are-there-no-and-operators-in-python\n", + "\n", + "1) Simple increment and decrement aren't needed as much as in other languages. You don't write things like \n", + "`for(int i = 0; i < 10; ++i)` \n", + "in Python very often; instead you do things like \n", + "`for i in range(0, 10)`.\n", + "\n", + "2) Python is a lot about **clarity** and no programmer is likely to correctly guess the meaning of `--a` unless they have learned a language with that construct."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x++  # SyntaxError: Python has no ++ operator\n", + "++x  # valid, but only applies unary plus twice; x is unchanged" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x+=1\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x-=1\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.3 The while statement" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Computers are often used to automate repetitive tasks. Repeating\n", + "identical or similar tasks without making errors is something that\n", + "computers do well and people do poorly. In a computer program,\n", + "repetition is also called iteration." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have already seen two functions, countdown and\n", + "print_n, that iterate using recursion. Because iteration is so\n", + "common, Python provides language features to make it easier.\n", + "One is the for statement we saw in Section 4.2.\n", + "We’ll get back to that later." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another is the while statement. Here is a version of countdown that uses a while statement:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def countdown(n):\n", + "    while n > 0:\n", + "        print(n)\n", + "        n = n - 1\n", + "    print('Blastoff!')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "You can almost read the while statement as if it were English.\n", + "It means, “While n is greater than 0,\n", + "display the value of n and then decrement\n", + "n. When you get to 0, display the word Blastoff!”\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "More formally, here is the flow of execution for a while statement:\n", + "\n", + "1. Determine whether the condition is true or false.\n", + "2. If false, exit the while statement and continue execution at the next statement.\n", + "3. If the condition is true, run the body and then go back to step 1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This type of flow is called a loop because the third step\n", + "loops back around to the top." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The body of the loop should change the value of one or more variables\n", + "so that the condition becomes false eventually and the loop\n", + "terminates. Otherwise the loop will repeat forever, which is called\n", + "an infinite loop. An endless source of amusement for computer\n", + "scientists is the observation that the directions on shampoo,\n", + "“Lather, rinse, repeat”, are an infinite loop.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the case of countdown, we can prove that the loop\n", + "terminates: if n is zero or negative, the loop never runs.\n", + "Otherwise, n gets smaller each time through the\n", + "loop, so eventually we have to get to 0." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For some other loops, it is not so easy to tell. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sequence(n):\n", + "    while n != 1:\n", + "        print(n)\n", + "        if n % 2 == 0:    # n is even\n", + "            n = n // 2    # floor division keeps n an int (n / 2 would give 5.0, 16.0, ...)\n", + "        else:             # n is odd\n", + "            n = n*3 + 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The condition for this loop is n != 1, so the loop will continue\n", + "until n is 1, which makes the condition false." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each time through the loop, the program outputs the value of n\n", + "and then checks whether it is even or odd. If it is even, n is\n", + "divided by 2. If it is odd, the value of n is replaced with\n", + "n*3 + 1. For example, if the argument passed to sequence\n", + "is 3, the resulting values of n are 3, 10, 5, 16, 8, 4, 2, 1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since n sometimes increases and sometimes decreases, there is no\n", + "obvious proof that n will ever reach 1, or that the program\n", + "terminates. For some particular values of n, we can prove\n", + "termination. For example, if the starting value is a power of two,\n", + "n will be even every time through the loop\n", + "until it reaches 1. The previous example ends with such a sequence,\n", + "starting with 16.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The hard question is whether we can prove that this program terminates\n", + "for all positive values of n. So far, no one has\n", + "been able to prove it or disprove it! (See\n", + "http://en.wikipedia.org/wiki/Collatz_conjecture.)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, rewrite the function print_n from\n", + "Section 5.8 using iteration instead of recursion."
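+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One possible answer, sketched under the assumption that `print_n(s, n)` from Section 5.8 displays the string `s` a total of `n` times: the recursive call is replaced by a `while` loop that decrements `n`, the same pattern `countdown` uses in this section." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def print_n(s, n):\n", + "    # iterative version: loop instead of calling print_n(s, n - 1)\n", + "    while n > 0:\n", + "        print(s)\n", + "        n = n - 1\n", + "\n", + "print_n('Hello', 3)  # prints Hello on three lines"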
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: Collatz conjecture sequence" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "def collatz_sequence(x):\n", + "    seq = [x]\n", + "    if x < 1:\n", + "        return []\n", + "    while x > 1:\n", + "        if x % 2 == 0:\n", + "            x = x // 2  # floor division keeps the values ints\n", + "        else:\n", + "            x = 3 * x + 1\n", + "        seq.append(x)\n", + "    return seq" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[5, 16, 8, 4, 2, 1]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "collatz_sequence(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "![Collatz function definition](https://wikimedia.org/api/rest_v1/media/math/render/svg/ec22031bdc2a1ab2e4effe47ae75a836e7dea459)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Resources\n", + "\n", + "* [Project Euler is a series of challenging mathematical/computer programming problems](https://projecteuler.net/)\n", + "* [Collatz Conjecture in Color - Numberphile](https://www.youtube.com/watch?v=LqKpkdRRLZw)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.4 break" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sometimes you don’t know it’s time to end a loop until you get halfway\n", + "through the body. In that case you can use the break\n", + "statement to jump out of the loop." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example, suppose you want to take input from the user until they\n", + "type done. You could write:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "> d\n", + "d\n", + "> wed\n", + "wed\n", + "> done\n", + "Done!\n" + ] + } + ], + "source": [ + "while True:\n", + "    line = input('> ')\n", + "    if line == 'done':\n", + "        break\n", + "    print(line)\n", + "\n", + "print('Done!')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The loop condition is True, which is always true, so the\n", + "loop runs until it hits the break statement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each time through, it prompts the user with an angle bracket.\n", + "If the user types done, the break statement exits\n", + "the loop. Otherwise the program echoes whatever the user types\n", + "and goes back to the top of the loop. Here’s a sample run:" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "> not done\n", + "not done\n", + "> done\n", + "Done!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This way of writing while loops is common because you\n", + "can check the condition anywhere in the loop (not just at the\n", + "top) and you can express the stop condition affirmatively\n", + "(“stop when this happens”) rather than negatively (“keep going\n", + "until that happens”)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.5 Square roots" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Loops are often used in programs that compute\n", + "numerical results by starting with an approximate answer and\n", + "iteratively improving it.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example, one way of computing square roots is Newton’s method.\n", + "Suppose that you want to know the square root of a. If you start\n", + "with almost any estimate, x, you can compute a better\n", + "estimate with the following formula:\n", + "\n", + "$$y = \\frac{x + a/x}{2}$$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For example, if a is 4 and x is 3:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.1666666666666665" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a = 4\n", + "x = 3\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result is closer to the correct answer (√4 = 2). If we\n", + "repeat the process with the new estimate, it gets even closer:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0064102564102564" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "After a few more updates, the estimate is almost exact:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0000102400262145" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0000000000262146" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In general we don’t know ahead of time how many steps it takes\n", + "to get to the right answer, but we know when we get there\n", + "because the estimate\n", + "stops changing:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "When y == x, we can stop. Here is a loop that starts\n", + "with an initial estimate, x, and improves it until it\n", + "stops changing:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "while True:\n", + "    print(x)\n", + "    y = (x + a/x) / 2\n", + "    if y == x:\n", + "        break\n", + "    x = y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For most values of a this works fine, but in general it is\n", + "dangerous to test float equality.\n", + "Floating-point values are only approximately right:\n", + "most rational numbers, like 1/3, and irrational numbers, like\n", + "√2, can’t be represented exactly with a float.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Rather than checking whether x and y are exactly equal, it\n", + "is safer to use the built-in function abs to compute the\n", + "absolute value, or magnitude, of the difference between them:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "epsilon = 0.0000001           # how close is close enough\n", + "while abs(y - x) >= epsilon:  # same test as: if abs(y-x) < epsilon: break\n", + "    x = y\n", + "    y = (x + a/x) / 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Where epsilon has a value like 0.0000001 that\n", + "determines how close is close enough." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.6 Algorithms" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Newton’s method is an example of an algorithm: it is a\n", + "mechanical process for solving a category of problems (in this\n", + "case, computing square roots)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To understand what an algorithm is, it might help to start with\n", + "something that is not an algorithm. When you learned to multiply\n", + "single-digit numbers, you probably memorized the multiplication table.\n", + "In effect, you memorized 100 specific solutions. That kind of\n", + "knowledge is not algorithmic." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But if you were “lazy”, you might have learned a few\n", + "tricks. For example, to find the product of n and 9, you can\n", + "write n−1 as the first digit and 10−n as the second\n", + "digit. This trick is a general solution for multiplying any\n", + "single-digit number by 9. That’s an algorithm!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Similarly, the techniques you learned for addition with carrying,\n", + "subtraction with borrowing, and long division are all algorithms. One\n", + "of the characteristics of algorithms is that they do not require any\n", + "intelligence to carry out. They are mechanical processes where\n", + "each step follows from the last according to a simple set of rules." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Executing algorithms is boring, but designing them is interesting,\n", + "intellectually challenging, and a central part of computer science."
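+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The multiply-by-9 trick from this section is easy to check mechanically. For a single digit n, putting n−1 in the tens place and 10−n in the ones place gives 10(n−1) + (10−n) = 9n. The helper name `times_nine` is our own, used only for illustration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def times_nine(n):\n", + "    # n is a single digit 1..9: tens digit is n-1, ones digit is 10-n\n", + "    return 10 * (n - 1) + (10 - n)\n", + "\n", + "for n in range(1, 10):\n", + "    assert times_nine(n) == 9 * n  # the trick agrees with ordinary multiplication\n", + "\n", + "print([times_nine(n) for n in range(1, 10)])  # [9, 18, 27, 36, 45, 54, 63, 72, 81]"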
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some of the things that people do naturally, without difficulty or\n", + "conscious thought, are the hardest to express algorithmically.\n", + "Understanding natural language is a good example. We all do it, but\n", + "so far no one has been able to explain how we do it, at least\n", + "not in the form of an algorithm." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.7 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you start writing bigger programs, you might find yourself\n", + "spending more time debugging. More code means more chances to\n", + "make an error and more places for bugs to hide.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One way to cut your debugging time is “debugging by bisection”.\n", + "For example, if there are 100 lines in your program and you\n", + "check them one at a time, it would take 100 steps." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Instead, try to break the problem in half. Look at the middle\n", + "of the program, or near it, for an intermediate value you\n", + "can check. Add a print statement (or something else\n", + "that has a verifiable effect) and run the program." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the mid-point check is incorrect, there must be a problem in the\n", + "first half of the program. If it is correct, the problem is\n", + "in the second half." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Every time you perform a check like this, you halve the number of\n", + "lines you have to search. After six steps (which is fewer than 100),\n", + "you would be down to one or two lines of code, at least in theory." 
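+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The “six steps” figure can be checked with a short loop that starts from the 100-line program in the example and halves the number of suspect lines with each mid-point check, stopping at one or two lines." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lines = 100  # lines that might contain the bug\n", + "steps = 0\n", + "while lines > 2:       # stop once we are down to one or two lines\n", + "    lines = lines / 2  # each mid-point check rules out half of them\n", + "    steps = steps + 1\n", + "    print(steps, lines)\n", + "# the last line printed is: 6 1.5625"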
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In practice it is not always clear what\n", + "the “middle of the program” is and not always possible to\n", + "check it. It doesn’t make sense to count lines and find the\n", + "exact midpoint. Instead, think about places\n", + "in the program where there might be errors and places where it\n", + "is easy to put a check. Then choose a spot where you\n", + "think the chances are about the same that the bug is before\n", + "or after the check." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_8__Strings.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_8__Strings.ipynb new file mode 100644 index 0000000..56f6504 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_8__Strings.ipynb @@ -0,0 +1,1627 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 8  Strings\n", + "\n", + "* 8.1  A string is a sequence\n", + "* 8.2  len\n", + "* 8.3  Traversal with a for loop\n", + "* 8.4  String slices\n", + "* 8.5  Strings are immutable\n", + "* 8.6  Searching\n", + "* 8.7  Looping and counting\n", + "* 8.8  String methods\n", + "* 8.9  The in operator\n", + "* 8.10  String comparison\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "https://en.wikipedia.org/wiki/ASCII\n", + "![strings_in_python](strings_in_python.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.1 A string is a sequence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, 
+ "source": [ + "Strings are not like integers, floats, and booleans. A string\n", + "is a sequence, which means it is\n", + "an ordered collection of other values. In this chapter you’ll see\n", + "how to access the characters that make up a string, and you’ll\n", + "learn about some of the methods strings provide.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "\n", + "A string is a sequence of characters. \n", + "You can access the characters one at a time with the\n", + "bracket operator:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "fruit = 'banana'\n", + "letter = fruit[1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The second statement selects character number 1 from fruit and assigns it to letter. \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The expression in brackets is called an index. \n", + "The index indicates which character in the sequence you\n", + "want (hence the name)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But you might not get what you expect:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a'" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letter" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For most people, the first letter of 'banana' is b, not\n", + "a. But for computer scientists, the index is an offset from the\n", + "beginning of the string, and the offset of the first letter is zero." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'b'" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letter = fruit[0]\n", + "letter" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "So b is the 0th letter (“zero-eth”) of 'banana', a is the 1th letter (“one-eth”), and n is the 2th letter\n", + "(“two-eth”). " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an index you can use an expression that contains variables and\n", + "operators:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a'" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "i = 1\n", + "fruit[i]" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'n'" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fruit[i+1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But the value of the index has to be an integer. 
 Otherwise you\n", + "get:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "string indices must be integers", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mletter\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfruit\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1.5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: string indices must be integers" + ] + } + ], + "source": [ + "letter = fruit[1.5]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.2 len" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "len is a built-in function that returns the number of characters\n", + "in a string:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit = 'banana'\n", + "len(fruit)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "To get the last letter of a string, you might be tempted to try something\n", + "like this:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "length = len(fruit)\n", + "last = fruit[length]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "That doesn’t work; it raises an IndexError (string index out of range).\n", + "The reason for the IndexError is that there is no letter in 'banana' with the index 6. Since we started counting at zero, the\n", + "six letters are numbered 0 to 5.
 To get the last character, you have\n", + "to subtract 1 from length:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "last = fruit[length-1]\n", + "last" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Or you can use negative indices, which count backward from\n", + "the end of the string. The expression fruit[-1] yields the last\n", + "letter, fruit[-2] yields the second to last, and so on.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit[-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.3 Traversal with a for loop" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A lot of computations involve processing a string one character at a\n", + "time. Often they start at the beginning, select each character in\n", + "turn, do something to it, and continue until the end. This pattern of\n", + "processing is called a traversal. One way to write a traversal\n", + "is with a while loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "index = 0\n", + "while index < len(fruit):\n", + "    letter = fruit[index]\n", + "    print(letter)\n", + "    index = index + 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This loop traverses the string and displays each letter on a line by\n", + "itself. The loop condition is index < len(fruit), so\n", + "when index is equal to the length of the string, the\n", + "condition is false, and the body of the loop doesn’t run. The\n", + "last character accessed is the one with the index len(fruit)-1,\n", + "which is the last character in the string."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an **exercise**, write a function that takes a string as an argument\n", + "and displays the letters backward, one per line." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def print_backward(s):\n", + "    index = len(s) - 1\n", + "    while index >= 0:\n", + "        print(s[index])\n", + "        index -= 1\n", + "\n", + "print_backward(fruit)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another way to write a traversal is with a for loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for letter in fruit:\n", + "    print(letter)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Each time through the loop, the next character in the string is assigned\n", + "to the variable letter. The loop continues until no characters are\n", + "left.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following example shows how to use concatenation (string addition)\n", + "and a for loop to generate an abecedarian series (that is, in\n", + "alphabetical order). In Robert McCloskey’s book Make\n", + "Way for Ducklings, the names of the ducklings are Jack, Kack, Lack,\n", + "Mack, Nack, Ouack, Pack, and Quack.
 This loop outputs these names in\n", + "order:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prefixes = 'JKLMNOPQ'\n", + "suffix = 'ack'\n", + "\n", + "for letter in prefixes:\n", + "    print(letter + suffix)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The output is:" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "Jack\n", + "Kack\n", + "Lack\n", + "Mack\n", + "Nack\n", + "Oack\n", + "Pack\n", + "Qack" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Of course, that’s not quite right because “Ouack” and “Quack” are\n", + "misspelled. As an exercise, modify the program to fix this error." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.4 String slices" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A segment of a string is called a slice. Selecting a slice is\n", + "similar to selecting a character:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'Monty Python'\n", + "s[0:5]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s[6:12]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The operator [n:m] returns the part of the string from the \n", + "“n-eth” character to the “m-eth” character, including the first but\n", + "excluding the last. This behavior is counterintuitive, but it might\n", + "help to imagine the indices pointing between the\n", + "characters, as in Figure 8.1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you omit the first index (before the colon), the slice starts at\n", + "the beginning of the string.
 If you omit the second index, the slice\n", + "goes to the end of the string:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit = 'banana'\n", + "fruit[:3]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit[3:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the first index is greater than or equal to the second, the result\n", + "is an empty string, represented by two quotation marks:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit = 'banana'\n", + "fruit[3:3]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "An empty string contains no characters and has length 0, but other\n", + "than that, it is the same as any other string." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Continuing this example, what do you think \n", + "fruit[:] means?
Try it and see.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: Extended Slices\n", + "\n", + "https://docs.python.org/2/whatsnew/2.3.html#extended-slices\n", + "\n", + "[begin:end:step]\n", + "\n", + "* leaving begin and end off\n", + "* specify a step of -1" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'ananab'" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fruit[::-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'bnn'" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fruit[::2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.5 Strings are immutable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is tempting to use the [] operator on the left side of an\n", + "assignment, with the intention of changing a character in a string.\n", + "For example:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "'str' object does not support item assignment", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mgreeting\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'Hello, world!'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mgreeting\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0;34m'J'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment" + ] + } + ], + "source": [ + "greeting = 'Hello, world!'\n", + "greeting[0] = 'J'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The “object” in this case is the string and the “item” is\n", + "the character you tried to assign. For now, an object is\n", + "the same thing as a value, but we will refine that definition\n", + "later (Section 10.10). \n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The reason for the error is that\n", + "strings are immutable, which means you can’t change an\n", + "existing string. The best you can do is create a new string\n", + "that is a variation on the original:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Jello, world!'" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "greeting = 'Hello, world!'\n", + "new_greeting = 'J' + greeting[1:]\n", + "new_greeting" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a5'" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'a' + str(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This example concatenates a new first letter onto\n", + "a slice of greeting. 
It has no effect on\n", + "the original string.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.6 Searching" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What does the following function do?\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def find(word, letter):\n", + " index = 0\n", + " while index < len(word):\n", + " if word[index] == letter:\n", + " return index\n", + " index = index + 1\n", + " return -1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In a sense, find is the inverse of the [] operator.\n", + "Instead of taking an index and extracting the corresponding character,\n", + "it takes a character and finds the index where that character\n", + "appears. If the character is not found, the function returns -1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is the first example we have seen of a return statement\n", + "inside a loop. If word[index] == letter, the function breaks\n", + "out of the loop and returns immediately." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the character doesn’t appear in the string, the program\n", + "exits the loop normally and returns -1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This pattern of computation—traversing a sequence and returning\n", + "when we find what we are looking for—is called a search.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, modify find so that it has a\n", + "third parameter, the index in word where it should start\n", + "looking." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.7 Looping and counting" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following program counts the number of times the letter a\n", + "appears in a string:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n" + ] + } + ], + "source": [ + "word = 'banana'\n", + "count = 0\n", + "for letter in word:\n", + " if letter == 'a':\n", + " count = count + 1\n", + "print(count)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This program demonstrates another pattern of computation called a counter. The variable count is initialized to 0 and then\n", + "incremented each time an a is found.\n", + "When the loop exits, count\n", + "contains the result—the total number of a’s." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As an exercise, encapsulate this code in a function named count, and generalize it so that it accepts the string and the\n", + "letter as arguments." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then rewrite the function so that instead of\n", + "traversing the string, it uses the three-parameter version of find from the previous section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.8 String methods" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Strings provide methods that perform a variety of useful operations.\n", + "A method is similar to a function—it takes arguments and\n", + "returns a value—but the syntax is different. 
For example, the\n", + "method upper takes a string and returns a new string with\n", + "all uppercase letters.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Instead of the function syntax upper(word), it uses\n", + "the method syntax word.upper()." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'BANANA'" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word = 'banana'\n", + "new_word = word.upper()\n", + "new_word" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'banana'" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_word.lower()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This form of dot notation specifies the name of the method, upper, and the name of the string to apply the method to, word. 
The empty parentheses indicate that this method takes no\n", + "arguments.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A method call is called an invocation; in this case, we would\n", + "say that we are invoking upper on word.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As it turns out, there is a string method named find that\n", + "is remarkably similar to the function we wrote:" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word = 'banana'\n", + "index = word.find('a')\n", + "index" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In this example, we invoke find on word and pass\n", + "the letter we are looking for as a parameter." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Actually, the find method is more general than our function;\n", + "it can find substrings, not just characters:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word.find('na')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "By default, find starts at the beginning of the string, but\n", + "it can take a second argument, the index where it should start:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word.find('na', 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This is an example 
of an optional argument;\n", + "find can\n", + "also take a third argument, the index where it should stop:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "-1" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "name = 'bob'\n", + "name.find('b', 1, 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This search fails because b does not\n", + "appear in the index range from 1 to 2, not including 2. Searching up to, but not including, the second index makes\n", + "find consistent with the slice operator." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus\n", + "\n", + "Split\n", + "https://docs.python.org/2/library/string.html#string.split\n", + "\n", + "Built-in Functions\n", + "https://docs.python.org/3/library/functions.html" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Monty Python, Monty Python']" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s = 'Monty Python, Monty Python'\n", + "s.split('$')" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'nbananaobananahbananatbananaybananaPbanana bananaybananatbanananbananaobananaMbanana banana,banananbananaobananahbananatbananaybananaPbanana bananaybananatbanananbananaobananaM'" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fruit.join(reversed(s))" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a,n,a,n,a,b'" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"','.join(reversed(fruit))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.9 The in operator" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The word in is a boolean operator that takes two strings and\n", + "returns True if the first appears as a substring in the second:" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'a' in 'banana'" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'seed' in 'banana'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For example, the following function prints all the\n", + "letters from word1 that also appear in word2:" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [], + "source": [ + "def in_both(word1, word2):\n", + " for letter in word1:\n", + " if letter in word2:\n", + " print(letter)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "With well-chosen variable names,\n", + "Python sometimes reads like English. 
You could read\n", + "this loop, “for (each) letter in (the first) word, if (the) letter \n", + "(appears) in (the second) word, print (the) letter.”" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s what you get if you compare apples and oranges:" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a\n", + "e\n", + "s\n" + ] + } + ], + "source": [ + "in_both('apples', 'oranges')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.10 String comparison" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The relational operators work on strings. To see if two strings are equal:" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All right, bananas.\n" + ] + } + ], + "source": [ + "if word == 'banana':\n", + " print('All right, bananas.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Other relational operations are useful for putting words in alphabetical\n", + "order:" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All right, bananas.\n" + ] + } + ], + "source": [ + "if word < 'banana':\n", + " print('Your word, ' + word + ', comes before banana.')\n", + "elif word > 'banana':\n", + " print('Your word, ' + word + ', comes after banana.')\n", + "else:\n", + " print('All right, bananas.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Python does not handle uppercase and lowercase letters the same way\n", + "people do. 
 All the uppercase letters come before all the\n", + "lowercase letters, so if word is 'Pineapple', the comparison above prints:" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "Your word, Pineapple, comes before banana." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A common way to address this problem is to convert strings to a\n", + "standard format, such as all lowercase, before performing the\n", + "comparison. Keep that in mind in case you have to defend yourself\n", + "against a man armed with a Pineapple." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.11 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you use indices to traverse the values in a sequence,\n", + "it is tricky to get the beginning and end of the traversal\n", + "right. Here is a function that is supposed to compare two\n", + "words and return True if one of the words is the reverse\n", + "of the other, but it contains two errors:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_reverse(word1, word2):\n", + "    if len(word1) != len(word2):\n", + "        return False\n", + "\n", + "    i = 0\n", + "    j = len(word2)\n", + "\n", + "    while j > 0:\n", + "        if word1[i] != word2[j]:\n", + "            return False\n", + "        i = i+1\n", + "        j = j-1\n", + "\n", + "    return True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first if statement checks whether the words are the\n", + "same length. If not, we can return False immediately.\n", + "Otherwise, for the rest of the function, we can assume that the words\n", + "are the same length. This is an example of the guardian pattern\n", + "in Section 6.8.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "i and j are indices: i traverses word1\n", + "forward while j traverses word2 backward.
 If we find\n", + "two letters that don’t match, we can return False immediately.\n", + "If we get through the whole loop and all the letters match, we\n", + "return True." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we test this function with the words “pots” and “stop”, we\n", + "expect the return value True, but we get an IndexError:\n", + "\n" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "is_reverse('pots', 'stop')\n", + "...\n", + "  File \"reverse.py\", line 15, in is_reverse\n", + "    if word1[i] != word2[j]:\n", + "IndexError: string index out of range" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For debugging this kind of error, my first move is to\n", + "print the values of the indices immediately before the line\n", + "where the error appears." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "    while j > 0:\n", + "        print(i, j)   # print here\n", + "\n", + "        if word1[i] != word2[j]:\n", + "            return False\n", + "        i = i+1\n", + "        j = j-1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Now when I run the program again, I get more information:" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "is_reverse('pots', 'stop')\n", + "0 4\n", + "...\n", + "IndexError: string index out of range" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first time through the loop, the value of j is 4,\n", + "which is out of range for the string 'stop'.\n", + "The index of the last character is 3, so the\n", + "initial value for j should be len(word2)-1."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If I fix that error and run the program again, I get:" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "is_reverse('pots', 'stop')\n", + "0 3\n", + "1 2\n", + "2 1\n", + "True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This time we get the right answer, but it looks like the loop only ran\n", + "three times, which is suspicious. To get a better idea of what is\n", + "happening, it is useful to draw a state diagram. During the first\n", + "iteration, the frame for is_reverse is shown in\n", + "Figure 8.2. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I took some license by arranging the variables in the frame\n", + "and adding dotted lines to show that the values of i and\n", + "j indicate characters in word1 and word2." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Starting with this diagram, run the program on paper, changing the\n", + "values of i and j during each iteration.
Find and fix the\n", + "second error in this function.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.12 Glossary" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_9__Case_study_A_word_play.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_9__Case_study_A_word_play.ipynb new file mode 100644 index 0000000..6eac903 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_9__Case_study_A_word_play.ipynb @@ -0,0 +1,1405 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 9  Case study: word play" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9.1 Reading word lists" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This chapter presents the second case study, which involves\n", + "solving word puzzles by searching for words that have certain\n", + "properties. For example, we’ll find the longest palindromes\n", + "in English and search for words whose letters appear in\n", + "alphabetical order. And I will present another program development\n", + "plan: reduction to a previously solved problem." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For the exercises in this chapter we need a list of English words.\n", + "There are lots of word lists available on the Web, but the one most\n", + "suitable for our purpose is one of the word lists collected and\n", + "contributed to the public domain by Grady Ward as part of the Moby\n", + "lexicon project (see http://wikipedia.org/wiki/Moby_Project). It\n", + "is a list of 113,809 official crosswords; that is, words that are\n", + "considered valid in crossword puzzles and other word games. In the\n", + "Moby collection, the filename is 113809of.fic; you can download\n", + "a copy, with the simpler name words.txt, from\n", + "http://thinkpython2.com/code/words.txt.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This file is in plain text, so you can open it with a text\n", + "editor, but you can also read it from Python. The built-in\n", + "function open takes the name of the file as a parameter\n", + "and returns a file object you can use to read the file.\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "fin = open('words.txt')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "<_io.TextIOWrapper name='words.txt' mode='r' encoding='UTF-8'>" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fin" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "fin is a common name for a file object used for input. 
The file\n", + "object provides several methods for reading, including readline,\n", + "which reads characters from the file until it gets to a newline and\n", + "returns the result as a string: \n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'aa\\n'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fin.readline()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first word in this particular list is “aa”, which is a kind of\n", + "lava. The sequence \\n represents the newline character that \n", + "separates this word from the next." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The file object keeps track of where it is in the file, so\n", + "if you call readline again, you get the next word:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'aah\\n'" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fin.readline()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The next word is “aah”, which is a perfectly legitimate\n", + "word, so stop looking at me like that.\n", + "Or, if it’s the newline character that’s bothering you,\n", + "we can get rid of it with the string method strip:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'aahed'" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "line = fin.readline()\n", + "word = line.strip()\n", + "word" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "You can also use a file object as part of a for loop.\n", + "This program reads words.txt and prints 
each word, one\n", + "per line:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "zymurgy\n" + ] + } + ], + "source": [ + "fin = open('words.txt')\n", + "for line in fin:\n", + " word = line.strip()\n", + " #print(word) # disabled so that only the last word read (zymurgy) is printed\n", + "print(word)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9.2 Exercises" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are solutions to these exercises in the next section.\n", + "You should at least attempt each one before you read the solutions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 1** Write a program that reads words.txt and prints only the words with more than 20 characters (not counting whitespace)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " word\n", + "0 aa\n", + "1 aah\n", + "2 aahed\n", + "3 aahing\n", + "4 aahs" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "words = pd.read_csv('words.txt', names=['word'])\n", + "\n", + "words.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(113809, 1)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " word\n", + "21685 counterdemonstrations\n", + "47408 hyperaggressivenesses\n", + "60406 microminiaturizations" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words[words['word'].str.len() > 20]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 2** \n", + "In 1939 Ernest Vincent Wright published a 50,000 word novel called Gadsby that does not contain the letter “e”. Since “e” is the most common letter in English, that’s not easy to do.\n", + "\n", + "In fact, it is difficult to construct a solitary thought without using that most common symbol. It is slow going at first, but with caution and hours of training you can gradually gain facility.\n", + "\n", + "All right, I’ll stop now.\n", + "\n", + "Write a function called has_no_e that returns True if the given word doesn’t have the letter “e” in it.\n", + "\n", + "Write a program that reads words.txt and prints only the words that have no “e”. Compute the percentage of words in the list that have no “e”." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(37641, 1)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words[~words['word'].fillna('_').str.contains('e')].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(76168, 1)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words[words['word'].fillna('_').str.contains('e')].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " word\n", + "113800 zymogenes\n", + "113801 zymogens\n", + "113802 zymologies\n", + "113804 zymoses\n", + "113807 zymurgies" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words[words['word'].fillna('_').str.contains('e')].tail()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 3**\n", + "Write a function named avoids that takes a word and a string of forbidden letters, and that returns True if the word doesn’t use any of the forbidden letters.\n", + "\n", + "Write a program that prompts the user to enter a string of forbidden letters and then prints the number of words that don’t contain any of them. Can you find a combination of 5 forbidden letters that excludes the smallest number of words?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 4** \n", + "Write a function named uses_only that takes a word and a string of letters, and that returns True if the word contains only letters in the list. Can you make a sentence using only the letters acefhlo? Other than “Hoe alfalfa”?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 5** \n", + "Write a function named uses_all that takes a word and a string of required letters, and that returns True if the word uses all the required letters at least once. How many words are there that use all the vowels aeiou? How about aeiouy?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 6**\n", + "Write a function called is_abecedarian that returns True if the letters in a word appear in alphabetical order (double letters are ok). How many abecedarian words are there?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9.3 Search" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Pixiedust database opened successfully\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " \n", + " Pixiedust version 1.1.18\n", + "
\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pixiedust" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All of the exercises in the previous section have something\n", + "in common; they can be solved with the search pattern we saw\n", + "in Section 8.6. The simplest example is:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "def has_no_e(word):\n", + " for letter in word:\n", + " if letter == 'e':\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "has_no_e('letter')" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "has_no_e('xxxx')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "%%pixie_debugger\n", + "has_no_e('letter')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The for loop traverses the characters in word. If we find\n", + "the letter “e”, we can immediately return False; otherwise we\n", + "have to go to the next letter. If we exit the loop normally, that\n", + "means we didn’t find an “e”, so we return True.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "You could write this function more concisely using the in\n", + "operator, but I started with this version because it \n", + "demonstrates the logic of the search pattern." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "avoids is a more general version of has_no_e but it\n", + "has the same structure:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "def avoids(word, forbidden):\n", + " for letter in word:\n", + " if letter in forbidden:\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "avoids('hintw', 'wz')" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "%%pixie_debugger\n", + "avoids('hint', 'wz')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "We can return False as soon as we find a forbidden letter;\n", + "if we get to the end of the loop, we return True." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "uses_only is similar except that the sense of the condition\n", + "is reversed:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "def uses_only(word, available):\n", + " for letter in word: \n", + " if letter not in available:\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "uses_only('hinth', 'inth')" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "%%pixie_debugger\n", + "uses_only('hint', 'inh')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Instead of a list of forbidden letters, we have a list of available\n", + "letters. If we find a letter in word that is not in\n", + "available, we can return False." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "uses_all is similar except that we reverse the role\n", + "of the word and the string of letters:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def uses_all(word, required):\n", + " for letter in required: \n", + " if letter not in word:\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "uses_all('hinth', 'inth')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [], + "source": [ + "%%pixie_debugger\n", + "uses_all('hintt', 'inth')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Instead of traversing the letters in word, the loop\n", + "traverses the required letters. 
If any of the required letters\n", + "do not appear in the word, we can return False.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you were really thinking like a computer scientist, you would\n", + "have recognized that uses_all was an instance of a\n", + "previously solved problem, and you would have written:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def uses_all(word, required):\n", + " return uses_only(required, word)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This is an example of a program development plan called reduction to a previously solved problem, which means that you\n", + "recognize the problem you are working on as an instance of a solved\n", + "problem and apply an existing solution. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus\n", + "\n", + "How to check performance in Jupyter Notebook" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The slowest run took 9.60 times longer than the fastest. 
This could mean that an intermediate result is being cached.\n", + "10000000 loops, best of 3: 143 ns per loop\n" + ] + } + ], + "source": [ + "%%timeit \n", + "a = \"abc\"\n", + "b = \"abcdefghijklmnopqrstuvwxyz\"\n", + "for i in a:\n", + " if i in b: \n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1000000 loops, best of 3: 678 ns per loop\n" + ] + } + ], + "source": [ + "%%timeit \n", + "b = \"abc\"\n", + "a = \"abcdefghijklmnopqrstuvwxyz\"\n", + "for i in a:\n", + " if i in b: \n", + " pass" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9.4 Looping with indices" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I wrote the functions in the previous section with for\n", + "loops because I only needed the characters in the strings; I didn’t\n", + "have to do anything with the indices." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For is_abecedarian we have to compare adjacent letters,\n", + "which is a little tricky with a for loop:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "def is_abecedarian(word):\n", + " previous = word[0]\n", + " for c in word:\n", + " if c < previous:\n", + " return False\n", + " previous = c\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_abecedarian('hintt')" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "%%pixie_debugger\n", + "is_abecedarian('hintt')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An alternative is to use recursion:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_abecedarian(word):\n", + " if len(word) <= 1:\n", + " return True\n", + " if word[0] > word[1]:\n", + " return False\n", + " return is_abecedarian(word[1:])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another option is to use a while loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_abecedarian(word):\n", + " i = 0\n", + " while i < len(word)-1:\n", + " if word[i+1] < word[i]:\n", + " return False\n", + " i = i+1\n", + " return True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The loop starts at i=0 and ends when i=len(word)-1. Each\n", + "time through the loop, it compares the ith character (which you can\n", + "think of as the current character) to the i+1th character (which you\n", + "can think of as the next)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the next character is less than (alphabetically before) the current\n", + "one, then we have discovered a break in the abecedarian trend, and\n", + "we return False." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we get to the end of the loop without finding a fault, then the\n", + "word passes the test. To convince yourself that the loop ends\n", + "correctly, consider an example like 'flossy'. The\n", + "length of the word is 6, so\n", + "the last time the loop runs is when i is 4, which is the\n", + "index of the second-to-last character. 
On the last iteration,\n", + "it compares the second-to-last character to the last, which is\n", + "what we want.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is a version of is_palindrome (see\n", + "Exercise 3) that uses two indices; one starts at the\n", + "beginning and goes up; the other starts at the end and goes down." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_palindrome(word):\n", + " i = 0\n", + " j = len(word)-1\n", + "\n", + " while i\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "" + ], + "text/plain": [ + " food Portion size per 100 grams energy\n", + "0 Fish cake 90 cals per cake 200 cals Medium\n", + "1 Fish fingers 50 cals per piece 220 cals Medium\n", + "2 Gammon 320 cals 280 cals Med-High\n", + "3 Haddock fresh 200 cals 110 cals Low calorie\n", + "4 Halibut fresh 220 cals 125 cals Low calorie" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from tabula import read_pdf\n", + "import pandas as pd\n", + "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=3, pandas_options={'header': None})\n", + "df.columns = ['food', 'Portion size ', 'per 100 grams', 'energy']\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "s = df.energy" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Medium 14\n", + "High 6\n", + "Low calorie 4\n", + "Med-High 4\n", + "Low-Med 1\n", + "Low- Med 1\n", + "Name: energy, dtype: int64" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "counts = s.value_counts()\n", + "counts" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Medium 0.466667\n", + "High 0.200000\n", + "Low calorie 0.133333\n", + "Med-High 0.133333\n", + "Low-Med 0.033333\n", + "Low- Med 0.033333\n", + "Name: energy, dtype: float64" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "percent = s.value_counts(normalize=True)\n", + "percent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Medium 46.7%\n", + "High 20.0%\n", + "Low calorie 13.3%\n", + "Med-High 13.3%\n", + "Low-Med 3.3%\n", + "Low- Med 
3.3%\n", + "Name: energy, dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "percent100 = s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'\n", + "percent100" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " counts per per100\n", + "Medium 14 0.466667 46.7%\n", + "High 6 0.200000 20.0%\n", + "Low calorie 4 0.133333 13.3%\n", + "Med-High 4 0.133333 13.3%\n", + "Low-Med 1 0.033333 3.3%\n", + "Low- Med 1 0.033333 3.3%" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.DataFrame({'counts': counts, 'per': percent, 'per100': percent100})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = df.energy\n", + "counts = s.value_counts()\n", + "percent = s.value_counts(normalize=True)\n", + "percent100 = s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'\n", + "pd.DataFrame({'counts': counts, 'per': percent, 'per100': percent100})" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Pandas search in column, every column and regex.ipynb b/notebooks/Pandas search in column, every column and regex.ipynb index 699ac60..c64e94f 100644 --- a/notebooks/Pandas search in column, every column and regex.ipynb +++ b/notebooks/Pandas search in column, every column and regex.ipynb @@ -1226,7 +1226,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.8" } }, "nbformat": 4, diff --git a/notebooks/Python Extract Table from PDF.ipynb b/notebooks/Python Extract Table from PDF.ipynb index 47add32..fbdb305 100644 --- a/notebooks/Python Extract Table from PDF.ipynb +++ b/notebooks/Python Extract Table from PDF.ipynb @@ -55,247 +55,16 @@ "metadata": {}, "outputs": [ { - 
"data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - " BREADS & CEREALS Portion size * \\\n", - "0 Bagel ( 1 average ) 140 cals (45g) \n", - "1 Biscuit digestives 86 cals (per biscuit) \n", - "2 Jaffa cake 48 cals (per biscuit) \n", - "3 Bread white (thick slice) 96 cals (1 slice 40g) \n", - "4 Bread wholemeal (thick) 88 cals (1 slice 40g) \n", - "5 Chapatis 250 cals \n", - "6 Cornflakes 130 cals (35g) \n", - "7 Crackerbread 17 cals per slice \n", - "8 Cream crackers 35 cals (per cracker) \n", - "9 Crumpets 93 cals (per crumpet) \n", - "10 Flapjacks basic fruit mix 320 cals \n", - "11 Macaroni (boiled) 238 cals (250g) \n", - "12 Muesli 195 cals (50g) \n", - "13 Naan bread (normal) 300 cals (small plate size) \n", - "14 Noodles (boiled) 175 cals (250g) \n", - "15 Pasta ( normal boiled ) 330 cals (300g) \n", - "16 Pasta (wholemeal boiled ) 315 cals (300g) \n", - "17 Porridge oats (with water) 193 cals (350g) \n", - "18 Potatoes** (boiled) 210 cals (300g) \n", - "19 Potatoes** (roast) 420 cals (300g) \n", - "\n", - " per 100 grams (3.5 oz) Unnamed: 3 energy content \n", - "0 310 cals NaN Medium \n", - "1 480 cals NaN High \n", - "2 370 cals NaN Med-High \n", - "3 240 cals NaN Medium \n", - "4 220 cals NaN Low-med \n", - "5 300 cals NaN Medium \n", - "6 370 cals NaN Med-High \n", - "7 325 cals NaN Low Calorie \n", - "8 440 cals NaN Low / portion \n", - "9 198 cals NaN Low-Med \n", - "10 500 cals NaN High \n", - "11 95 cals NaN Low calorie \n", - "12 390 cals NaN Med-high \n", - "13 320 cals NaN Medium \n", - "14 70 cals NaN Low calorie \n", - "15 110 cals NaN Low calorie \n", - "16 105 cals NaN Low calorie \n", - "17 55 cals NaN Low calorie \n", - "18 70 cals NaN Low calorie \n", - "19 140 cals NaN Medium " - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + 
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: 
[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], "source": [ @@ -309,226 +78,16 @@ "metadata": {}, "outputs": [ { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
BREADS & CEREALSPortion size *per 100 grams (3.5 oz)energy content
0Bagel ( 1 average )140 cals (45g)310 calsMedium
1Biscuit digestives86 cals (per biscuit)480 calsHigh
2Jaffa cake48 cals (per biscuit)370 calsMed-High
3Bread white (thick slice)96 cals (1 slice 40g)240 calsMedium
4Bread wholemeal (thick)88 cals (1 slice 40g)220 calsLow-med
5Chapatis250 cals300 calsMedium
6Cornflakes130 cals (35g)370 calsMed-High
7Crackerbread17 cals per slice325 calsLow Calorie
8Cream crackers35 cals (per cracker)440 calsLow / portion
9Crumpets93 cals (per crumpet)198 calsLow-Med
10Flapjacks basic fruit mix320 cals500 calsHigh
11Macaroni (boiled)238 cals (250g)95 calsLow calorie
12Muesli195 cals (50g)390 calsMed-high
13Naan bread (normal)300 cals (small plate size)320 calsMedium
14Noodles (boiled)175 cals (250g)70 calsLow calorie
15Pasta ( normal boiled )330 cals (300g)110 calsLow calorie
16Pasta (wholemeal boiled )315 cals (300g)105 calsLow calorie
17Porridge oats (with water)193 cals (350g)55 calsLow calorie
18Potatoes** (boiled)210 cals (300g)70 calsLow calorie
19Potatoes** (roast)420 cals (300g)140 calsMedium
\n", - "
" - ], - "text/plain": [ - " BREADS & CEREALS Portion size * \\\n", - "0 Bagel ( 1 average ) 140 cals (45g) \n", - "1 Biscuit digestives 86 cals (per biscuit) \n", - "2 Jaffa cake 48 cals (per biscuit) \n", - "3 Bread white (thick slice) 96 cals (1 slice 40g) \n", - "4 Bread wholemeal (thick) 88 cals (1 slice 40g) \n", - "5 Chapatis 250 cals \n", - "6 Cornflakes 130 cals (35g) \n", - "7 Crackerbread 17 cals per slice \n", - "8 Cream crackers 35 cals (per cracker) \n", - "9 Crumpets 93 cals (per crumpet) \n", - "10 Flapjacks basic fruit mix 320 cals \n", - "11 Macaroni (boiled) 238 cals (250g) \n", - "12 Muesli 195 cals (50g) \n", - "13 Naan bread (normal) 300 cals (small plate size) \n", - "14 Noodles (boiled) 175 cals (250g) \n", - "15 Pasta ( normal boiled ) 330 cals (300g) \n", - "16 Pasta (wholemeal boiled ) 315 cals (300g) \n", - "17 Porridge oats (with water) 193 cals (350g) \n", - "18 Potatoes** (boiled) 210 cals (300g) \n", - "19 Potatoes** (roast) 420 cals (300g) \n", - "\n", - " per 100 grams (3.5 oz) energy content \n", - "0 310 cals Medium \n", - "1 480 cals High \n", - "2 370 cals Med-High \n", - "3 240 cals Medium \n", - "4 220 cals Low-med \n", - "5 300 cals Medium \n", - "6 370 cals Med-High \n", - "7 325 cals Low Calorie \n", - "8 440 cals Low / portion \n", - "9 198 cals Low-Med \n", - "10 500 cals High \n", - "11 95 cals Low calorie \n", - "12 390 cals Med-high \n", - "13 320 cals Medium \n", - "14 70 cals Low calorie \n", - "15 110 cals Low calorie \n", - "16 105 cals Low calorie \n", - "17 55 cals Low calorie \n", - "18 70 cals Low calorie \n", - "19 140 cals Medium " - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + 
"\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdropna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'columns'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], "source": [ @@ -543,40 +102,15 @@ "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "-- ------------------------ ----------------- -------- -----------\n", - " 0 Fish fingers 50 cals per piece 220 cals Medium\n", - " 1 Gammon 320 cals 280 cals Med-High\n", - " 2 Haddock fresh 200 cals 110 cals Low calorie\n", - " 3 Halibut fresh 220 cals 125 cals Low calorie\n", - " 4 Ham 6 cals 240 cals Medium\n", - " 5 Herring fresh grilled 300 cals 200 cals Medium\n", - " 6 Kidney 200 cals 160 cals Medium\n", - " 7 Kipper 200 cals 120 cals Low calorie\n", - " 8 Liver 200 cals 150 cals Medium\n", - " 9 Liver pate 150 cals 300 cals Medium\n", - "10 Lamb (roast) 300 cals 300 cals Med-High\n", - "11 Lobster boiled 200 cals 100 cals Low calorie\n", - "12 Luncheon meat 300 cals 400 cals High\n", - "13 Mackeral 320 cals 300 cals Medium\n", - "14 Mussels 90 cals 90 cals Low-Med\n", - "15 Pheasant roast 200 cals 200 cals Medium\n", - "16 Pilchards (tinned) 140 cals 140 cals Medium\n", - "17 Prawns 180 cals 100 cals Low- Med\n", - "18 Pork 320 cals 290 cals Med-High\n", - "19 Pork pie 320 cals 450 cals High\n", - "20 Rabbit 200 cals 180 cals Medium\n", - "21 Salmon fresh 220 cals 180 cals Medium\n", - "22 Sardines tinned in oil 220 cals 220 cals Medium\n", - "23 Sardines in tomato sauce 180 cals 180 cals Medium\n", - "24 Sausage pork fried 250 cals 320 cals High\n", - "25 Sausage pork grilled 220 cals 280 cals Med-High\n", - "26 Sausage roll 290 cals 480 cals High\n", - "27 Scampi fried in oil 400 cals 340 cals High\n", - "28 Steak & kidney pie 400 cals 350 cals High\n", - "-- ------------------------ ----------------- -------- -----------\n" + "ename": 
"FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpages\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mprint\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mtabulate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" ] } ], @@ -591,618 +125,16 @@ "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "[{'extraction_method': 'stream',\n", - " 'top': 0.0,\n", - " 'left': 0.0,\n", - " 'width': 524.6400146484375,\n", - " 'height': 725.6300048828125,\n", - " 'data': [[{'top': 65.19,\n", - " 'left': 120.24,\n", - " 'width': 48.599998474121094,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Fish cake'},\n", - " {'top': 65.19,\n", - " 'left': 241.2,\n", - " 'width': 79.91999816894531,\n", - " 'height': 7.880000114440918,\n", - " 'text': '90 cals per cake'},\n", - " {'top': 65.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 65.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 87.75,\n", - " 'left': 114.6,\n", - " 'width': 60.00000762939453,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Fish fingers'},\n", - " {'top': 87.75,\n", - " 'left': 239.52,\n", - " 'width': 83.27998352050781,\n", - " 'height': 7.880000114440918,\n", - " 'text': '50 cals per piece'},\n", - " {'top': 87.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 87.75,\n", - " 'left': 472.44,\n", - " 'width': 
43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 110.19,\n", - " 'left': 120.72,\n", - " 'width': 47.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Gammon'},\n", - " {'top': 110.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 110.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '280 cals'},\n", - " {'top': 110.19,\n", - " 'left': 467.76,\n", - " 'width': 53.03997802734375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Med-High'}],\n", - " [{'top': 132.75,\n", - " 'left': 107.88,\n", - " 'width': 73.31999969482422,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Haddock fresh'},\n", - " {'top': 132.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 132.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '110 cals'},\n", - " {'top': 132.75,\n", - " 'left': 464.04,\n", - " 'width': 60.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low calorie'}],\n", - " [{'top': 155.19,\n", - " 'left': 111.6,\n", - " 'width': 66.00000762939453,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Halibut fresh'},\n", - " {'top': 155.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 155.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '125 cals'},\n", - " {'top': 155.19,\n", - " 'left': 464.04,\n", - " 'width': 60.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low calorie'}],\n", - " [{'top': 177.75,\n", - " 'left': 131.4,\n", - " 'width': 26.279998779296875,\n", - " 
'height': 7.880000114440918,\n", - " 'text': 'Ham'},\n", - " {'top': 177.75,\n", - " 'left': 265.92,\n", - " 'width': 30.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '6 cals'},\n", - " {'top': 177.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '240 cals'},\n", - " {'top': 177.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 200.19,\n", - " 'left': 93.72,\n", - " 'width': 101.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Herring fresh grilled'},\n", - " {'top': 200.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 200.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 200.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 222.75,\n", - " 'left': 125.4,\n", - " 'width': 38.279991149902344,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Kidney'},\n", - " {'top': 222.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 222.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '160 cals'},\n", - " {'top': 222.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 245.19,\n", - " 'left': 126.36,\n", - " 'width': 36.36000061035156,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Kipper'},\n", - " {'top': 245.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 
'200 cals'},\n", - " {'top': 245.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '120 cals'},\n", - " {'top': 245.19,\n", - " 'left': 464.04,\n", - " 'width': 60.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low calorie'}],\n", - " [{'top': 267.75,\n", - " 'left': 130.08,\n", - " 'width': 29.039993286132812,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Liver'},\n", - " {'top': 267.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 267.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '150 cals'},\n", - " {'top': 267.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 290.19,\n", - " 'left': 118.56,\n", - " 'width': 51.96000671386719,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Liver pate'},\n", - " {'top': 290.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '150 cals'},\n", - " {'top': 290.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 290.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 312.75,\n", - " 'left': 111.96,\n", - " 'width': 65.2800064086914,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Lamb (roast)'},\n", - " {'top': 312.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 312.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 
312.75,\n", - " 'left': 467.76,\n", - " 'width': 53.03997802734375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Med-High'}],\n", - " [{'top': 335.19,\n", - " 'left': 108.24,\n", - " 'width': 72.5999984741211,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Lobster boiled'},\n", - " {'top': 335.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 335.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '100 cals'},\n", - " {'top': 335.19,\n", - " 'left': 464.04,\n", - " 'width': 60.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low calorie'}],\n", - " [{'top': 357.75,\n", - " 'left': 105.96,\n", - " 'width': 77.2800064086914,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Luncheon meat'},\n", - " {'top': 357.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 357.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '400 cals'},\n", - " {'top': 357.75,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 380.19,\n", - " 'left': 120.36,\n", - " 'width': 48.36000061035156,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Mackeral'},\n", - " {'top': 380.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 380.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 380.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 402.75,\n", - " 'left': 123.36,\n", - " 
'width': 42.36000061035156,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Mussels'},\n", - " {'top': 402.75,\n", - " 'left': 262.92,\n", - " 'width': 36.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '90 cals'},\n", - " {'top': 402.75,\n", - " 'left': 373.08,\n", - " 'width': 36.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '90 cals'},\n", - " {'top': 402.75,\n", - " 'left': 468.84,\n", - " 'width': 51.000030517578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low-Med'}],\n", - " [{'top': 425.19,\n", - " 'left': 108.6,\n", - " 'width': 72.00000762939453,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Pheasant roast'},\n", - " {'top': 425.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 425.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 425.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 447.75,\n", - " 'left': 100.2,\n", - " 'width': 88.68000793457031,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Pilchards (tinned)'},\n", - " {'top': 447.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '140 cals'},\n", - " {'top': 447.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '140 cals'},\n", - " {'top': 447.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 470.19,\n", - " 'left': 125.4,\n", - " 'width': 38.279991149902344,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Prawns'},\n", - " {'top': 470.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 
'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - " {'top': 470.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '100 cals'},\n", - " {'top': 470.19,\n", - " 'left': 467.28,\n", - " 'width': 54.000030517578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low- Med'}],\n", - " [{'top': 492.75,\n", - " 'left': 131.76,\n", - " 'width': 28.680007934570312,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Pork'},\n", - " {'top': 492.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 492.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '290 cals'},\n", - " {'top': 492.75,\n", - " 'left': 467.76,\n", - " 'width': 53.03997802734375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Med-High'}],\n", - " [{'top': 515.19,\n", - " 'left': 122.88,\n", - " 'width': 43.31999969482422,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Pork pie'},\n", - " {'top': 515.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 515.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '450 cals'},\n", - " {'top': 515.19,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 537.75,\n", - " 'left': 127.08,\n", - " 'width': 35.03999328613281,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Rabbit'},\n", - " {'top': 537.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 537.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - 
" {'top': 537.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 560.19,\n", - " 'left': 111.24,\n", - " 'width': 66.72000885009766,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Salmon fresh'},\n", - " {'top': 560.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 560.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - " {'top': 560.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 582.75,\n", - " 'left': 91.92,\n", - " 'width': 105.36000061035156,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sardines tinned in oil'},\n", - " {'top': 582.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 582.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 582.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 605.19,\n", - " 'left': 83.28,\n", - " 'width': 122.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sardines in tomato sauce'},\n", - " {'top': 605.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - " {'top': 605.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - " {'top': 605.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 
627.75,\n", - " 'left': 98.04,\n", - " 'width': 92.99999237060547,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sausage pork fried'},\n", - " {'top': 627.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '250 cals'},\n", - " {'top': 627.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 627.75,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 650.19,\n", - " 'left': 93.72,\n", - " 'width': 101.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sausage pork grilled'},\n", - " {'top': 650.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 650.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '280 cals'},\n", - " {'top': 650.19,\n", - " 'left': 467.76,\n", - " 'width': 53.03997802734375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Med-High'}],\n", - " [{'top': 672.75,\n", - " 'left': 113.52,\n", - " 'width': 62.040000915527344,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sausage roll'},\n", - " {'top': 672.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '290 cals'},\n", - " {'top': 672.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '480 cals'},\n", - " {'top': 672.75,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 695.19,\n", - " 'left': 98.28,\n", - " 'width': 92.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Scampi fried in oil'},\n", - " {'top': 695.19,\n", - " 'left': 259.92,\n", - " 
'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '400 cals'},\n", - " {'top': 695.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '340 cals'},\n", - " {'top': 695.19,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 717.75,\n", - " 'left': 96.96,\n", - " 'width': 95.2800064086914,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Steak & kidney pie'},\n", - " {'top': 717.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '400 cals'},\n", - " {'top': 717.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '350 cals'},\n", - " {'top': 717.75,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}]]}]" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpages\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moutput_format\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"json\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m 
\u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], "source": [ @@ -1216,387 +148,16 @@ "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "[ 0 1 \\\n", - " 0 BREADS & CEREALS Portion size * \n", - " 1 Bagel ( 1 average ) 140 cals (45g) \n", - " 2 Biscuit digestives 86 cals (per biscuit) \n", - " 3 Jaffa cake 48 cals (per biscuit) \n", - " 4 Bread white (thick slice) 96 cals (1 slice 40g) \n", - " 5 Bread wholemeal (thick) 88 cals (1 slice 40g) \n", - " 6 Chapatis 250 cals \n", - " 7 
Cornflakes 130 cals (35g) \n", - " 8 Crackerbread 17 cals per slice \n", - " 9 Cream crackers 35 cals (per cracker) \n", - " 10 Crumpets 93 cals (per crumpet) \n", - " 11 Flapjacks basic fruit mix 320 cals \n", - " 12 Macaroni (boiled) 238 cals (250g) \n", - " 13 Muesli 195 cals (50g) \n", - " 14 Naan bread (normal) 300 cals (small plate size) \n", - " 15 Noodles (boiled) 175 cals (250g) \n", - " 16 Pasta ( normal boiled ) 330 cals (300g) \n", - " 17 Pasta (wholemeal boiled ) 315 cals (300g) \n", - " 18 Porridge oats (with water) 193 cals (350g) \n", - " 19 Potatoes** (boiled) 210 cals (300g) \n", - " 20 Potatoes** (roast) 420 cals (300g) \n", - " \n", - " 2 3 4 \n", - " 0 per 100 grams (3.5 oz) NaN energy content \n", - " 1 310 cals NaN Medium \n", - " 2 480 cals NaN High \n", - " 3 370 cals NaN Med-High \n", - " 4 240 cals NaN Medium \n", - " 5 220 cals NaN Low-med \n", - " 6 300 cals NaN Medium \n", - " 7 370 cals NaN Med-High \n", - " 8 325 cals NaN Low Calorie \n", - " 9 440 cals NaN Low / portion \n", - " 10 198 cals NaN Low-Med \n", - " 11 500 cals NaN High \n", - " 12 95 cals NaN Low calorie \n", - " 13 390 cals NaN Med-high \n", - " 14 320 cals NaN Medium \n", - " 15 70 cals NaN Low calorie \n", - " 16 110 cals NaN Low calorie \n", - " 17 105 cals NaN Low calorie \n", - " 18 55 cals NaN Low calorie \n", - " 19 70 cals NaN Low calorie \n", - " 20 140 cals NaN Medium ,\n", - " 0 1 2 3\n", - " 0 Rice (white boiled) 420 cals (300g) 140 cals Low calorie\n", - " 1 Rice (egg-fried) 500 cals 200 cals High in portion\n", - " 2 Rice ( Brown ) 405 cals (300g) 135 cals Low calorie\n", - " 3 Rice cakes 28 Cals = 1 slice 373 Cals Medium\n", - " 4 Ryvita Multi grain 37 Cals per slice 331 Cals Medium\n", - " 5 Ryvita + seed & Oats 180 Cals 4 slices 362 Cals Medium\n", - " 6 Spaghetti (boiled) 303 cals (300g) 101 cals Low calorie,\n", - " 0 1 2 3 \\\n", - " 0 Meats & Fish Portion size * per 100 grams (3.5 oz) NaN \n", - " 1 Anchovies tinned 300 cals 300 cals NaN \n", - " 2 
Bacon average fried 250 cals (2 rashers) 500 cals NaN \n", - " 3 Bacon average grilled 150 cals 380 cals NaN \n", - " 4 Beef (roast) 300 cals 280 cals NaN \n", - " 5 Beef burgers frozen 320 cals 280 cals NaN \n", - " 6 Chicken 220 cals 200 cals NaN \n", - " 7 Cockles 50 cals 50 cals NaN \n", - " 8 Cod fresh 150 cals 100 cals NaN \n", - " 9 Cod chip shop food 400 cals 200 cals NaN \n", - " 10 Crab fresh 200 cals 110 cals NaN \n", - " 11 Duck roast 400 cals 430 cals NaN \n", - " \n", - " 4 \n", - " 0 energy content \n", - " 1 Medium \n", - " 2 High \n", - " 3 Med-High \n", - " 4 Medium \n", - " 5 Med-High \n", - " 6 Medium \n", - " 7 Low \n", - " 8 Low calorie \n", - " 9 Med-High \n", - " 10 low calorie \n", - " 11 High ,\n", - " 0 1 2 3\n", - " 0 Fish cake 90 cals per cake 200 cals Medium\n", - " 1 Fish fingers 50 cals per piece 220 cals Medium\n", - " 2 Gammon 320 cals 280 cals Med-High\n", - " 3 Haddock fresh 200 cals 110 cals Low calorie\n", - " 4 Halibut fresh 220 cals 125 cals Low calorie\n", - " 5 Ham 6 cals 240 cals Medium\n", - " 6 Herring fresh grilled 300 cals 200 cals Medium\n", - " 7 Kidney 200 cals 160 cals Medium\n", - " 8 Kipper 200 cals 120 cals Low calorie\n", - " 9 Liver 200 cals 150 cals Medium\n", - " 10 Liver pate 150 cals 300 cals Medium\n", - " 11 Lamb (roast) 300 cals 300 cals Med-High\n", - " 12 Lobster boiled 200 cals 100 cals Low calorie\n", - " 13 Luncheon meat 300 cals 400 cals High\n", - " 14 Mackeral 320 cals 300 cals Medium\n", - " 15 Mussels 90 cals 90 cals Low-Med\n", - " 16 Pheasant roast 200 cals 200 cals Medium\n", - " 17 Pilchards (tinned) 140 cals 140 cals Medium\n", - " 18 Prawns 180 cals 100 cals Low- Med\n", - " 19 Pork 320 cals 290 cals Med-High\n", - " 20 Pork pie 320 cals 450 cals High\n", - " 21 Rabbit 200 cals 180 cals Medium\n", - " 22 Salmon fresh 220 cals 180 cals Medium\n", - " 23 Sardines tinned in oil 220 cals 220 cals Medium\n", - " 24 Sardines in tomato sauce 180 cals 180 cals Medium\n", - " 25 Sausage pork 
fried 250 cals 320 cals High\n", - " 26 Sausage pork grilled 220 cals 280 cals Med-High\n", - " 27 Sausage roll 290 cals 480 cals High\n", - " 28 Scampi fried in oil 400 cals 340 cals High\n", - " 29 Steak & kidney pie 400 cals 350 cals High,\n", - " 0 1 2 3\n", - " 0 Taramasalata 130 cals 490 cals High\n", - " 1 Trout fresh 200 cals 120 cals Low calorie\n", - " 2 Tuna tinned water 100 cals 100 cals Low calorie\n", - " 3 Tuna tinned oil 180 cals 180 cals Medium\n", - " 4 Turkey 200 cals 160 cals Medium\n", - " 5 Veal 300 cals 240 cals Medium,\n", - " 0 1 2 3 \\\n", - " 0 Fruits & Vegetables Portion size * per 100 grams (3.5 oz) NaN \n", - " 1 Apple 44 calories 44 calories NaN \n", - " 2 Banana 107 cals 65 calories NaN \n", - " 3 Beans baked beans 170 cals 80 calories NaN \n", - " 4 Beans dried (boiled) 180 cals 130 calories NaN \n", - " 5 Blackberries 25 cals 25 calories NaN \n", - " 6 Blackcurrant 30 cals 30 calories NaN \n", - " 7 Broccoli 27 cals 32 cals NaN \n", - " 8 Cabbage (boiled) 15 calories 20 calories NaN \n", - " 9 Carrot (boiled) 16 calories 25 calories NaN \n", - " 10 Cauliflower (boiled) 20 calories 30 calories NaN \n", - " 11 Celery (boiled) 5 calories 10 calories NaN \n", - " 12 Cherry 35 calories 50 calories NaN \n", - " 13 Courgette 8 cals 20 cals NaN \n", - " 14 Cucumber 3 calories 10 calories NaN \n", - " 15 Dates 100 calories 235 calories NaN \n", - " 16 Grapes 55 calories 62 calories NaN \n", - " 17 Grapefruit 32 calories 32 calories NaN \n", - " 18 Kiwi 40 calories 50 calories NaN \n", - " 19 Leek (boiled) 10 calories 20 calories NaN \n", - " \n", - " 4 \n", - " 0 energy content \n", - " 1 Low calorie \n", - " 2 Low calorie \n", - " 3 Low calorie \n", - " 4 Low calorie \n", - " 5 Low calorie \n", - " 6 Low calorie \n", - " 7 Very low \n", - " 8 Low calorie \n", - " 9 Low calorie \n", - " 10 Low calorie \n", - " 11 Low calorie \n", - " 12 Low calorie \n", - " 13 Very low cal \n", - " 14 Low calorie \n", - " 15 Med-High \n", - " 16 Low calorie 
\n", - " 17 Low calorie \n", - " 18 Low calorie \n", - " 19 Low calorie ,\n", - " 0 1 2 3\n", - " 0 Lentils (boiled) 150 calories 100 calories Medium\n", - " 1 Lettuce 4 calories 15 calories Very Low\n", - " 2 Melon 14 calories 28 calories Medium\n", - " 3 Mushrooms raw one NaN NaN NaN\n", - " 4 average 3 cals 15 cals Very low cal\n", - " 5 Mushrooms (boiled) 12 calories 12 calories Low calorie\n", - " 6 Mushrooms (fried) 100 calories 145 calories High\n", - " 7 Olives 50 calories 80 calories Low calorie\n", - " 8 Onion (boiled) 14 calories 18 calories Low calorie\n", - " 9 One red Onion 49 cals 33 cals Low calorie\n", - " 10 Onions spring 3 cals 25 cals Very low cal\n", - " 11 Onion (fried) 86 calories 155 calories High\n", - " 12 Orange 40 calories 30 calories Low calorie\n", - " 13 Peas 210 calories 148 calories Medium\n", - " 14 Peas dried & boiled 200 calories 120 calories Low calorie\n", - " 15 Peach 35 calories 30 calories Low calorie\n", - " 16 Pear 45 calories 38 calories Low calorie\n", - " 17 Pepper yellow 6 cals 16 cals Very low\n", - " 18 Pineapple 40 calories 40 calories Low calorie\n", - " 19 Plum 30 calories 39 calories Low calorie\n", - " 20 Spinach 8 calories 8 calories Low calorie\n", - " 21 Strawberries (1 average) 10 calories 30 calories Low calorie\n", - " 22 Sweetcorn 95 calories 130 calories Medium\n", - " 23 Sweetcorn on the cob 70 calories 70 calories Low calorie\n", - " 24 Tomato 30 calories 20 calories Low calorie\n", - " 25 Tomato cherry 6 cals ( 3 toms) 17 Cals Very low cal\n", - " 26 Tomato puree 70 calories 70 calories Low-Medium\n", - " 27 Watercress 5 calories 20 calories Low calorie,\n", - " 0 1 \\\n", - " 0 Milk & Dairy produce Portion size * \n", - " 1 Cheese average 110 cals (25g) \n", - " 2 Cheddar types average reduced NaN \n", - " 3 fat 130 \n", - " 4 Cheese spreads average 90 cals \n", - " 5 Cottage cheese low fat 40 calories \n", - " 6 Cottage cheese 49 cals \n", - " 7 Cream cheese 200 cals \n", - " 8 Cream fresh half 128 
cals \n", - " 9 Cream fresh single 160 cals \n", - " 10 Cream fresh double 340 cals \n", - " 11 Cream fresh clotted 480 cals \n", - " 12 Custard 210 cals \n", - " 13 Eggs ( 1 average size) 90 cals \n", - " 14 Eggs fried 120 cals \n", - " 15 Fromage frais 125 cals \n", - " 16 Ice cream 200 cals \n", - " 17 Milk whole 175 cals (250ml/half pint) \n", - " 18 Milk semi-skimmed 125 cals (250ml/half pint) \n", - " 19 Milk skimmed 95 cals (250ml/half pint) \n", - " 20 Milk Soya 90 cals \n", - " 21 Mousse flavored 120 cals \n", - " 22 Omelette with cheese 300 cals \n", - " 23 Trifle with cream 290 cals \n", - " 24 Yogurt natural 90 cals \n", - " 25 Yogurt reduced fat 70 cals \n", - " \n", - " 2 3 4 \n", - " 0 per 100 grams (3.5 oz) NaN energy content \n", - " 1 440 cals NaN High \n", - " 2 NaN NaN NaN \n", - " 3 260 calories NaN Medium \n", - " 4 270 NaN Medium \n", - " 5 80 cals NaN low - med \n", - " 6 98 cals NaN Low calorie \n", - " 7 428 cals NaN High \n", - " 8 160 cals NaN Med-High \n", - " 9 200 cals NaN Med-High \n", - " 10 430 cals NaN High \n", - " 11 600 cals NaN High \n", - " 12 100 cals NaN Medium \n", - " 13 150 cals NaN Medium \n", - " 14 180 cals NaN Med-High \n", - " 15 125 cals NaN Low calorie \n", - " 16 180 cals NaN Medium \n", - " 17 70 cals NaN Med-High \n", - " 18 50 cals NaN Medium \n", - " 19 38 cals NaN Low calorie \n", - " 20 36 cals NaN Low calorie \n", - " 21 140 cals NaN Medium \n", - " 22 266 cals NaN Medium \n", - " 23 190 cals NaN Medium \n", - " 24 60 cals NaN Low calorie \n", - " 25 45 cals NaN Low calorie ,\n", - " 0 1 \\\n", - " 0 Fats & Sugars Portion size * \n", - " 1 PURE FAT 9 cals (1 gram) \n", - " 2 Bombay mix 250 cals \n", - " 3 Butter 112 cals \n", - " 4 Chewing gum 8 cals per piece \n", - " 5 Chocolate 200 cals \n", - " 6 Cod liver oil 135 cals (1 tbspoon) \n", - " 7 Corn snack 125 cals \n", - " 8 Crisps (chips US) average 100 cals \n", - " 9 Honey 42 cals \n", - " 10 Jam 38 cals \n", - " 11 Lard 225 cals \n", - " 12 Low fat 
spread 50 cals \n", - " 13 Margarine 50 cals \n", - " 14 Mars bar 240 cals \n", - " 15 Mint sweets 10 cals per piece \n", - " 16 Oils -corn, sunflower, olive 135 cals (1 Tbspoon) \n", - " 17 Popcorn average 150 cals \n", - " 18 Sugar white table sugar 20 cals (1 tspoon) \n", - " 19 Sweets (boiled) 100 cals \n", - " 20 Syrup 15 cals \n", - " 21 Toffee 100 cals \n", - " \n", - " 2 3 4 \n", - " 0 per 100 grams (3.5 oz) NaN energy content \n", - " 1 900 cals NaN High \n", - " 2 500 cals NaN High \n", - " 3 750 cals NaN High \n", - " 4 - NaN Low calorie \n", - " 5 500 cals NaN High \n", - " 6 900 cals NaN High \n", - " 7 500 cals NaN High \n", - " 8 500 cals NaN High \n", - " 9 280 cals NaN Medium \n", - " 10 250 cals NaN Medium \n", - " 11 890 cals NaN High \n", - " 12 400 cals NaN High \n", - " 13 750 cals NaN High \n", - " 14 480 cals NaN Med-High \n", - " 15 - NaN High \n", - " 16 900 cals NaN High \n", - " 17 460 cals NaN High \n", - " 18 400 cals NaN Medium \n", - " 19 300 cals NaN Med-High \n", - " 20 300 cals NaN Medium \n", - " 21 400 cals NaN High ,\n", - " 0 1 2 \\\n", - " 0 Fruit Calories per piece Carbs (grams) \n", - " 1 Apple (1 average) 44 calories 10.5 \n", - " 2 Apple cooking 35 calories 9 \n", - " 3 Apricot 30 calories 6.7 \n", - " 4 Avocado 150 calories 2 \n", - " 5 Banana 107 calories 26 \n", - " 6 Blackberries each 1 calorie 0.2 \n", - " 7 Blackcurrant each 1.1 calorie 0.25 \n", - " 8 Blueberries (new) 100g 49 Cals ( 100g ) 15 g \n", - " 9 Cherry each 2.4 calories 0.6 \n", - " 10 Clementine 24 cals 5 \n", - " 11 Currants 5 calories 1.4 \n", - " 12 Damson 28 calories 7.2 \n", - " 13 One average date 5g 5 cals 1.2 \n", - " 14 Dates with inverted sugar 100g 250 calories 63 \n", - " 15 Figs 10 calories 2.4 \n", - " 16 Gooseberries 2.6 calories 0.65 \n", - " 17 Grapes 100g Seedless 50 cals 15 \n", - " 18 one average Grape 6g 3 calories 0.9 \n", - " 19 Grapefruit whole 100 calories 23 \n", - " 20 Guava 24 calories 4.4 \n", - " 21 Kiwi 34 calories 8 \n", 
- " 22 Lemon 20 calories 3.4 \n", - " 23 Lychees 3 calories 0.7 \n", - " 24 Mango 40 calories 9.5 \n", - " 25 Melon Honeydew (130g) 36 calories 9 \n", - " 26 Melon Canteloupe (130g) 25 cals 6 \n", - " 27 Nectarines 42 calories 9 \n", - " 28 Olives 6.8 calories trace \n", - " \n", - " 3 \n", - " 0 Water Content \n", - " 1 85 % \n", - " 2 88 % \n", - " 3 85 % \n", - " 4 60 % \n", - " 5 75 % \n", - " 6 85 % \n", - " 7 77 % \n", - " 8 81 % \n", - " 9 83 % \n", - " 10 66 % \n", - " 11 16 % \n", - " 12 70 % \n", - " 13 14 % \n", - " 14 12 % \n", - " 15 24 % \n", - " 16 80 % \n", - " 17 82 % \n", - " 18 82 % \n", - " 19 65 % \n", - " 20 85 % \n", - " 21 75 % \n", - " 22 85 % \n", - " 23 80 % \n", - " 24 80 % \n", - " 25 90 % \n", - " 26 93 % \n", - " 27 80 % \n", - " 28 63 % ,\n", - " 0 1 2 3\n", - " 0 Orange average 35 calories 8.5 73 %\n", - " 1 Orange large 350g 100 Cals 22g 75 %\n", - " 2 Papaya Diced (small handful) 67 Cals (20g) 17g -\n", - " 3 Passion Fruit 30 calories 3 50 %\n", - " 4 Paw Paw 28 calories 6 70 %\n", - " 5 Peach 35 calories 7 80 %\n", - " 6 Pear 45 calories 12 77 %\n", - " 7 Pineapple 50 calories 12 85 %\n", - " 8 Plum 25 calories 6 79 %\n", - " 9 Prunes 9 calories 2.2 37 %\n", - " 10 Raisins 5 calories 1.4 13 %\n", - " 11 Raspberries each 1.1 calories 0.2 87 %\n", - " 12 Rhubarb 8 calories 0.8 95 %\n", - " 13 Satsuma one average 112g 29 cals 6.5 88 %\n", - " 14 Satsumas 100g 35 calories 8.5 88 %\n", - " 15 Strawberries (1 average) 2.7 calories 0.6 90 %\n", - " 16 Sultanas 5 calories 1.4 16 %\n", - " 17 Tangerine 26 calories 6 60 %\n", - " 18 Tomatoes (1 average size) 9 cals 2.2 93 %\n", - " 19 Tomatoes Cherry (1 average size) 2 calories 0.5 90 %]" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + 
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpages\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'all'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmultiple_tables\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], "source": [ @@ -1607,7 +168,9 @@ { "cell_type": "code", "execution_count": 7, - "metadata": {}, + "metadata": { + "scrolled": true + }, "outputs": [ { "data": { @@ -1893,1219 +456,40 @@ "metadata": {}, "outputs": [ { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
0123
0Fruits & VegetablesPortion size *oz)energy content
1Apple44 calories44 caloriesLow calorie
2Banana107 cals65 caloriesLow calorie
3Beans baked beans170 cals80 caloriesLow calorie
4Beans dried (boiled)180 cals130 caloriesLow calorie
5Blackberries25 cals25 caloriesLow calorie
6Blackcurrant30 cals30 caloriesLow calorie
7Broccoli27 cals32 calsVery low
8Cabbage (boiled)15 calories20 caloriesLow calorie
9Carrot (boiled)16 calories25 caloriesLow calorie
10Cauliflower (boiled)20 calories30 caloriesLow calorie
11Celery (boiled)5 calories10 caloriesLow calorie
12Cherry35 calories50 caloriesLow calorie
13Courgette8 cals20 calsVery low cal
14Cucumber3 calories10 caloriesLow calorie
15Dates100 calories235 caloriesMed-High
16Grapes55 calories62 caloriesLow calorie
17Grapefruit32 calories32 caloriesLow calorie
18Kiwi40 calories50 caloriesLow calorie
19Leek (boiled)10 calories20 caloriesLow calorie
\n", - "
" - ], - "text/plain": [ - " 0 1 2 3\n", - "0 Fruits & Vegetables Portion size * oz) energy content\n", - "1 Apple 44 calories 44 calories Low calorie\n", - "2 Banana 107 cals 65 calories Low calorie\n", - "3 Beans baked beans 170 cals 80 calories Low calorie\n", - "4 Beans dried (boiled) 180 cals 130 calories Low calorie\n", - "5 Blackberries 25 cals 25 calories Low calorie\n", - "6 Blackcurrant 30 cals 30 calories Low calorie\n", - "7 Broccoli 27 cals 32 cals Very low\n", - "8 Cabbage (boiled) 15 calories 20 calories Low calorie\n", - "9 Carrot (boiled) 16 calories 25 calories Low calorie\n", - "10 Cauliflower (boiled) 20 calories 30 calories Low calorie\n", - "11 Celery (boiled) 5 calories 10 calories Low calorie\n", - "12 Cherry 35 calories 50 calories Low calorie\n", - "13 Courgette 8 cals 20 cals Very low cal\n", - "14 Cucumber 3 calories 10 calories Low calorie\n", - "15 Dates 100 calories 235 calories Med-High\n", - "16 Grapes 55 calories 62 calories Low calorie\n", - "17 Grapefruit 32 calories 32 calories Low calorie\n", - "18 Kiwi 40 calories 50 calories Low calorie\n", - "19 Leek (boiled) 10 calories 20 calories Low calorie" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", encoding = 'ISO-8859-1',\n", - " stream=True, area = [269.875, 12.75, 790.5, 961], pages = 4, guess = False, pandas_options={'header':None})\n", - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
McKinsey Global Institute
0Disruptive technologies: Advances that will tr...
1Exhibit E2
2Speed, scope, and economic value at stake of 1...
3Illustrative rates of technology improvement I...
4and diffusion resources that could be impacted...
5Mobile $5 million vs. $40024.3 billion $1.7 tr...
6Internet Price of the fastest supercomputer in...
7an iPhone 4 today, equal in performance (MFLOP...
86x 1 billion Interaction and transaction worker
9Growth in sales of smartphones and tablets sin...
10launch of iPhone in 2007 40% of global workfor...
11Automation 100x 230+ million $9+ trillion
12of knowledge Increase in computing power from ...
13work (chess champion in 1997) to Watson (Jeopa...
14winner in 2011) Smartphone users, with potenti...
15400+ million automated digital assistance apps
16Increase in number of users of intelligent dig...
17assistants like Siri and Google Now in past 5 ...
18The Internet 300% 1 trillion $36 trillion
19of Things Increase in connected machine-to-mac...
20over past 5 years across industries such as ma...
2180–90% health care, and mining and mining)
22Price decline in MEMS (microelectromechanical ...
23systems) sensors in past 5 years Global machin...
24connections across sectors like transportation,
25security, health care, and utilities
26Cloud 18 months 2 billion $1.7 trillion
27technology Time to double server performance p...
283x like Gmail, Yahoo, and Hotmail $3 trillion
29Monthly cost of owning a server vs. renting in...
......
52storage Price decline for a lithium-ion batter...
53electric vehicle since 2009 1.2 billion gasoli...
54People without access to electricity $100 billion
55Estimated value of electricity for
56households currently without access
573D printing 90% 320 million $11 trillion
58Lower price for a home 3D printer vs. 4 years ...
594x workforce $85 billion
60Increase in additive manufacturing revenue in ...
6110 years Annual number of toys manufactured gl...
62Advanced $1,000 vs. $50 7.6 million tons $1.2 ...
63materials Difference in price of 1 gram of nan...
6410 years 45,000 metric tons sales
65115x Annual global carbon fiber consumption $4...
66Strength-to-weight ratio of carbon nanotubes v...
67Advanced 3x 22 billion $800 billion
68oil and gas Increase in efficiency of US gas w...
69exploration 2x produced globally gas
70and recovery Increase in efficiency of US oil ...
71Barrels of crude oil produced globally Revenue...
72Renewable 85% 21,000 TWh $3.5 trillion
73energy Lower price for a solar photovoltaic ce...
742000 13 billion tons $80 billion
7519x Annual CO2 emissions from electricity Valu...
76Growth in solar photovoltaic and wind generati...
77capacity since 2000 and planes
781 Not comprehensive; indicative groups, produc...
792 For CDC-7600, considered the world’s faste...
803 Baxter is a general-purpose basic manufactur...
81SOURCE: McKinsey Global Institute analysis
\n", - "

82 rows × 1 columns

\n", - "
" - ], - "text/plain": [ - " McKinsey Global Institute\n", - "0 Disruptive technologies: Advances that will tr...\n", - "1 Exhibit E2\n", - "2 Speed, scope, and economic value at stake of 1...\n", - "3 Illustrative rates of technology improvement I...\n", - "4 and diffusion resources that could be impacted...\n", - "5 Mobile $5 million vs. $40024.3 billion $1.7 tr...\n", - "6 Internet Price of the fastest supercomputer in...\n", - "7 an iPhone 4 today, equal in performance (MFLOP...\n", - "8 6x 1 billion Interaction and transaction worker\n", - "9 Growth in sales of smartphones and tablets sin...\n", - "10 launch of iPhone in 2007 40% of global workfor...\n", - "11 Automation 100x 230+ million $9+ trillion\n", - "12 of knowledge Increase in computing power from ...\n", - "13 work (chess champion in 1997) to Watson (Jeopa...\n", - "14 winner in 2011) Smartphone users, with potenti...\n", - "15 400+ million automated digital assistance apps\n", - "16 Increase in number of users of intelligent dig...\n", - "17 assistants like Siri and Google Now in past 5 ...\n", - "18 The Internet 300% 1 trillion $36 trillion\n", - "19 of Things Increase in connected machine-to-mac...\n", - "20 over past 5 years across industries such as ma...\n", - "21 80–90% health care, and mining and mining)\n", - "22 Price decline in MEMS (microelectromechanical ...\n", - "23 systems) sensors in past 5 years Global machin...\n", - "24 connections across sectors like transportation,\n", - "25 security, health care, and utilities\n", - "26 Cloud 18 months 2 billion $1.7 trillion\n", - "27 technology Time to double server performance p...\n", - "28 3x like Gmail, Yahoo, and Hotmail $3 trillion\n", - "29 Monthly cost of owning a server vs. renting in...\n", - ".. 
...\n", - "52 storage Price decline for a lithium-ion batter...\n", - "53 electric vehicle since 2009 1.2 billion gasoli...\n", - "54 People without access to electricity $100 billion\n", - "55 Estimated value of electricity for\n", - "56 households currently without access\n", - "57 3D printing 90% 320 million $11 trillion\n", - "58 Lower price for a home 3D printer vs. 4 years ...\n", - "59 4x workforce $85 billion\n", - "60 Increase in additive manufacturing revenue in ...\n", - "61 10 years Annual number of toys manufactured gl...\n", - "62 Advanced $1,000 vs. $50 7.6 million tons $1.2 ...\n", - "63 materials Difference in price of 1 gram of nan...\n", - "64 10 years 45,000 metric tons sales\n", - "65 115x Annual global carbon fiber consumption $4...\n", - "66 Strength-to-weight ratio of carbon nanotubes v...\n", - "67 Advanced 3x 22 billion $800 billion\n", - "68 oil and gas Increase in efficiency of US gas w...\n", - "69 exploration 2x produced globally gas\n", - "70 and recovery Increase in efficiency of US oil ...\n", - "71 Barrels of crude oil produced globally Revenue...\n", - "72 Renewable 85% 21,000 TWh $3.5 trillion\n", - "73 energy Lower price for a solar photovoltaic ce...\n", - "74 2000 13 billion tons $80 billion\n", - "75 19x Annual CO2 emissions from electricity Valu...\n", - "76 Growth in solar photovoltaic and wind generati...\n", - "77 capacity since 2000 and planes\n", - "78 1 Not comprehensive; indicative groups, produc...\n", - "79 2 For CDC-7600, considered the world’s faste...\n", - "80 3 Baxter is a general-purpose basic manufactur...\n", - "81 SOURCE: McKinsey Global Institute analysis\n", - "\n", - "[82 rows x 1 columns]" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = read_pdf(\"./tmp/pdf/output.pdf\", encoding = 'ISO-8859-1',\n", - " stream=True, guess = False)\n", - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - 
{ - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Unnamed: 0Unnamed: 1over past 5 years across industries such as manufacturing,industries (manufacturing, health care,
0NaNNaN80–90% health care, and miningand mining)
1NaNNaNPrice decline in MEMS (microelectromechanical ...NaN
2NaNNaNsystems) sensors in past 5 years Global machin...NaN
3NaNNaNconnections across sectors like transportation,NaN
4NaNNaNsecurity, health care, and utilitiesNaN
5NaNCloud18 months 2 billion$1.7 trillion
6NaNtechnologyTime to double server performance per dollar G...GDP related to the Internet
7NaNNaN3x like Gmail, Yahoo, and Hotmail$3 trillion
8NaNNaNMonthly cost of owning a server vs. renting in...Enterprise IT spend
9NaNNaNthe cloud North American institutions hosting ...NaN
10NaNNaNto host critical applications on the cloudNaN
11NaNAdvanced75–85% 320 million$6 trillion
12NaNroboticsLower price for Baxter3 than a typical industr...Manufacturing worker employment
13NaNNaN170% workforcecosts, 19% of global employment costs
14NaNNaNGrowth in sales of industrial robots, 2009–1...$2–3 trillion
15NaNNaNAnnual major surgeriesCost of major surgeries
16NaNAutonomous7 1 billion$4 trillion
17NaNand near-Miles driven by top-performing driverless car ...Automobile industry revenue
18NaNautonomousDARPA Grand Challenge along a 150-mile route 4...$155 billion
19NaNvehicles1,540 Civilian, military, and general aviation...Revenue from sales of civilian, military,
20NaNNaNMiles cumulatively driven by cars competing in...and general aviation aircraft
21NaNNaNGrand ChallengeNaN
22NaNNaN300,000+NaN
23NaNNaNMiles driven by Google’s autonomous cars wit...NaN
24NaNNaN1 accident (which was human-caused)NaN
25NaNNext-10 months 26 million$6.5 trillion
26NaNgenerationTime to double sequencing speed per dollar Ann...Global health-care costs
27NaNgenomics100x disease, or type 2 diabetes$1.1 trillion
28NaNNaNIncrease in acreage of genetically modified cr...Global value of wheat, rice, maize, soy,
29NaNNaN1996–2012 People employed in agricultureand barley
30NaNEnergy40% 1 billion$2.5 trillion
31NaNstoragePrice decline for a lithium-ion battery pack i...Revenue from global consumption of
32NaNNaNelectric vehicle since 2009 1.2 billiongasoline and diesel
33NaNNaNPeople without access to electricity$100 billion
34NaNNaNNaNEstimated value of electricity for
35NaNNaNNaNhouseholds currently without access
36NaN3D printing90% 320 million$11 trillion
37NaNNaNLower price for a home 3D printer vs. 4 years ...Global manufacturing GDP
38NaNNaN4x workforce$85 billion
39NaNNaNIncrease in additive manufacturing revenue in ...Revenue from global toy sales
40NaNNaN10 years Annual number of toys manufactured gl...NaN
41NaNAdvanced$1,000 vs. $50 7.6 million tons$1.2 trillion
42NaNmaterialsDifference in price of 1 gram of nanotubes ove...Revenue from global semiconductor
43NaNNaN10 years 45,000 metric tonssales
44NaNNaN115x Annual global carbon fiber consumption$4 billion
45NaNNaNStrength-to-weight ratio of carbon nanotubes v...Revenue from global carbon fiber sales
46NaNAdvanced3x 22 billion$800 billion
47NaNoil and gasIncrease in efficiency of US gas wells, 2007â€...Revenue from global sales of natural
48NaNexploration2x produced globallygas
49NaNand recoveryIncrease in efficiency of US oil wells, 2007â€...$3.4 trillion
50NaNNaNBarrels of crude oil produced globallyRevenue from global sales of crude oil
51NaNRenewable85% 21,000 TWh$3.5 trillion
52NaNenergyLower price for a solar photovoltaic cell per ...Value of global electricity consumption
53NaNNaN2000 13 billion tons$80 billion
54NaNNaN19x Annual CO2 emissions from electricityValue of global carbon market
55NaNNaNGrowth in solar photovoltaic and wind generati...transactions
56NaNNaNcapacity since 2000 and planesNaN
571.0Not comprehensive; indicative groups, products...NaNNaN
582.0For CDC-7600, considered the world’s fastest...NaNNaN
593.0Baxter is a general-purpose basic manufacturin...NaNNaN
\n", - "
" - ], - "text/plain": [ - " Unnamed: 0 Unnamed: 1 \\\n", - "0 NaN NaN \n", - "1 NaN NaN \n", - "2 NaN NaN \n", - "3 NaN NaN \n", - "4 NaN NaN \n", - "5 NaN Cloud \n", - "6 NaN technology \n", - "7 NaN NaN \n", - "8 NaN NaN \n", - "9 NaN NaN \n", - "10 NaN NaN \n", - "11 NaN Advanced \n", - "12 NaN robotics \n", - "13 NaN NaN \n", - "14 NaN NaN \n", - "15 NaN NaN \n", - "16 NaN Autonomous \n", - "17 NaN and near- \n", - "18 NaN autonomous \n", - "19 NaN vehicles \n", - "20 NaN NaN \n", - "21 NaN NaN \n", - "22 NaN NaN \n", - "23 NaN NaN \n", - "24 NaN NaN \n", - "25 NaN Next- \n", - "26 NaN generation \n", - "27 NaN genomics \n", - "28 NaN NaN \n", - "29 NaN NaN \n", - "30 NaN Energy \n", - "31 NaN storage \n", - "32 NaN NaN \n", - "33 NaN NaN \n", - "34 NaN NaN \n", - "35 NaN NaN \n", - "36 NaN 3D printing \n", - "37 NaN NaN \n", - "38 NaN NaN \n", - "39 NaN NaN \n", - "40 NaN NaN \n", - "41 NaN Advanced \n", - "42 NaN materials \n", - "43 NaN NaN \n", - "44 NaN NaN \n", - "45 NaN NaN \n", - "46 NaN Advanced \n", - "47 NaN oil and gas \n", - "48 NaN exploration \n", - "49 NaN and recovery \n", - "50 NaN NaN \n", - "51 NaN Renewable \n", - "52 NaN energy \n", - "53 NaN NaN \n", - "54 NaN NaN \n", - "55 NaN NaN \n", - "56 NaN NaN \n", - "57 1.0 Not comprehensive; indicative groups, products... \n", - "58 2.0 For CDC-7600, considered the world’s fastest... \n", - "59 3.0 Baxter is a general-purpose basic manufacturin... \n", - "\n", - " over past 5 years across industries such as manufacturing, \\\n", - "0 80–90% health care, and mining \n", - "1 Price decline in MEMS (microelectromechanical ... \n", - "2 systems) sensors in past 5 years Global machin... \n", - "3 connections across sectors like transportation, \n", - "4 security, health care, and utilities \n", - "5 18 months 2 billion \n", - "6 Time to double server performance per dollar G... \n", - "7 3x like Gmail, Yahoo, and Hotmail \n", - "8 Monthly cost of owning a server vs. renting in... 
\n", - "9 the cloud North American institutions hosting ... \n", - "10 to host critical applications on the cloud \n", - "11 75–85% 320 million \n", - "12 Lower price for Baxter3 than a typical industr... \n", - "13 170% workforce \n", - "14 Growth in sales of industrial robots, 2009–1... \n", - "15 Annual major surgeries \n", - "16 7 1 billion \n", - "17 Miles driven by top-performing driverless car ... \n", - "18 DARPA Grand Challenge along a 150-mile route 4... \n", - "19 1,540 Civilian, military, and general aviation... \n", - "20 Miles cumulatively driven by cars competing in... \n", - "21 Grand Challenge \n", - "22 300,000+ \n", - "23 Miles driven by Google’s autonomous cars wit... \n", - "24 1 accident (which was human-caused) \n", - "25 10 months 26 million \n", - "26 Time to double sequencing speed per dollar Ann... \n", - "27 100x disease, or type 2 diabetes \n", - "28 Increase in acreage of genetically modified cr... \n", - "29 1996–2012 People employed in agriculture \n", - "30 40% 1 billion \n", - "31 Price decline for a lithium-ion battery pack i... \n", - "32 electric vehicle since 2009 1.2 billion \n", - "33 People without access to electricity \n", - "34 NaN \n", - "35 NaN \n", - "36 90% 320 million \n", - "37 Lower price for a home 3D printer vs. 4 years ... \n", - "38 4x workforce \n", - "39 Increase in additive manufacturing revenue in ... \n", - "40 10 years Annual number of toys manufactured gl... \n", - "41 $1,000 vs. $50 7.6 million tons \n", - "42 Difference in price of 1 gram of nanotubes ove... \n", - "43 10 years 45,000 metric tons \n", - "44 115x Annual global carbon fiber consumption \n", - "45 Strength-to-weight ratio of carbon nanotubes v... \n", - "46 3x 22 billion \n", - "47 Increase in efficiency of US gas wells, 2007â€... \n", - "48 2x produced globally \n", - "49 Increase in efficiency of US oil wells, 2007â€... 
\n", - "50 Barrels of crude oil produced globally \n", - "51 85% 21,000 TWh \n", - "52 Lower price for a solar photovoltaic cell per ... \n", - "53 2000 13 billion tons \n", - "54 19x Annual CO2 emissions from electricity \n", - "55 Growth in solar photovoltaic and wind generati... \n", - "56 capacity since 2000 and planes \n", - "57 NaN \n", - "58 NaN \n", - "59 NaN \n", - "\n", - " industries (manufacturing, health care, \n", - "0 and mining) \n", - "1 NaN \n", - "2 NaN \n", - "3 NaN \n", - "4 NaN \n", - "5 $1.7 trillion \n", - "6 GDP related to the Internet \n", - "7 $3 trillion \n", - "8 Enterprise IT spend \n", - "9 NaN \n", - "10 NaN \n", - "11 $6 trillion \n", - "12 Manufacturing worker employment \n", - "13 costs, 19% of global employment costs \n", - "14 $2–3 trillion \n", - "15 Cost of major surgeries \n", - "16 $4 trillion \n", - "17 Automobile industry revenue \n", - "18 $155 billion \n", - "19 Revenue from sales of civilian, military, \n", - "20 and general aviation aircraft \n", - "21 NaN \n", - "22 NaN \n", - "23 NaN \n", - "24 NaN \n", - "25 $6.5 trillion \n", - "26 Global health-care costs \n", - "27 $1.1 trillion \n", - "28 Global value of wheat, rice, maize, soy, \n", - "29 and barley \n", - "30 $2.5 trillion \n", - "31 Revenue from global consumption of \n", - "32 gasoline and diesel \n", - "33 $100 billion \n", - "34 Estimated value of electricity for \n", - "35 households currently without access \n", - "36 $11 trillion \n", - "37 Global manufacturing GDP \n", - "38 $85 billion \n", - "39 Revenue from global toy sales \n", - "40 NaN \n", - "41 $1.2 trillion \n", - "42 Revenue from global semiconductor \n", - "43 sales \n", - "44 $4 billion \n", - "45 Revenue from global carbon fiber sales \n", - "46 $800 billion \n", - "47 Revenue from global sales of natural \n", - "48 gas \n", - "49 $3.4 trillion \n", - "50 Revenue from global sales of crude oil \n", - "51 $3.5 trillion \n", - "52 Value of global electricity consumption \n", - "53 $80 
billion \n", - "54 Value of global carbon market \n", - "55 transactions \n", - "56 NaN \n", - "57 NaN \n", - "58 NaN \n", - "59 NaN " - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", encoding = 'ISO-8859-1',\n\u001b[0;32m----> 2\u001b[0;31m stream=True, area = [269.875, 12.75, 790.5, 961], pages = 4, guess = False, pandas_options={'header':None})\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], + "source": [ + "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", encoding = 'ISO-8859-1',\n", + " stream=True, area = [269.875, 12.75, 790.5, 961], pages = 4, guess = False, pandas_options={'header':None})\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = read_pdf(\"./tmp/pdf/output.pdf\", encoding = 'ISO-8859-1',\n", + " stream=True, guess = False)\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "df = read_pdf(\"./tmp/pdf/output.pdf\", encoding = 'ISO-8859-1',\n", " stream=True, area=[269.875, 12.75, 790.5, 961], guess = False)\n", @@ -3131,66 +515,9 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
0123
1Bagel ( 1 average )140 cals (45g)310 calsMedium
2Biscuit digestives86 cals (per biscuit)480 calsHigh
\n", - "
" - ], - "text/plain": [ - " 0 1 2 3\n", - "1 Bagel ( 1 average ) 140 cals (45g) 310 cals Medium\n", - "2 Biscuit digestives 86 cals (per biscuit) 480 cals High" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import camelot\n", "tables = camelot.read_pdf(\"./tmp/pdf//Food Calories List.pdf\")\n", @@ -3199,47 +526,9 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "-- -------------\n", - " 0 Mobile\n", - " Internet\n", - " 1 Automation\n", - " of knowledge\n", - " work\n", - " 2 The Internet\n", - " of Things\n", - " 3 Cloud\n", - " technology\n", - " 4 Advanced\n", - " robotics\n", - " 5 Autonomous\n", - " and near-\n", - " autonomous\n", - " vehicles\n", - " 6 Next-\n", - " generation\n", - " genomics\n", - " 7 Energy\n", - " storage\n", - " 8 3D printing\n", - " 9 Advanced\n", - " materials\n", - "10 Advanced oil\n", - " and gas\n", - " exploration\n", - " and recovery\n", - "11 Renewable\n", - " energy\n", - "-- -------------\n" - ] - } - ], + "outputs": [], "source": [ "tables1 = camelot.read_pdf(\"./tmp/pdf/MGI_Disruptive_technologies_Full_report_May2013.pdf\", pages='32', area=[269.875, 120.75, 790.5, 561])\n", "print (tabulate(tables1[0].df))" @@ -3247,57 +536,9 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "30\n", - "NOK\n", - "31\n", - "NOK\n", - "32\n", - "-- -------------\n", - " 0 Mobile\n", - " Internet\n", - " 1 Automation\n", - " of knowledge\n", - " work\n", - " 2 The Internet\n", - " of Things\n", - " 3 Cloud\n", - " technology\n", - " 4 Advanced\n", - " robotics\n", - " 5 Autonomous\n", - " and near-\n", - " autonomous\n", - " vehicles\n", - " 6 Next-\n", - " generation\n", - " genomics\n", - " 7 Energy\n", - " 
storage\n", - " 8 3D printing\n", - " 9 Advanced\n", - " materials\n", - "10 Advanced oil\n", - " and gas\n", - " exploration\n", - " and recovery\n", - "11 Renewable\n", - " energy\n", - "-- -------------\n", - "NOK\n", - "33\n", - "NOK\n", - "34\n", - "NOK\n" - ] - } - ], + "outputs": [], "source": [ "for i in range(30,35):\n", " print (i)\n", @@ -3324,21 +565,18 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 9, "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "b' Fish cake\\n 90 cals per cake\\n 200 cals\\n Medium\\n Fish fingers\\n 50 cals per piece\\n 220 cals\\n Medium\\n Gammon\\n 320 cals\\n 280 cals\\n Med\\n-High\\n Haddock fresh\\n 200 cals\\n 110 cals\\n Low calorie\\n Halibut fresh\\n 220 cals\\n 125 cals\\n Low calorie\\n Ha\\nm 6 cals\\n 240 cals\\n Medium\\n Herring fresh grilled\\n 300 cals\\n 200 cals\\n Medium\\n Kidney\\n 200 cals\\n 160 cals\\n Medium\\n Kipper\\n 200 cals\\n 120 cals\\n Low calorie\\n Liver\\n 200 cals\\n 150 cals\\n Medium\\n Liver\\n pate\\n 150 cals\\n 300 cals\\n Medium\\n Lamb (roast)\\n 300 cals\\n 300 cals\\n Med\\n-High\\n Lobster boiled\\n 200 cals\\n 100 cals\\n Low calorie\\n Luncheon meat\\n 300 cals\\n 400 cals\\n High\\n Mackeral\\n 320 cals\\n 300 cal\\ns Medium\\n Mussels\\n 90 cals\\n 90 cals\\n Low\\n-Med\\n Pheasant roast\\n 200 cals\\n 200 cals\\n Medium\\n Pilchards (tinned)\\n 140 cals\\n 140 cals\\n Medium\\n Prawns\\n 180 cals\\n 100 cals\\n Low\\n- Med\\n Pork \\n 320 cals\\n 290 cals\\n Med\\n-High\\n Pork pie\\n 320 cals\\n 450 cals\\n High\\n Rabbit\\n 200 cals\\n 180 cals\\n Medium\\n Salmon fresh\\n 220 cals\\n 180 cals\\n Medium\\n Sardines tinned in oil\\n 220 cals\\n 220 cals\\n Medium\\n Sardines in tomato sauce\\n 180 cals\\n 180 cals\\n Medium\\n Sausage pork fried\\n 250 cals\\n 320 cals\\n High\\n Sausage pork grilled\\n 220 cals\\n 280 cals\\n Med\\n-High\\n Sausage roll\\n 290 cals\\n 480 cals\\n High\\n Scampi fried 
in oil\\n 400 cals\\n 340 cals\\n High\\n Steak & kidney pie\\n 400 cals\\n 350 cals\\n High\\n '\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]\n" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mPyPDF2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mpdf_file\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'./tmp/pdf/Food Calories List.pdf'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'rb'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mread_pdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mPyPDF2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mPdfFileReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpdf_file\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mnumber_of_pages\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetNumPages\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mpage\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetPage\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or 
directory: './tmp/pdf/Food Calories List.pdf'" ] } ], @@ -3354,18 +592,18 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 10, "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "[' Fish cake' ' 90 cals per cake' ' 200 cals' ' Medium']\n", - "[' Fish fingers' ' 50 cals per piece' ' 220 cals' ' Medium']\n", - "[' Gammon' ' 320 cals' ' 280 cals' ' Med']\n", - "['-High' ' Haddock fresh' ' 200 cals' ' 110 cals']\n", - "[' Low calorie' ' Halibut fresh' ' 220 cals' ' 125 cals']\n" + "ename": "NameError", + "evalue": "name 'page_content' is not defined", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mtable_list\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpage_content\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'\\n'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0ml\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnumpy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray_split\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m/\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m 
\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mNameError\u001b[0m: name 'page_content' is not defined" ] } ], @@ -3378,6 +616,20 @@ " print(l[i])" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "code", "execution_count": null, @@ -3402,7 +654,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.8" } }, "nbformat": 4, diff --git a/notebooks/Python group and sort a list of lists by a specific index,pattern.ipynb b/notebooks/Python group and sort a list of lists by a specific index,pattern.ipynb new file mode 100644 index 0000000..396395f --- /dev/null +++ b/notebooks/Python group and sort a list of lists by a specific index,pattern.ipynb @@ -0,0 +1,424 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "movies = [\n", + "1, \"Avatar\" ,'good',\n", + "2, \"Titanic\" ,'not bad',\n", + "3, \"Star Wars: The Force Awakens\" ,'good',\n", + "4, \"Jurassic World\" ,'good',\n", + "5, \"The Avengers\" ,'not bad',\n", + "6, \"Furious 7\" ,'not bad',\n", + "7, \"Avengers: Age of Ultron\" ,'good',\n", + "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n", + "9, \"Frozen\" ,'good',\n", + "\n", + "\n", + "\"The Birth of a Nation\" ,1915,\n", + "\"The Birth of a Nation\" ,1940,\n", + "\"Gone with the Wind\" ,1940,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"The Sound of Music\" ,1966]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def 
sortGroupList(list_unsorted, category, category2, short=True):\n", + " listx = []\n", + " listy = []\n", + " last_section = 0\n", + " for i in range(0, len(list_unsorted), 3):\n", + " if list_unsorted[i + 2] == category:\n", + " listy.append(list_unsorted[i])\n", + " listy.append(list_unsorted[i + 1])\n", + " if not short:\n", + " listy.append(list_unsorted[i + 2])\n", + " last_section = i+2\n", + " elif list_unsorted[i + 2] == category2:\n", + " listx.append(list_unsorted[i])\n", + " listx.append(list_unsorted[i + 1])\n", + " if not short:\n", + " listx.append(list_unsorted[i + 2])\n", + " last_section = i + 2\n", + " header_category = [' - ' + category + ' - ']\n", + " header_category2 = [' - ' + category2 + ' - ']\n", + " header_category3 = [' - ' + ' - ']\n", + " return header_category + listy + header_category2 + listx + header_category3 + list_unsorted[last_section:]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sortGroupList(movies, 'good', 'not bad')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "movies = [\n", + "1, \"Avatar\" ,2009,\n", + "2, \"Titanic\" ,1997,\n", + "3, \"Star Wars: The Force Awakens\" ,2015,\n", + "4, \"Jurassic World\" ,2015,\n", + "5, \"The Avengers\" ,2012,\n", + "6, \"Furious 7\" ,2015,\n", + "7, \"Avengers: Age of Ultron\" ,2015,\n", + "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,2011,\n", + "9, \"Frozen\" ,2013,\n", + "\n", + "\n", + "\"The Birth of a Nation\" ,1915,\n", + "\"The Birth of a Nation\" ,1940,\n", + "\"Gone with the Wind\" ,1940,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"The Sound of Music\" ,1966]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(len(movies))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "years = [str(x) for x 
in range(1997, 2015)]\n", + "years" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sortGroupList(list_unsorted):\n", + " listx = []\n", + " listy = []\n", + " for i in range(0, len(list_unsorted), 3):\n", + " if list_unsorted[i + 2] in years:\n", + " listy.append(list_unsorted[i])\n", + " listy.append(list_unsorted[i + 1])\n", + " listy.append(list_unsorted[i + 2])\n", + " else:\n", + " listx.append(list_unsorted[i])\n", + " listx.append(list_unsorted[i + 1])\n", + " listx.append(list_unsorted[i + 2])\n", + " for i in listy:\n", + " print(i)\n", + " for i in listx:\n", + " print(i)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sortGroupList(movies)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "movies = [\n", + "1, \"Avatar\" ,'good',\n", + "2, \"Titanic\" ,'not bad',\n", + "3, \"Star Wars: The Force Awakens\" ,'good',\n", + "4, \"Jurassic World\" ,'good',\n", + "5, \"The Avengers\" ,'not bad',\n", + "6, \"Furious 7\" ,'not bad',\n", + "7, \"Avengers: Age of Ultron\" ,'good',\n", + "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n", + "9, \"Frozen\" ,'good',\n", + "\n", + "\n", + "\"The Birth of a Nation\" ,1915,\n", + "\"The Birth of a Nation\" ,1940,\n", + "\"Gone with the Wind\" ,1940,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"The Sound of Music\" ,1966]\n", + "df = pd.DataFrame(movies)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "types = []\n", + "raw_list = []\n", + "for e in movies:\n", + " types.append(type(e))\n", + " if isinstance(e, int):\n", + " raw_list.append(1)\n", + " else:\n", + " raw_list.append(0)\n", 
+ "df1 = pd.DataFrame({'elem':movies, 'types':types}) " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "raw_list = [1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1]\n", + "movies = [\n", + "1, \"Avatar\" ,'good',\n", + "2, \"Titanic\" ,'not bad',\n", + "3, \"Star Wars: The Force Awakens\" ,'good',\n", + "4, \"Jurassic World\" ,'good',\n", + "5, \"The Avengers\" ,'not bad',\n", + "6, \"Furious 7\" ,'not bad',\n", + "7, \"Avengers: Age of Ultron\" ,'good',\n", + "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n", + "9, \"Frozen\" ,'good',\n", + "\n", + "\n", + "\"The Birth of a Nation\" ,1915,\n", + "\"The Birth of a Nation\" ,1940,\n", + "\"Gone with the Wind\" ,1940,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"The Sound of Music\" ,1966]" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[0, 1]\n", + "[0, 1]\n", + "[0, 1]\n", + "[0, 1]\n", + "[0, 1]\n" + ] + }, + { + "data": { + "text/plain": [ + "[[1, 'Avatar', 'good'],\n", + " [2, 'Titanic', 'not bad'],\n", + " [3, 'Star Wars: The Force Awakens', 'good'],\n", + " [4, 'Jurassic World', 'good'],\n", + " [5, 'The Avengers', 'not bad'],\n", + " [6, 'Furious 7', 'not bad'],\n", + " [7, 'Avengers: Age of Ultron', 'good'],\n", + " [8, 'Harry Potter and the Deathly Hallows – Part 2', 'not bad'],\n", + " [9, 'Frozen', 'good']]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "patern1 = [1, 0, 0]\n", + "patern2 = [1, 0]\n", + 
"\n", + "len1 = len(patern1)\n", + "len2 = len(patern2)\n", + "\n", + "output1 = []\n", + "output2 = []\n", + "\n", + "while(raw_list):\n", + " if raw_list[:len1] == patern1: \n", + " output1.append(movies[:len1])\n", + " raw_list = raw_list[len1:]\n", + " movies = movies[len1:]\n", + " else:\n", + " print(raw_list[:len2])\n", + " output2.append(movies[:len2])\n", + " raw_list = raw_list[len2:]\n", + " movies = movies[len2:]\n", + " \n", + "output1" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['The Birth of a Nation', 1915],\n", + " ['The Birth of a Nation', 1940],\n", + " ['Gone with the Wind', 1940],\n", + " ['Gone with the Wind', 1963],\n", + " ['The Sound of Music', 1966]]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "output2" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "new_list = sorted(output1, key=lambda x: x[2])" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[1, 'Avatar', 'good'],\n", + " [3, 'Star Wars: The Force Awakens', 'good'],\n", + " [4, 'Jurassic World', 'good'],\n", + " [7, 'Avengers: Age of Ultron', 'good'],\n", + " [9, 'Frozen', 'good'],\n", + " [2, 'Titanic', 'not bad'],\n", + " [5, 'The Avengers', 'not bad'],\n", + " [6, 'Furious 7', 'not bad'],\n", + " [8, 'Harry Potter and the Deathly Hallows – Part 2', 'not bad']]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_list" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Python make groups in a list" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Simple grouping" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + 
"source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/notebooks/Scrape wiki tables with pandas and python.ipynb b/notebooks/Scrape wiki tables with pandas and python.ipynb index 3264206..0a8dd21 100644 --- a/notebooks/Scrape wiki tables with pandas and python.ipynb +++ b/notebooks/Scrape wiki tables with pandas and python.ipynb @@ -35,7 +35,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 14, "metadata": {}, "outputs": [ { @@ -58,7 +58,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 15, "metadata": {}, "outputs": [ { @@ -102,8 +102,8 @@ " 1\n", " 1\n", " Russia*\n", - " 13100000\n", - " 17,125,200 including European part\n", + " 13000000\n", + " 17,125,191 km² including European Russia[1]\n", " NaN\n", " \n", " \n", @@ -117,7 +117,7 @@ " \n", " 3\n", " 3\n", - " India\n", + " India[2]\n", " 3287263\n", " NaN\n", " NaN\n", @@ -126,8 +126,8 @@ " 4\n", " 4\n", " Kazakhstan*\n", - " 2455034\n", - " 2,724,902 km² including European part\n", + " 2544900\n", + " 2,724,900 km² including European part\n", " NaN\n", " \n", " \n", @@ -166,16 +166,16 @@ " 9\n", " 9\n", " Pakistan\n", - " 796095\n", - " 882,363 km² including Gilgit-Baltistan and AJK\n", + " 881913\n", + " NaN\n", " NaN\n", " \n", " \n", " 10\n", " 10\n", " Turkey*\n", - " 747272\n", - " 783,562 km² including European part\n", + " 759592\n", + " 783,356 km² including East Thrace\n", " NaN\n", " \n", " \n", @@ -358,8 +358,8 @@ " 33\n", " 33\n", " Azerbaijan*\n", - " 86600\n", - " Sometimes considered part of Europe\n", + " 86000\n", + " Located in the Caucasus, between 
Europe and Asia\n", " NaN\n", " \n", " \n", @@ -375,7 +375,7 @@ " 35\n", " Georgia*\n", " 69000\n", - " Sometimes considered part of Europe\n", + " Located in the Caucasus, between Europe and Asia\n", " NaN\n", " \n", " \n", @@ -389,128 +389,120 @@ " \n", " 37\n", " 37\n", - " Egypt*\n", - " 60000\n", - " 1,002,450 km² including African part\n", + " Bhutan\n", + " 38394\n", + " NaN\n", " NaN\n", " \n", " \n", " 38\n", " 38\n", - " Bhutan\n", - " 38394\n", - " NaN\n", + " Taiwan\n", + " 36193\n", + " partially recognized state/not a UN member\n", " NaN\n", " \n", " \n", " 39\n", " 39\n", - " Taiwan\n", - " 36193\n", - " excludes Hong Kong, Macau, Mainland China and ...\n", + " Armenia\n", + " 29843\n", + " Located in the Caucasus, between Europe and Asia\n", " NaN\n", " \n", " \n", " 40\n", " 40\n", - " Armenia*\n", - " 29843\n", - " Sometimes considered part of Europe\n", + " Israel\n", + " 22072\n", + " NaN\n", " NaN\n", " \n", " \n", " 41\n", " 41\n", - " Israel\n", - " 20273\n", + " Kuwait\n", + " 17818\n", " NaN\n", " NaN\n", " \n", " \n", " 42\n", " 42\n", - " Kuwait\n", - " 17818\n", + " Timor-Leste\n", + " 14874\n", " NaN\n", " NaN\n", " \n", " \n", " 43\n", " 43\n", - " Timor-Leste\n", - " 14874\n", + " Qatar\n", + " 11586\n", " NaN\n", " NaN\n", " \n", " \n", " 44\n", " 44\n", - " Qatar\n", - " 11586\n", + " Lebanon\n", + " 10452\n", " NaN\n", " NaN\n", " \n", " \n", " 45\n", " 45\n", - " Lebanon\n", - " 10452\n", + " Cyprus\n", + " 9251\n", " NaN\n", " NaN\n", " \n", " \n", " 46\n", " 46\n", - " Cyprus\n", - " 9251\n", - " 5,896 km² excluding Northern Cyprus. 
Political...\n", + " Palestine\n", + " 6220\n", + " partially recognized state/non-member observer...\n", " NaN\n", " \n", " \n", " 47\n", " 47\n", - " Palestine\n", - " 6220\n", + " Brunei\n", + " 5765\n", " NaN\n", " NaN\n", " \n", " \n", " 48\n", " 48\n", - " Brunei\n", - " 5765\n", + " Bahrain\n", + " 760\n", " NaN\n", " NaN\n", " \n", " \n", " 49\n", " 49\n", - " Bahrain\n", - " 765\n", + " Singapore\n", + " 697\n", " NaN\n", " NaN\n", " \n", " \n", " 50\n", " 50\n", - " Singapore\n", - " 716\n", - " NaN\n", - " NaN\n", - " \n", - " \n", - " 51\n", - " 51\n", " Maldives\n", " 300\n", " NaN\n", " NaN\n", " \n", " \n", - " 52\n", + " 51\n", " NaN\n", " Total\n", - " 44579000\n", + " 44528251\n", " NaN\n", " NaN\n", " \n", @@ -521,16 +513,16 @@ "text/plain": [ " 0 1 2 \\\n", "0 Rank Country Area (km²) \n", - "1 1 Russia* 13100000 \n", + "1 1 Russia* 13000000 \n", "2 2 China 9596961 \n", - "3 3 India 3287263 \n", - "4 4 Kazakhstan* 2455034 \n", + "3 3 India[2] 3287263 \n", + "4 4 Kazakhstan* 2544900 \n", "5 5 Saudi Arabia 2149690 \n", "6 6 Iran 1648195 \n", "7 7 Mongolia 1564110 \n", "8 8 Indonesia* 1472639 \n", - "9 9 Pakistan 796095 \n", - "10 10 Turkey* 747272 \n", + "9 9 Pakistan 881913 \n", + "10 10 Turkey* 759592 \n", "11 11 Myanmar 676578 \n", "12 12 Afghanistan 652230 \n", "13 13 Yemen 527968 \n", @@ -553,39 +545,38 @@ "30 30 North Korea 120538 \n", "31 31 South Korea 100210 \n", "32 32 Jordan 89342 \n", - "33 33 Azerbaijan* 86600 \n", + "33 33 Azerbaijan* 86000 \n", "34 34 United Arab Emirates 83600 \n", "35 35 Georgia* 69000 \n", "36 36 Sri Lanka 65610 \n", - "37 37 Egypt* 60000 \n", - "38 38 Bhutan 38394 \n", - "39 39 Taiwan 36193 \n", - "40 40 Armenia* 29843 \n", - "41 41 Israel 20273 \n", - "42 42 Kuwait 17818 \n", - "43 43 Timor-Leste 14874 \n", - "44 44 Qatar 11586 \n", - "45 45 Lebanon 10452 \n", - "46 46 Cyprus 9251 \n", - "47 47 Palestine 6220 \n", - "48 48 Brunei 5765 \n", - "49 49 Bahrain 765 \n", - "50 50 Singapore 716 \n", - "51 51 Maldives 
300 \n", - "52 NaN Total 44579000 \n", + "37 37 Bhutan 38394 \n", + "38 38 Taiwan 36193 \n", + "39 39 Armenia 29843 \n", + "40 40 Israel 22072 \n", + "41 41 Kuwait 17818 \n", + "42 42 Timor-Leste 14874 \n", + "43 43 Qatar 11586 \n", + "44 44 Lebanon 10452 \n", + "45 45 Cyprus 9251 \n", + "46 46 Palestine 6220 \n", + "47 47 Brunei 5765 \n", + "48 48 Bahrain 760 \n", + "49 49 Singapore 697 \n", + "50 50 Maldives 300 \n", + "51 NaN Total 44528251 \n", "\n", " 3 4 \n", "0 Notes NaN \n", - "1 17,125,200 including European part NaN \n", + "1 17,125,191 km² including European Russia[1] NaN \n", "2 excludes Hong Kong, Macau, Taiwan and disputed... NaN \n", "3 NaN NaN \n", - "4 2,724,902 km² including European part NaN \n", + "4 2,724,900 km² including European part NaN \n", "5 NaN NaN \n", "6 NaN NaN \n", "7 NaN NaN \n", "8 1,904,569 km² including Oceanian part NaN \n", - "9 882,363 km² including Gilgit-Baltistan and AJK NaN \n", - "10 783,562 km² including European part NaN \n", + "9 NaN NaN \n", + "10 783,356 km² including East Thrace NaN \n", "11 NaN NaN \n", "12 NaN NaN \n", "13 NaN NaN \n", @@ -608,29 +599,28 @@ "30 NaN NaN \n", "31 NaN NaN \n", "32 NaN NaN \n", - "33 Sometimes considered part of Europe NaN \n", + "33 Located in the Caucasus, between Europe and Asia NaN \n", "34 NaN NaN \n", - "35 Sometimes considered part of Europe NaN \n", + "35 Located in the Caucasus, between Europe and Asia NaN \n", "36 NaN NaN \n", - "37 1,002,450 km² including African part NaN \n", - "38 NaN NaN \n", - "39 excludes Hong Kong, Macau, Mainland China and ... NaN \n", - "40 Sometimes considered part of Europe NaN \n", + "37 NaN NaN \n", + "38 partially recognized state/not a UN member NaN \n", + "39 Located in the Caucasus, between Europe and Asia NaN \n", + "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "43 NaN NaN \n", "44 NaN NaN \n", "45 NaN NaN \n", - "46 5,896 km² excluding Northern Cyprus. Political... NaN \n", + "46 partially recognized state/non-member observer... 
NaN \n", "47 NaN NaN \n", "48 NaN NaN \n", "49 NaN NaN \n", "50 NaN NaN \n", - "51 NaN NaN \n", - "52 NaN NaN " + "51 NaN NaN " ] }, - "execution_count": 2, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -641,30 +631,30 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Extracted 2 wikitables\n" + "Extracted 7 wikitables\n" ] } ], "source": [ "# extract several tables from wikipedia from a single page\n", "from pandas.io.html import read_html\n", - "page = 'https://en.wikipedia.org/wiki/List_of_UFC_events'\n", + "page = 'https://en.wikipedia.org/wiki/New_York_City'\n", "\n", - "wikitables = read_html(page, index_col=0, attrs={\"class\":\"wikitable\"})\n", + "wikitables = read_html(page, index_col=0, attrs={\"class\":\"wikitable\"}, header=None)\n", "\n", "print (\"Extracted {num} wikitables\".format(num=len(wikitables)))" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 2, "metadata": {}, "outputs": [ { @@ -680,78 +670,135 @@ " vertical-align: top;\n", " }\n", "\n", - " .dataframe thead th {\n", - " text-align: right;\n", + " .dataframe thead tr th {\n", + " text-align: left;\n", " }\n", "\n", "\n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", + " \n", + " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + 
" \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", "
1234
0
EventDateVenueLocationRef.
UFC on ESPN 4Jun 29, 2019TBATBA[9]New York City's five boroughsvteNew York City's five boroughsvte
UFC on ESPN+ 11Jun 22, 2019TBATBA[9]JurisdictionJurisdictionPopulationGross Domestic ProductLand areaDensity
UFC 238Jun 8, 2019TBATBA[9]BoroughCountyEstimate (2018)[150]billions(US$)[151]per capita(US$)square milessquarekmpersons / sq. mipersons /km2
UFC on ESPN+ 10Jun 1, 2019TBATBA[9]The BronxBronx143213242.6952920042.10109.043465313231
BrooklynKings258283091.5593460070.82183.423713714649
ManhattanNew York1628701600.24436090022.8359.137203327826
QueensQueens227890693.31039600108.53281.09214608354
Staten IslandRichmond47617914.5143030058.37151.1881123132
\n", "" ], "text/plain": [ - " 1 2 3 4\n", - "0 \n", - "Event Date Venue Location Ref.\n", - "UFC on ESPN 4 Jun 29, 2019 TBA TBA [9]\n", - "UFC on ESPN+ 11 Jun 22, 2019 TBA TBA [9]\n", - "UFC 238 Jun 8, 2019 TBA TBA [9]\n", - "UFC on ESPN+ 10 Jun 1, 2019 TBA TBA [9]" + "New York City's five boroughsvte New York City's five boroughsvte \\\n", + "Jurisdiction Jurisdiction \n", + "Borough County \n", + "The Bronx Bronx \n", + "Brooklyn Kings \n", + "Manhattan New York \n", + "Queens Queens \n", + "Staten Island Richmond \n", + "\n", + "New York City's five boroughsvte \\\n", + "Jurisdiction Population Gross Domestic Product \n", + "Borough Estimate (2018)[150] billions(US$)[151] \n", + "The Bronx 1432132 42.695 \n", + "Brooklyn 2582830 91.559 \n", + "Manhattan 1628701 600.244 \n", + "Queens 2278906 93.310 \n", + "Staten Island 476179 14.514 \n", + "\n", + "New York City's five boroughsvte \\\n", + "Jurisdiction Land area \n", + "Borough per capita(US$) square miles squarekm \n", + "The Bronx 29200 42.10 109.04 \n", + "Brooklyn 34600 70.82 183.42 \n", + "Manhattan 360900 22.83 59.13 \n", + "Queens 39600 108.53 281.09 \n", + "Staten Island 30300 58.37 151.18 \n", + "\n", + "New York City's five boroughsvte \n", + "Jurisdiction Density \n", + "Borough persons / sq. 
mi persons /km2 \n", + "The Bronx 34653 13231 \n", + "Brooklyn 37137 14649 \n", + "Manhattan 72033 27826 \n", + "Queens 21460 8354 \n", + "Staten Island 8112 3132 " ] }, - "execution_count": 4, + "execution_count": 2, "metadata": {}, "output_type": "execute_result" } @@ -762,16 +809,16 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "(470, 6)" + "(17, 13)" ] }, - "execution_count": 5, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } @@ -782,7 +829,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -798,783 +845,573 @@ " vertical-align: top;\n", " }\n", "\n", - " .dataframe thead th {\n", - " text-align: right;\n", + " .dataframe thead tr th {\n", + " text-align: left;\n", " }\n", "\n", "\n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - 
" \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", 
- " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", "
123456
0vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b]vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b]
MonthJanFebMarAprMayJunJulAugSepOctNovDecYear
#EventDateVenueLocationAttendanceRef.
465UFC Fight Night: Assunção vs. Moraes 2Feb 2, 2019Centro de Formação Olímpica do NordesteFortaleza, Brazil10040[21]
UFC 233Jan 26, 2019Honda CenterAnaheim, California, U.S.Cancelled[22]
464UFC Fight Night: Cejudo vs. DillashawJan 19, 2019Barclays CenterBrooklyn, New York, U.S.12152[23]
463UFC 232: Jones vs. Gustafsson 2Dec 29, 2018The ForumInglewood, California, U.S.15862[24]
462UFC on Fox: Lee vs. Iaquinta 2Dec 15, 2018Fiserv ForumMilwaukee, Wisconsin, U.S.9010[25]
461UFC 231: Holloway vs. OrtegaDec 8, 2018Scotiabank ArenaToronto, Ontario, Canada19039[26]
460UFC Fight Night: dos Santos vs. TuivasaDec 2, 2018Adelaide Entertainment CentreAdelaide, Australia8652[27]
459The Ultimate Fighter: Heavy Hitters FinaleNov 30, 2018Pearl TheatreLas Vegas, Nevada, U.S.2020[28]
458UFC Fight Night: Blaydes vs. Ngannou 2Nov 24, 2018Cadillac ArenaBeijing, China10302[29]
457UFC Fight Night: Magny vs. PonzinibbioNov 17, 2018Estadio Mary Terán de WeissBuenos Aires, Argentina10245[30]
456UFC Fight Night: Korean Zombie vs. RodríguezNov 10, 2018Pepsi CenterDenver, Colorado, U.S.11426[31]
455UFC 230: Cormier vs. LewisNov 3, 2018Madison Square GardenNew York City, New York, U.S.17011[32]
454UFC Fight Night: Volkan vs. SmithOct 27, 2018Avenir CentreMoncton, New Brunswick, Canada6282[33]
453UFC 229: Khabib vs. McGregorOct 6, 2018T-Mobile ArenaLas Vegas, Nevada, U.S.20034[34]
452UFC Fight Night: Santos vs. AndersSep 22, 2018Ginásio do IbirapueraSão Paulo, Brazil9485[35]
451UFC Fight Night: Hunt vs. OleinikSep 15, 2018Olimpiyskiy StadiumMoscow, Russia22603[36]
450UFC 228: Woodley vs. TillSep 8, 2018American Airlines CenterDallas, Texas, U.S.14073[37]
449UFC Fight Night: Gaethje vs. VickAug 25, 2018Pinnacle Bank ArenaLincoln, Nebraska, U.S.6409[38]
448UFC 227: Dillashaw vs. Garbrandt 2Aug 4, 2018Staples CenterLos Angeles, California, U.S.17794[39]
447UFC on Fox: Alvarez vs. Poirier 2Jul 28, 2018Scotiabank SaddledomeCalgary, Alberta, Canada10603[40]
446UFC Fight Night: Shogun vs. SmithJul 22, 2018Barclaycard ArenaHamburg, Germany7798[41]
445UFC Fight Night: dos Santos vs. IvanovJul 14, 2018CenturyLink ArenaBoise, Idaho, U.S.5648[42]
444UFC 226: Miocic vs. CormierJul 7, 2018T-Mobile ArenaLas Vegas, Nevada, U.S.17464[43]
443The Ultimate Fighter: Undefeated FinaleJul 6, 2018Palms Casino ResortLas Vegas, Nevada, U.S.2123[44]
442UFC Fight Night: Cowboy vs. EdwardsJun 23, 2018Singapore Indoor StadiumKallang, Singapore6419[45]
441UFC 225: Whittaker vs. Romero 2Jun 9, 2018United CenterChicago, Illinois, U.S.18117[46]
440UFC Fight Night: Rivera vs. MoraesJun 1, 2018Adirondack Bank CenterUtica, New York, U.S.5063[47]
439UFC Fight Night: Thompson vs. TillMay 27, 2018Echo ArenaLiverpool, England, U.K.8520[48]
438UFC Fight Night: Maia vs. UsmanMay 19, 2018Movistar ArenaSantiago, Chile11082[49]
.....................
030UFC 26: Ultimate Field of DreamsJun 9, 2000Five Seasons Events CenterCedar Rapids, Iowa, U.S.1100[409]
029UFC 25: Ultimate Japan 3Apr 14, 2000Yoyogi National GymnasiumTokyo, JapanNaNNaN
028UFC 24: First DefenseMar 10, 2000Lake Charles Civic CenterLake Charles, Louisiana, U.S.NaNNaN
027UFC 23: Ultimate Japan 2Nov 19, 1999Tokyo Bay NK HallChiba, JapanNaNNaN
026UFC 22: Only One Can be ChampionSep 24, 1999Lake Charles Civic CenterLake Charles, Louisiana, U.S.NaNNaN
025UFC 21: Return of the ChampionsJul 16, 1999Five Seasons Events CenterCedar Rapids, Iowa, U.S.NaNNaN
024UFC 20: Battle for the GoldMay 7, 1999Boutwell Memorial AuditoriumBirmingham, Alabama, U.S.NaNNaN
023UFC 19: Ultimate Young GunsMar 5, 1999Casino Magic Bay St. LouisBay St. Louis, Mississippi, U.S.NaNNaN
022UFC 18: The Road to the Heavyweight TitleJan 8, 1999Pontchartrain CenterNew Orleans, Louisiana, U.S.NaNNaN
021UFC Brazil: Ultimate BrazilOct 16, 1998Ginásio da PortuguesaSão Paulo, BrazilNaNNaN
020UFC 17: RedemptionMay 15, 1998Mobile Civic CenterMobile, Alabama, U.S.NaNNaN
019UFC 16: Battle in the BayouMar 13, 1998Pontchartrain CenterNew Orleans, Louisiana, U.S.4600[410]
018UFC Japan: Ultimate JapanDec 21, 1997Yokohama ArenaYokohama, Japan5000[411]
017UFC 15: Collision CourseOct 17, 1997Casino Magic Bay St. LouisBay St. Louis, Mississippi, U.S.NaNNaN
016UFC 14: ShowdownJul 27, 1997Boutwell Memorial AuditoriumBirmingham, Alabama, U.S.5000[412]
015UFC 13: The Ultimate ForceMay 30, 1997Augusta Civic CenterAugusta, Georgia, U.S.5100[413]
014UFC 12: Judgement DayFeb 7, 1997Dothan Civic CenterDothan, Alabama, U.S.3100[414]
013UFC: The Ultimate Ultimate 2Dec 7, 1996Fair Park ArenaBirmingham, Alabama, U.S.6000[415]
012UFC 11: The Proving GroundSep 20, 1996Augusta Civic CenterAugusta, Georgia, U.S.4500[416]
011UFC 10: The TournamentJul 12, 1996Fair Park ArenaBirmingham, Alabama, U.S.4300[417]
010UFC 9: Motor City MadnessMay 17, 1996Cobo ArenaDetroit, Michigan, U.S.10000[418]
009UFC 8: David vs. GoliathFeb 16, 1996Coliseo Rubén RodríguezBayamón, Puerto Rico13000[419]
008UFC: The Ultimate UltimateDec 16, 1995Mammoth GardensDenver, Colorado, U.S.2800[420]
007UFC 7: The Brawl in BuffaloSep 8, 1995Buffalo Memorial AuditoriumBuffalo, New York, U.S.9000[421]
006UFC 6: Clash of the TitansJul 14, 1995Casper Events CenterCasper, Wyoming, U.S.2700[422]
005UFC 5: The Return of the BeastApr 7, 1995Independence ArenaCharlotte, North Carolina, U.S.6000[423]
004UFC 4: Revenge of the WarriorsDec 16, 1994Expo Square PavilionTulsa, Oklahoma, U.S.5857[424]
003UFC 3: The American DreamSep 9, 1994Grady Cole CenterCharlotte, North Carolina, U.S.NaNNaNRecord high °F (°C)72(22)78(26)86(30)96(36)99(37)101(38)106(41)104(40)102(39)94(34)84(29)75(24)106(41)
Mean maximum °F (°C)59.6(15.3)60.7(15.9)71.5(21.9)83.0(28.3)88.0(31.1)92.3(33.5)95.4(35.2)93.7(34.3)88.5(31.4)78.8(26.0)71.3(21.8)62.2(16.8)97.0(36.1)
Average high °F (°C)38.3(3.5)41.6(5.3)49.7(9.8)61.2(16.2)70.8(21.6)79.3(26.3)84.1(28.9)82.6(28.1)75.2(24.0)63.8(17.7)53.8(12.1)43.0(6.1)62.0(16.7)
Daily mean °F (°C)32.6(0.3)35.3(1.8)42.5(5.8)53.0(11.7)62.4(16.9)71.5(21.9)76.5(24.7)75.2(24.0)68.0(20.0)56.9(13.8)47.7(8.7)37.5(3.1)55.0(12.8)
Average low °F (°C)26.9(−2.8)28.9(−1.7)35.2(1.8)44.8(7.1)54.0(12.2)63.6(17.6)68.8(20.4)67.8(19.9)60.8(16.0)50.0(10.0)41.6(5.3)32.0(0.0)48.0(8.9)
Mean minimum °F (°C)9.2(−12.7)12.8(−10.7)18.5(−7.5)32.3(0.2)43.5(6.4)52.9(11.6)60.3(15.7)58.8(14.9)48.6(9.2)38.0(3.3)27.7(−2.4)15.6(−9.1)7.0(−13.9)
Record low °F (°C)−6(−21)−15(−26)3(−16)12(−11)32(0)44(7)52(11)50(10)39(4)28(−2)5(−15)−13(−25)−15(−26)
Average precipitation inches (mm)3.65(93)3.09(78)4.36(111)4.50(114)4.19(106)4.41(112)4.60(117)4.44(113)4.28(109)4.40(112)4.02(102)4.00(102)49.94(1,268)
Average snowfall inches (cm)7.0(18)9.2(23)3.9(9.9)0.6(1.5)0(0)0(0)0(0)0(0)0(0)0(0)0.3(0.76)4.8(12)25.8(66)
Average precipitation days (≥ 0.01 in)10.49.210.911.511.111.210.49.58.78.99.610.6122.0
Average snowy days (≥ 0.1 in)4.02.81.80.30000000.22.311.4
Average relative humidity (%)61.560.258.555.362.765.264.266.067.865.664.664.163.0
Mean monthly sunshine hours162.7163.1212.5225.6256.6257.3268.2268.2219.3211.2151.0139.02534.7
Percent possible sunshine54555757575759635961514857
002UFC 2: No Way OutMar 11, 1994Mammoth GardensDenver, Colorado, U.S.2000[425]Average ultraviolet index2346788864215
001UFC 1: The BeginningNov 12, 1993McNichols Sports ArenaDenver, Colorado, U.S.7800[426]Source #1: NOAA (relative humidity and sun 1961–1990)[196][210][192][211]Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...
Source #2: Weather Atlas[212] See Geography of New York City for additional climate information from the outer boroughs.Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...
\n", - "

470 rows × 6 columns

\n", "" ], "text/plain": [ - " 1 2 \\\n", - "0 \n", - "# Event Date \n", - "465 UFC Fight Night: Assunção vs. Moraes 2 Feb 2, 2019 \n", - "– UFC 233 Jan 26, 2019 \n", - "464 UFC Fight Night: Cejudo vs. Dillashaw Jan 19, 2019 \n", - "463 UFC 232: Jones vs. Gustafsson 2 Dec 29, 2018 \n", - "462 UFC on Fox: Lee vs. Iaquinta 2 Dec 15, 2018 \n", - "461 UFC 231: Holloway vs. Ortega Dec 8, 2018 \n", - "460 UFC Fight Night: dos Santos vs. Tuivasa Dec 2, 2018 \n", - "459 The Ultimate Fighter: Heavy Hitters Finale Nov 30, 2018 \n", - "458 UFC Fight Night: Blaydes vs. Ngannou 2 Nov 24, 2018 \n", - "457 UFC Fight Night: Magny vs. Ponzinibbio Nov 17, 2018 \n", - "456 UFC Fight Night: Korean Zombie vs. Rodríguez Nov 10, 2018 \n", - "455 UFC 230: Cormier vs. Lewis Nov 3, 2018 \n", - "454 UFC Fight Night: Volkan vs. Smith Oct 27, 2018 \n", - "453 UFC 229: Khabib vs. McGregor Oct 6, 2018 \n", - "452 UFC Fight Night: Santos vs. Anders Sep 22, 2018 \n", - "451 UFC Fight Night: Hunt vs. Oleinik Sep 15, 2018 \n", - "450 UFC 228: Woodley vs. Till Sep 8, 2018 \n", - "449 UFC Fight Night: Gaethje vs. Vick Aug 25, 2018 \n", - "448 UFC 227: Dillashaw vs. Garbrandt 2 Aug 4, 2018 \n", - "447 UFC on Fox: Alvarez vs. Poirier 2 Jul 28, 2018 \n", - "446 UFC Fight Night: Shogun vs. Smith Jul 22, 2018 \n", - "445 UFC Fight Night: dos Santos vs. Ivanov Jul 14, 2018 \n", - "444 UFC 226: Miocic vs. Cormier Jul 7, 2018 \n", - "443 The Ultimate Fighter: Undefeated Finale Jul 6, 2018 \n", - "442 UFC Fight Night: Cowboy vs. Edwards Jun 23, 2018 \n", - "441 UFC 225: Whittaker vs. Romero 2 Jun 9, 2018 \n", - "440 UFC Fight Night: Rivera vs. Moraes Jun 1, 2018 \n", - "439 UFC Fight Night: Thompson vs. Till May 27, 2018 \n", - "438 UFC Fight Night: Maia vs. Usman May 19, 2018 \n", - ".. ... ... 
\n", - "030 UFC 26: Ultimate Field of Dreams Jun 9, 2000 \n", - "029 UFC 25: Ultimate Japan 3 Apr 14, 2000 \n", - "028 UFC 24: First Defense Mar 10, 2000 \n", - "027 UFC 23: Ultimate Japan 2 Nov 19, 1999 \n", - "026 UFC 22: Only One Can be Champion Sep 24, 1999 \n", - "025 UFC 21: Return of the Champions Jul 16, 1999 \n", - "024 UFC 20: Battle for the Gold May 7, 1999 \n", - "023 UFC 19: Ultimate Young Guns Mar 5, 1999 \n", - "022 UFC 18: The Road to the Heavyweight Title Jan 8, 1999 \n", - "021 UFC Brazil: Ultimate Brazil Oct 16, 1998 \n", - "020 UFC 17: Redemption May 15, 1998 \n", - "019 UFC 16: Battle in the Bayou Mar 13, 1998 \n", - "018 UFC Japan: Ultimate Japan Dec 21, 1997 \n", - "017 UFC 15: Collision Course Oct 17, 1997 \n", - "016 UFC 14: Showdown Jul 27, 1997 \n", - "015 UFC 13: The Ultimate Force May 30, 1997 \n", - "014 UFC 12: Judgement Day Feb 7, 1997 \n", - "013 UFC: The Ultimate Ultimate 2 Dec 7, 1996 \n", - "012 UFC 11: The Proving Ground Sep 20, 1996 \n", - "011 UFC 10: The Tournament Jul 12, 1996 \n", - "010 UFC 9: Motor City Madness May 17, 1996 \n", - "009 UFC 8: David vs. 
Goliath Feb 16, 1996 \n", - "008 UFC: The Ultimate Ultimate Dec 16, 1995 \n", - "007 UFC 7: The Brawl in Buffalo Sep 8, 1995 \n", - "006 UFC 6: Clash of the Titans Jul 14, 1995 \n", - "005 UFC 5: The Return of the Beast Apr 7, 1995 \n", - "004 UFC 4: Revenge of the Warriors Dec 16, 1994 \n", - "003 UFC 3: The American Dream Sep 9, 1994 \n", - "002 UFC 2: No Way Out Mar 11, 1994 \n", - "001 UFC 1: The Beginning Nov 12, 1993 \n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Jan \n", + "Record high °F (°C) 72(22) \n", + "Mean maximum °F (°C) 59.6(15.3) \n", + "Average high °F (°C) 38.3(3.5) \n", + "Daily mean °F (°C) 32.6(0.3) \n", + "Average low °F (°C) 26.9(−2.8) \n", + "Mean minimum °F (°C) 9.2(−12.7) \n", + "Record low °F (°C) −6(−21) \n", + "Average precipitation inches (mm) 3.65(93) \n", + "Average snowfall inches (cm) 7.0(18) \n", + "Average precipitation days (≥ 0.01 in) 10.4 \n", + "Average snowy days (≥ 0.1 in) 4.0 \n", + "Average relative humidity (%) 61.5 \n", + "Mean monthly sunshine hours 162.7 \n", + "Percent possible sunshine 54 \n", + "Average ultraviolet index 2 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Feb \n", + "Record high °F (°C) 78(26) \n", + "Mean maximum °F (°C) 60.7(15.9) \n", + "Average high °F (°C) 41.6(5.3) \n", + "Daily mean °F (°C) 35.3(1.8) \n", + "Average low °F (°C) 28.9(−1.7) \n", + "Mean minimum °F (°C) 12.8(−10.7) \n", + "Record low °F (°C) −15(−26) \n", + "Average precipitation inches (mm) 3.09(78) \n", + "Average snowfall inches (cm) 9.2(23) \n", + "Average precipitation days (≥ 0.01 in) 9.2 \n", + "Average snowy days (≥ 0.1 in) 2.8 \n", + "Average relative humidity (%) 60.2 \n", + "Mean monthly sunshine hours 163.1 \n", + "Percent possible sunshine 55 \n", + "Average ultraviolet index 3 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Mar \n", + "Record high °F (°C) 86(30) \n", + "Mean maximum °F (°C) 71.5(21.9) \n", + "Average high °F (°C) 49.7(9.8) \n", + "Daily mean °F (°C) 42.5(5.8) \n", + "Average low °F (°C) 35.2(1.8) \n", + "Mean minimum °F (°C) 18.5(−7.5) \n", + "Record low °F (°C) 3(−16) \n", + "Average precipitation inches (mm) 4.36(111) \n", + "Average snowfall inches (cm) 3.9(9.9) \n", + "Average precipitation days (≥ 0.01 in) 10.9 \n", + "Average snowy days (≥ 0.1 in) 1.8 \n", + "Average relative humidity (%) 58.5 \n", + "Mean monthly sunshine hours 212.5 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 4 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Apr \n", + "Record high °F (°C) 96(36) \n", + "Mean maximum °F (°C) 83.0(28.3) \n", + "Average high °F (°C) 61.2(16.2) \n", + "Daily mean °F (°C) 53.0(11.7) \n", + "Average low °F (°C) 44.8(7.1) \n", + "Mean minimum °F (°C) 32.3(0.2) \n", + "Record low °F (°C) 12(−11) \n", + "Average precipitation inches (mm) 4.50(114) \n", + "Average snowfall inches (cm) 0.6(1.5) \n", + "Average precipitation days (≥ 0.01 in) 11.5 \n", + "Average snowy days (≥ 0.1 in) 0.3 \n", + "Average relative humidity (%) 55.3 \n", + "Mean monthly sunshine hours 225.6 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 6 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", "\n", - " 3 \\\n", - "0 \n", - "# Venue \n", - "465 Centro de Formação Olímpica do Nordeste \n", - "– Honda Center \n", - "464 Barclays Center \n", - "463 The Forum \n", - "462 Fiserv Forum \n", - "461 Scotiabank Arena \n", - "460 Adelaide Entertainment Centre \n", - "459 Pearl Theatre \n", - "458 Cadillac Arena \n", - "457 Estadio Mary Terán de Weiss \n", - "456 Pepsi Center \n", - "455 Madison Square Garden \n", - "454 Avenir Centre \n", - "453 T-Mobile Arena \n", - "452 Ginásio do Ibirapuera \n", - "451 Olimpiyskiy Stadium \n", - "450 American Airlines Center \n", - "449 Pinnacle Bank Arena \n", - "448 Staples Center \n", - "447 Scotiabank Saddledome \n", - "446 Barclaycard Arena \n", - "445 CenturyLink Arena \n", - "444 T-Mobile Arena \n", - "443 Palms Casino Resort \n", - "442 Singapore Indoor Stadium \n", - "441 United Center \n", - "440 Adirondack Bank Center \n", - "439 Echo Arena \n", - "438 Movistar Arena \n", - ".. ... 
\n", - "030 Five Seasons Events Center \n", - "029 Yoyogi National Gymnasium \n", - "028 Lake Charles Civic Center \n", - "027 Tokyo Bay NK Hall \n", - "026 Lake Charles Civic Center \n", - "025 Five Seasons Events Center \n", - "024 Boutwell Memorial Auditorium \n", - "023 Casino Magic Bay St. Louis \n", - "022 Pontchartrain Center \n", - "021 Ginásio da Portuguesa \n", - "020 Mobile Civic Center \n", - "019 Pontchartrain Center \n", - "018 Yokohama Arena \n", - "017 Casino Magic Bay St. Louis \n", - "016 Boutwell Memorial Auditorium \n", - "015 Augusta Civic Center \n", - "014 Dothan Civic Center \n", - "013 Fair Park Arena \n", - "012 Augusta Civic Center \n", - "011 Fair Park Arena \n", - "010 Cobo Arena \n", - "009 Coliseo Rubén Rodríguez \n", - "008 Mammoth Gardens \n", - "007 Buffalo Memorial Auditorium \n", - "006 Casper Events Center \n", - "005 Independence Arena \n", - "004 Expo Square Pavilion \n", - "003 Grady Cole Center \n", - "002 Mammoth Gardens \n", - "001 McNichols Sports Arena \n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month May \n", + "Record high °F (°C) 99(37) \n", + "Mean maximum °F (°C) 88.0(31.1) \n", + "Average high °F (°C) 70.8(21.6) \n", + "Daily mean °F (°C) 62.4(16.9) \n", + "Average low °F (°C) 54.0(12.2) \n", + "Mean minimum °F (°C) 43.5(6.4) \n", + "Record low °F (°C) 32(0) \n", + "Average precipitation inches (mm) 4.19(106) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 11.1 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 62.7 \n", + "Mean monthly sunshine hours 256.6 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 7 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", "\n", - " 4 5 6 \n", - "0 \n", - "# Location Attendance Ref. \n", - "465 Fortaleza, Brazil 10040 [21] \n", - "– Anaheim, California, U.S. Cancelled [22] \n", - "464 Brooklyn, New York, U.S. 12152 [23] \n", - "463 Inglewood, California, U.S. 15862 [24] \n", - "462 Milwaukee, Wisconsin, U.S. 9010 [25] \n", - "461 Toronto, Ontario, Canada 19039 [26] \n", - "460 Adelaide, Australia 8652 [27] \n", - "459 Las Vegas, Nevada, U.S. 2020 [28] \n", - "458 Beijing, China 10302 [29] \n", - "457 Buenos Aires, Argentina 10245 [30] \n", - "456 Denver, Colorado, U.S. 11426 [31] \n", - "455 New York City, New York, U.S. 17011 [32] \n", - "454 Moncton, New Brunswick, Canada 6282 [33] \n", - "453 Las Vegas, Nevada, U.S. 20034 [34] \n", - "452 São Paulo, Brazil 9485 [35] \n", - "451 Moscow, Russia 22603 [36] \n", - "450 Dallas, Texas, U.S. 14073 [37] \n", - "449 Lincoln, Nebraska, U.S. 6409 [38] \n", - "448 Los Angeles, California, U.S. 17794 [39] \n", - "447 Calgary, Alberta, Canada 10603 [40] \n", - "446 Hamburg, Germany 7798 [41] \n", - "445 Boise, Idaho, U.S. 5648 [42] \n", - "444 Las Vegas, Nevada, U.S. 17464 [43] \n", - "443 Las Vegas, Nevada, U.S. 2123 [44] \n", - "442 Kallang, Singapore 6419 [45] \n", - "441 Chicago, Illinois, U.S. 18117 [46] \n", - "440 Utica, New York, U.S. 5063 [47] \n", - "439 Liverpool, England, U.K. 8520 [48] \n", - "438 Santiago, Chile 11082 [49] \n", - ".. ... ... ... \n", - "030 Cedar Rapids, Iowa, U.S. 1100 [409] \n", - "029 Tokyo, Japan NaN NaN \n", - "028 Lake Charles, Louisiana, U.S. NaN NaN \n", - "027 Chiba, Japan NaN NaN \n", - "026 Lake Charles, Louisiana, U.S. NaN NaN \n", - "025 Cedar Rapids, Iowa, U.S. NaN NaN \n", - "024 Birmingham, Alabama, U.S. NaN NaN \n", - "023 Bay St. Louis, Mississippi, U.S. NaN NaN \n", - "022 New Orleans, Louisiana, U.S. NaN NaN \n", - "021 São Paulo, Brazil NaN NaN \n", - "020 Mobile, Alabama, U.S. NaN NaN \n", - "019 New Orleans, Louisiana, U.S. 
4600 [410] \n", - "018 Yokohama, Japan 5000 [411] \n", - "017 Bay St. Louis, Mississippi, U.S. NaN NaN \n", - "016 Birmingham, Alabama, U.S. 5000 [412] \n", - "015 Augusta, Georgia, U.S. 5100 [413] \n", - "014 Dothan, Alabama, U.S. 3100 [414] \n", - "013 Birmingham, Alabama, U.S. 6000 [415] \n", - "012 Augusta, Georgia, U.S. 4500 [416] \n", - "011 Birmingham, Alabama, U.S. 4300 [417] \n", - "010 Detroit, Michigan, U.S. 10000 [418] \n", - "009 Bayamón, Puerto Rico 13000 [419] \n", - "008 Denver, Colorado, U.S. 2800 [420] \n", - "007 Buffalo, New York, U.S. 9000 [421] \n", - "006 Casper, Wyoming, U.S. 2700 [422] \n", - "005 Charlotte, North Carolina, U.S. 6000 [423] \n", - "004 Tulsa, Oklahoma, U.S. 5857 [424] \n", - "003 Charlotte, North Carolina, U.S. NaN NaN \n", - "002 Denver, Colorado, U.S. 2000 [425] \n", - "001 Denver, Colorado, U.S. 7800 [426] \n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Jun \n", + "Record high °F (°C) 101(38) \n", + "Mean maximum °F (°C) 92.3(33.5) \n", + "Average high °F (°C) 79.3(26.3) \n", + "Daily mean °F (°C) 71.5(21.9) \n", + "Average low °F (°C) 63.6(17.6) \n", + "Mean minimum °F (°C) 52.9(11.6) \n", + "Record low °F (°C) 44(7) \n", + "Average precipitation inches (mm) 4.41(112) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 11.2 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 65.2 \n", + "Mean monthly sunshine hours 257.3 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 8 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", "\n", - "[470 rows x 6 columns]" + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Jul \n", + "Record high °F (°C) 106(41) \n", + "Mean maximum °F (°C) 95.4(35.2) \n", + "Average high °F (°C) 84.1(28.9) \n", + "Daily mean °F (°C) 76.5(24.7) \n", + "Average low °F (°C) 68.8(20.4) \n", + "Mean minimum °F (°C) 60.3(15.7) \n", + "Record low °F (°C) 52(11) \n", + "Average precipitation inches (mm) 4.60(117) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 10.4 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 64.2 \n", + "Mean monthly sunshine hours 268.2 \n", + "Percent possible sunshine 59 \n", + "Average ultraviolet index 8 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Aug \n", + "Record high °F (°C) 104(40) \n", + "Mean maximum °F (°C) 93.7(34.3) \n", + "Average high °F (°C) 82.6(28.1) \n", + "Daily mean °F (°C) 75.2(24.0) \n", + "Average low °F (°C) 67.8(19.9) \n", + "Mean minimum °F (°C) 58.8(14.9) \n", + "Record low °F (°C) 50(10) \n", + "Average precipitation inches (mm) 4.44(113) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 9.5 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 66.0 \n", + "Mean monthly sunshine hours 268.2 \n", + "Percent possible sunshine 63 \n", + "Average ultraviolet index 8 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Sep \n", + "Record high °F (°C) 102(39) \n", + "Mean maximum °F (°C) 88.5(31.4) \n", + "Average high °F (°C) 75.2(24.0) \n", + "Daily mean °F (°C) 68.0(20.0) \n", + "Average low °F (°C) 60.8(16.0) \n", + "Mean minimum °F (°C) 48.6(9.2) \n", + "Record low °F (°C) 39(4) \n", + "Average precipitation inches (mm) 4.28(109) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 8.7 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 67.8 \n", + "Mean monthly sunshine hours 219.3 \n", + "Percent possible sunshine 59 \n", + "Average ultraviolet index 6 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Oct \n", + "Record high °F (°C) 94(34) \n", + "Mean maximum °F (°C) 78.8(26.0) \n", + "Average high °F (°C) 63.8(17.7) \n", + "Daily mean °F (°C) 56.9(13.8) \n", + "Average low °F (°C) 50.0(10.0) \n", + "Mean minimum °F (°C) 38.0(3.3) \n", + "Record low °F (°C) 28(−2) \n", + "Average precipitation inches (mm) 4.40(112) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 8.9 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 65.6 \n", + "Mean monthly sunshine hours 211.2 \n", + "Percent possible sunshine 61 \n", + "Average ultraviolet index 4 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Nov \n", + "Record high °F (°C) 84(29) \n", + "Mean maximum °F (°C) 71.3(21.8) \n", + "Average high °F (°C) 53.8(12.1) \n", + "Daily mean °F (°C) 47.7(8.7) \n", + "Average low °F (°C) 41.6(5.3) \n", + "Mean minimum °F (°C) 27.7(−2.4) \n", + "Record low °F (°C) 5(−15) \n", + "Average precipitation inches (mm) 4.02(102) \n", + "Average snowfall inches (cm) 0.3(0.76) \n", + "Average precipitation days (≥ 0.01 in) 9.6 \n", + "Average snowy days (≥ 0.1 in) 0.2 \n", + "Average relative humidity (%) 64.6 \n", + "Mean monthly sunshine hours 151.0 \n", + "Percent possible sunshine 51 \n", + "Average ultraviolet index 2 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Dec \n", + "Record high °F (°C) 75(24) \n", + "Mean maximum °F (°C) 62.2(16.8) \n", + "Average high °F (°C) 43.0(6.1) \n", + "Daily mean °F (°C) 37.5(3.1) \n", + "Average low °F (°C) 32.0(0.0) \n", + "Mean minimum °F (°C) 15.6(−9.1) \n", + "Record low °F (°C) −13(−25) \n", + "Average precipitation inches (mm) 4.00(102) \n", + "Average snowfall inches (cm) 4.8(12) \n", + "Average precipitation days (≥ 0.01 in) 10.6 \n", + "Average snowy days (≥ 0.1 in) 2.3 \n", + "Average relative humidity (%) 64.1 \n", + "Mean monthly sunshine hours 139.0 \n", + "Percent possible sunshine 48 \n", + "Average ultraviolet index 1 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \n", + "Month Year \n", + "Record high °F (°C) 106(41) \n", + "Mean maximum °F (°C) 97.0(36.1) \n", + "Average high °F (°C) 62.0(16.7) \n", + "Daily mean °F (°C) 55.0(12.8) \n", + "Average low °F (°C) 48.0(8.9) \n", + "Mean minimum °F (°C) 7.0(−13.9) \n", + "Record low °F (°C) −15(−26) \n", + "Average precipitation inches (mm) 49.94(1,268) \n", + "Average snowfall inches (cm) 25.8(66) \n", + "Average precipitation days (≥ 0.01 in) 122.0 \n", + "Average snowy days (≥ 0.1 in) 11.4 \n", + "Average relative humidity (%) 63.0 \n", + "Mean monthly sunshine hours 2534.7 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 5 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... " ] }, - "execution_count": 6, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -1585,7 +1422,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -1601,78 +1438,135 @@ " vertical-align: top;\n", " }\n", "\n", - " .dataframe thead th {\n", - " text-align: right;\n", + " .dataframe thead tr th {\n", + " text-align: left;\n", " }\n", "\n", "\n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", + " \n", + " \n", " \n", - " \n", 
- " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", "
1234
0
EventDateVenueLocationRef.New York City's five boroughsvteNew York City's five boroughsvte
UFC on ESPN 4Jun 29, 2019TBATBA[9]JurisdictionJurisdictionPopulationGross Domestic ProductLand areaDensity
UFC on ESPN+ 11Jun 22, 2019TBATBA[9]
UFC 238Jun 8, 2019TBATBA[9]BoroughCountyEstimate (2018)[150]billions(US$)[151]per capita(US$)square milessquarekmpersons / sq. mipersons /km2
UFC on ESPN+ 10Jun 1, 2019TBATBA[9]The BronxBronx143213242.6952920042.10109.043465313231
BrooklynKings258283091.5593460070.82183.423713714649
ManhattanNew York1628701600.24436090022.8359.137203327826
QueensQueens227890693.31039600108.53281.09214608354
Staten IslandRichmond47617914.5143030058.37151.1881123132
\n", "" ], "text/plain": [ - " 1 2 3 4\n", - "0 \n", - "Event Date Venue Location Ref.\n", - "UFC on ESPN 4 Jun 29, 2019 TBA TBA [9]\n", - "UFC on ESPN+ 11 Jun 22, 2019 TBA TBA [9]\n", - "UFC 238 Jun 8, 2019 TBA TBA [9]\n", - "UFC on ESPN+ 10 Jun 1, 2019 TBA TBA [9]" + "New York City's five boroughsvte New York City's five boroughsvte \\\n", + "Jurisdiction Jurisdiction \n", + "Borough County \n", + "The Bronx Bronx \n", + "Brooklyn Kings \n", + "Manhattan New York \n", + "Queens Queens \n", + "Staten Island Richmond \n", + "\n", + "New York City's five boroughsvte \\\n", + "Jurisdiction Population Gross Domestic Product \n", + "Borough Estimate (2018)[150] billions(US$)[151] \n", + "The Bronx 1432132 42.695 \n", + "Brooklyn 2582830 91.559 \n", + "Manhattan 1628701 600.244 \n", + "Queens 2278906 93.310 \n", + "Staten Island 476179 14.514 \n", + "\n", + "New York City's five boroughsvte \\\n", + "Jurisdiction Land area \n", + "Borough per capita(US$) square miles squarekm \n", + "The Bronx 29200 42.10 109.04 \n", + "Brooklyn 34600 70.82 183.42 \n", + "Manhattan 360900 22.83 59.13 \n", + "Queens 39600 108.53 281.09 \n", + "Staten Island 30300 58.37 151.18 \n", + "\n", + "New York City's five boroughsvte \n", + "Jurisdiction Density \n", + "Borough persons / sq. 
mi persons /km2 \n", + "The Bronx 34653 13231 \n", + "Brooklyn 37137 14649 \n", + "Manhattan 72033 27826 \n", + "Queens 21460 8354 \n", + "Staten Island 8112 3132 " ] }, - "execution_count": 7, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -2661,7 +2555,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.9" } }, "nbformat": 4, diff --git a/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb b/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb index 6d0cb36..e7c1c50 100644 --- a/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb +++ b/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb @@ -67,10 +67,10 @@ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", - "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m 2 *** 2\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" - ] + ], + "output_type": "error" } ], "source": [ diff --git a/notebooks/csv/data.csv.zip b/notebooks/csv/data.csv.zip new file mode 100644 index 0000000..1acfcc2 Binary files /dev/null and b/notebooks/csv/data.csv.zip differ diff --git a/notebooks/csv/data_201901.csv b/notebooks/csv/data_201901.csv new file mode 100644 index 0000000..ec916f4 --- /dev/null +++ b/notebooks/csv/data_201901.csv @@ -0,0 +1,3 @@ +col1,col2,col3 +A,B,1 +AA,BB,2 \ No newline at end of file diff --git a/notebooks/csv/data_201902.csv b/notebooks/csv/data_201902.csv new file mode 100644 index 0000000..223cfe2 --- /dev/null +++ b/notebooks/csv/data_201902.csv @@ -0,0 +1,3 @@ +col1,col2,col3 +C,D,3 +CC,DD,4 \ No newline at end of file diff --git a/notebooks/csv/data_202001.csv b/notebooks/csv/data_202001.csv new file mode 100644 index 0000000..52bdb1d --- /dev/null +++ b/notebooks/csv/data_202001.csv @@ -0,0 +1,3 @@ +col1,col2,col3,col4 +E,F,5,e5 +EE,FF,6,ee6 \ 
No newline at end of file diff --git a/notebooks/csv/data_202002.csv b/notebooks/csv/data_202002.csv new file mode 100644 index 0000000..56194e0 --- /dev/null +++ b/notebooks/csv/data_202002.csv @@ -0,0 +1,3 @@ +col1,col2,col3,col5 +H,J,7,77 +HH,JJ,8,88 \ No newline at end of file diff --git a/notebooks/csv/excel/example.xlsx b/notebooks/csv/excel/example.xlsx new file mode 100644 index 0000000..d58d686 Binary files /dev/null and b/notebooks/csv/excel/example.xlsx differ diff --git a/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb b/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb new file mode 100644 index 0000000..48f8734 --- /dev/null +++ b/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb @@ -0,0 +1,1372 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 20. Pandas - value_counts - multiple columns, all columns and bad data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df_movie = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df_resp = pd.read_csv(\"../csv/other_text_responses.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. 
Pandas apply value_counts on multiple columns at once" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',\n", + " 'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',\n", + " 'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',\n", + " 'movie_title', 'num_voted_users', 'cast_total_facebook_likes',\n", + " 'actor_3_name', 'facenumber_in_poster', 'plot_keywords',\n", + " 'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',\n", + " 'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',\n", + " 'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],\n", + " dtype='object')" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_movie.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
0.00NaNNaNNaNNaN907.089.0NaN26.0NaNNaN...NaNNaNNaNNaNNaNNaN55.0NaNNaN2181.0
1.00NaNNaN43.0NaNNaNNaNNaNNaNNaNNaN...51.0NaNNaNNaNNaNNaNNaNNaNNaNNaN
1.18NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaN1.0NaN
1.20NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaN1.0NaN
1.33NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaN68.0NaN
\n", + "

5 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0.00 NaN NaN NaN NaN \n", + "1.00 NaN NaN 43.0 NaN \n", + "1.18 NaN NaN NaN NaN \n", + "1.20 NaN NaN NaN NaN \n", + "1.33 NaN NaN NaN NaN \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0.00 907.0 89.0 NaN \n", + "1.00 NaN NaN NaN \n", + "1.18 NaN NaN NaN \n", + "1.20 NaN NaN NaN \n", + "1.33 NaN NaN NaN \n", + "\n", + " actor_1_facebook_likes gross genres ... num_user_for_reviews \\\n", + "0.00 26.0 NaN NaN ... NaN \n", + "1.00 NaN NaN NaN ... 51.0 \n", + "1.18 NaN NaN NaN ... NaN \n", + "1.20 NaN NaN NaN ... NaN \n", + "1.33 NaN NaN NaN ... NaN \n", + "\n", + " language country content_rating budget title_year \\\n", + "0.00 NaN NaN NaN NaN NaN \n", + "1.00 NaN NaN NaN NaN NaN \n", + "1.18 NaN NaN NaN NaN NaN \n", + "1.20 NaN NaN NaN NaN NaN \n", + "1.33 NaN NaN NaN NaN NaN \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \n", + "0.00 55.0 NaN NaN 2181.0 \n", + "1.00 NaN NaN NaN NaN \n", + "1.18 NaN NaN 1.0 NaN \n", + "1.20 NaN NaN 1.0 NaN \n", + "1.33 NaN NaN 68.0 NaN \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_movie.apply(pd.Series.value_counts).head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(37410, 28)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_movie.apply(pd.Series.value_counts).shape" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colorcontent_rating
Black and White209.0NaN
ApprovedNaN55.0
Color4815.0NaN
GNaN112.0
GPNaN6.0
MNaN5.0
NC-17NaN7.0
Not RatedNaN116.0
PGNaN701.0
PG-13NaN1461.0
PassedNaN9.0
RNaN2118.0
TV-14NaN30.0
TV-GNaN10.0
TV-MANaN20.0
TV-PGNaN13.0
TV-YNaN1.0
TV-Y7NaN1.0
UnratedNaN62.0
XNaN13.0
\n", + "
" + ], + "text/plain": [ + " color content_rating\n", + " Black and White 209.0 NaN\n", + "Approved NaN 55.0\n", + "Color 4815.0 NaN\n", + "G NaN 112.0\n", + "GP NaN 6.0\n", + "M NaN 5.0\n", + "NC-17 NaN 7.0\n", + "Not Rated NaN 116.0\n", + "PG NaN 701.0\n", + "PG-13 NaN 1461.0\n", + "Passed NaN 9.0\n", + "R NaN 2118.0\n", + "TV-14 NaN 30.0\n", + "TV-G NaN 10.0\n", + "TV-MA NaN 20.0\n", + "TV-PG NaN 13.0\n", + "TV-Y NaN 1.0\n", + "TV-Y7 NaN 1.0\n", + "Unrated NaN 62.0\n", + "X NaN 13.0" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_movie[['color', 'content_rating']].apply(pd.Series.value_counts)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Pandas apply value_counts on all columns" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q12_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "LinkedIn 39\n", + "Medium 38\n", + "Linkedin 16\n", + "Coursera 16\n", + "Books 14\n", + "medium 11\n", + "Facebook 11\n", + "linkedin 9\n", + "books 8\n", + "Data Science Central 7\n", + "Name: Q12_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q13_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "mlcourse.ai 35\n", + "NPTEL 23\n", + "Youtube 22\n", + "Simplilearn 18\n", + "Pluralsight 17\n", + "Stepik 14\n", + "Data Science Academy 12\n", + "youtube 12\n", + "Springboard 11\n", + "Books 10\n", + "Name: Q13_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"----------------------------------------Q14_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Python 89\n", + "python 45\n", + "None 36\n", + "Matlab 28\n", + "none 22\n", + "R 13\n", + "SQL 13\n", + "MATLAB 11\n", + "matlab 11\n", + "Python 9\n", + "Name: Q14_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q14_Part_1_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Excel 865\n", + "Microsoft Excel 392\n", + "excel 263\n", + "MS Excel 67\n", + "Google Sheets 61\n", + "Google sheets 44\n", + "Microsoft excel 38\n", + "Excel 33\n", + "microsoft excel 27\n", + "EXCEL 25\n", + "Name: Q14_Part_1_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q14_Part_2_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "SAS 129\n", + "SPSS 116\n", + "R 60\n", + "spss 34\n", + "Spss 25\n", + "Python 21\n", + "Sas 18\n", + "Stata 15\n", + "python 14\n", + "R, Python 11\n", + "Name: Q14_Part_2_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q14_Part_3_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Tableau 260\n", + "Power BI 71\n", + "tableau 51\n", + "PowerBI 23\n", + "Salesforce 19\n", + "Tableau 16\n", + "Qlik 10\n", + "Power Bi 9\n", + "Spotfire 8\n", + "SAP 6\n", + "Name: Q14_Part_3_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"----------------------------------------Q14_Part_4_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "JupyterLab 943\n", + "Jupyter 602\n", + "RStudio 516\n", + "Python 301\n", + "Jupyter Notebook 275\n", + "Rstudio 225\n", + "Jupyterlab 184\n", + "jupyter 183\n", + "Jupyter notebook 170\n", + "python 163\n", + "Name: Q14_Part_4_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q14_Part_5_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "AWS 203\n", + "GCP 134\n", + "Azure 87\n", + "aws 40\n", + "Aws 26\n", + "Google Colab 25\n", + "Databricks 17\n", + "gcp 15\n", + "Gcp 12\n", + "Colab 11\n", + "Name: Q14_Part_5_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q16_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Eclipse 47\n", + "IntelliJ 17\n", + "Intellij 14\n", + "eclipse 13\n", + "SAS 11\n", + "Google Colab 11\n", + "IntelliJ IDEA 10\n", + "Colab 9\n", + "Anaconda 9\n", + "Xcode 8\n", + "Name: Q16_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q17_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Databricks 43\n", + "databricks 12\n", + "Github 12\n", + "Anaconda 8\n", + "Domino Data Lab 5\n", + "Zeppelin 5\n", + "Domino 4\n", + "Jupyter Notebook 4\n", + "Anaconda 3\n", + "Jupyter 3\n", + "Name: Q17_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ 
+ "----------------------------------------Q18_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "C# 198\n", + "Scala 100\n", + "SAS 79\n", + "Julia 46\n", + "PHP 43\n", + "VBA 30\n", + "Ruby 27\n", + "c# 27\n", + "Go 27\n", + "Swift 20\n", + "Name: Q18_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q19_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Julia 22\n", + "Scala 9\n", + "C# 6\n", + "SAS 4\n", + "Swift 4\n", + "Octave 4\n", + "scala 2\n", + "Rust 2\n", + "mathematica 2\n", + "julia 2\n", + "Name: Q19_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q20_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Tableau 31\n", + "Excel 13\n", + "MATLAB 12\n", + "Power BI 10\n", + "Pandas 8\n", + "tableau 6\n", + "pandas 5\n", + "PowerBI 5\n", + "Dash 4\n", + "Spotfire 4\n", + "Name: Q20_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q21_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "FPGA 7\n", + "Laptop 4\n", + "Fpga 2\n", + "Planning to use GPUs 1\n", + "spark databricks on AWS 1\n", + "Edge neurocomputing chips like Intel's NCS. 1\n", + "but i wana to use gpu 1\n", + "Intel NCS2 1\n", + "Parallel comp with MPI 1\n", + "paperspace uses GPU's I believe... 
I prefer them figuring that part out 1\n", + "Name: Q21_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q24_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "SVM 32\n", + "Support Vector Machines 11\n", + "KNN 7\n", + "Clustering 6\n", + "svm 4\n", + "Support Vector Machine 4\n", + "SVM, KNN 4\n", + "SVMs 4\n", + "Support vector machine 3\n", + "KMeans 2\n", + "Name: Q24_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q25_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "DataRobot 12\n", + "catalyst 7\n", + "Catalyst 6\n", + "Microsoft ML 2\n", + "Datarobot 2\n", + "sklearn 2\n", + "fastai 2\n", + "Weka 1\n", + "Stepwise regression 1\n", + "SAS, ScykitLearn 1\n", + "Name: Q25_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q26_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Handwriting Recognize tools 1\n", + "Custom Built Tools 1\n", + "everytime daily constantly, never stoping, never ceasing to fail with measurable risk and damage to never stop knowing the perfect limit of our capacity and perfection. Cancerous attitude towards ourselves but victorious for our comformists. 
1\n", + "Anomaly detection on videos 1\n", + "SSD-Keras 1\n", + "Fast.ai 1\n", + "Wavenet 1\n", + "openCV 1\n", + "GIS 1\n", + "text processing 1\n", + "Name: Q26_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q27_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "SpaCy 2\n", + "Text mining by R and Python libraries only 1\n", + "Retrofitting 1\n", + "OWL 1\n", + "Flair 1\n", + "Making your own 1\n", + "Stopwords, Lemmatization, TF-IDF, BoW 1\n", + "fastai 1\n", + "svm 1\n", + "FastAI 1\n", + "Name: Q27_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q28_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Catboost 24\n", + "CatBoost 14\n", + "catboost 13\n", + "h2o 11\n", + "H2O 10\n", + "MATLAB 9\n", + "Chainer 6\n", + "MXNet 4\n", + "Catalyst 4\n", + "Caffe 4\n", + "Name: Q28_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q29_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Digital Ocean 5\n", + "DigitalOcean 4\n", + "Tencent Cloud 3\n", + "DataRobot 2\n", + "Google Colab 2\n", + "Private cloud 2\n", + "paperspace 2\n", + "SAS Cloud 2\n", + "Databricks 2\n", + "OVH 2\n", + "Name: Q29_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q2_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + 
"non-binary 4\n", + "Attack Helicopter 2\n", + "bionicle 2\n", + "T-rex shaped meteor made out of cheese 1\n", + "Pharoah 1\n", + "queer 1\n", + "Lvl 129 Dust Devil 1\n", + "What is your gender? - Prefer to self-describe - Text 1\n", + "genderfluid helicopter 1\n", + "none 1\n", + "Name: Q2_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q30_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "IBM Cloud 7\n", + "Databricks 6\n", + "AWS EMR 4\n", + "AWS SageMaker 4\n", + "AWS S3 3\n", + "Azure Databricks 3\n", + "IBM Watson 3\n", + "Inhouse 2\n", + "AWS Fargate 2\n", + "OpenShift 2\n", + "Name: Q30_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q31_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Snowflake 13\n", + "SAS 6\n", + "Spark 4\n", + "Cloudera 4\n", + "Hadoop 4\n", + "DataRobot 4\n", + "IBM Cloud Pak for Data 3\n", + "Splunk 3\n", + "Apache Spark 3\n", + "Tableau 2\n", + "Name: Q31_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q32_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "DataRobot 15\n", + "Knime 11\n", + "KNIME 8\n", + "IBM Watson Studio 7\n", + "Alteryx 4\n", + "MATLAB 4\n", + "IBM Cloud 3\n", + "H2O 3\n", + "Datarobot 3\n", + "IBM Watson 3\n", + "Name: Q32_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"----------------------------------------Q33_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "IBM AutoAI 6\n", + "Prevision.io 4\n", + "H2O AutoML 3\n", + "H20 AutoML 2\n", + "H2O.ai AutoML 2\n", + "SAS 2\n", + "prevision.io 2\n", + "Which automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply) - Other - Text 1\n", + "Watson ML 1\n", + "custom 1\n", + "Name: Q33_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q34_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Snowflake 18\n", + "DB2 17\n", + "MongoDB 15\n", + "Teradata 12\n", + "IBM DB2 7\n", + "Mongo 6\n", + "SAP HANA 6\n", + "MariaDB 5\n", + "SAS 5\n", + "Hadoop 4\n", + "Name: Q34_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q5_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Professor 43\n", + "Machine Learning Engineer 32\n", + "Teacher 19\n", + "Consultant 19\n", + "Lecturer 14\n", + "CTO 13\n", + "CEO 13\n", + "Engineer 12\n", + "Mechanical Engineer 11\n", + "Solution Architect 11\n", + "Name: Q5_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q9_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "i'm professor 1\n", + "I am a Technical Project Manager and I involve with customer product owner and my teams to analyse and suggest business decisions. 
1\n", + "Architecture 1\n", + "Model methodology development 1\n", + "Produce data driven research 1\n", + "human-centered data science research 1\n", + "\"> 1\n", + "Analyze business systems and processes; recommend solutions. 1\n", + "Support a product that provides analytics & ML libraries 1\n", + "Conceptualize workflows and design experiments 1\n", + "Name: Q9_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# List top values per question\n", + "for col in df_resp.columns:\n", + " print('-' * 40 + col + '-' * 40 , end=' - ')\n", + " display(df_resp[col].value_counts().head(10))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TypeError: unhashable type: 'dict'\n", + "df = pd.DataFrame({'a':[1,2,3], 'b':[{'c':1}, {'d':3}, {'c':5, 'd':6}], 'c':[[1],[2],[3]]})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Pandas apply value_counts on column with bad data" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Power BI', 'PowerBI', 'Power Bi']" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import difflib \n", + "difflib.get_close_matches('Power BI', ['Power BI', 'tableau', 'PowerBI', 'Power Bi','Salesforce', 'Tableau ', 'Qlik', 'Power bi'], n=3, cutoff=0.6)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "import difflib \n", + "\n", + "correct_values = {}\n", + "words = df_resp.Q14_Part_3_TEXT.value_counts(ascending=True).index\n", + "\n", + "for keyword in words:\n", + " similar = difflib.get_close_matches(keyword, words, n=20, cutoff=0.6)\n", + " for x in similar:\n", + " correct_values[x] = keyword\n", + " \n", + "df_resp[\"corr\"] = df_resp[\"Q14_Part_3_TEXT\"].map(correct_values)" + ] + }, + 
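The mapping loop in the cell above can be tried on a tiny, self-contained Series. This is a minimal sketch of the same idea (`value_counts(ascending=True)` plus `difflib.get_close_matches` with `cutoff=0.6`). The sample answers are made up for illustration and are not taken from the survey data.

```python
import difflib

import pandas as pd

# Made-up free-text answers with inconsistent spelling and casing
answers = pd.Series(
    ["Tableau", "Tableau", "Tableau", "tableau",
     "Power BI", "Power BI", "PowerBI", "Power Bi",
     "Qlik", None]
)

# Least frequent spelling first: frequent spellings are processed later,
# so they overwrite the label of every close match they find.
words = answers.value_counts(ascending=True).index

canonical = {}
for keyword in words:
    for match in difflib.get_close_matches(keyword, words, n=20, cutoff=0.6):
        canonical[match] = keyword

# Values missing from the mapping (here the empty answer) become NaN
normalized = answers.map(canonical)
print(normalized.value_counts())
```

Because rarer spellings are handled first, the most frequent close spelling typically ends up as the label. The mapping is not transitive, though: a rare spelling that is close to a mid-frequency variant but not to the most frequent one keeps the mid-frequency label. `difflib` also compares characters case-sensitively, so lowercasing the text before matching may be worth trying.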
{ + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Tableau 345\n", + "Power BI 137\n", + "Salesforce 43\n", + "Qlik 27\n", + "Spotfire 17\n", + " ... \n", + "tableau which is very fast and easy to analyse 1\n", + "We use Tableau to analyse through histograms,bargraphs and many more tools in Tableau 1\n", + "Izenda, Excel, XtraReports 1\n", + "ssrs 1\n", + "XLcubed 1\n", + "Name: corr, Length: 179, dtype: int64" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_resp[\"corr\"].value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Tableau 260\n", + "Power BI 71\n", + "tableau 51\n", + "PowerBI 23\n", + "Salesforce 19\n", + " ... \n", + "Domo 1\n", + "MySQL Client, Tableau 1\n", + "Datastudio 1\n", + "Abinitio 1\n", + "XLcubed 1\n", + "Name: Q14_Part_3_TEXT, Length: 339, dtype: int64" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_resp[\"Q14_Part_3_TEXT\"].value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/21. pandas-dataframe-sampling-rows-or-columns.ipynb b/notebooks/pandas/21. pandas-dataframe-sampling-rows-or-columns.ipynb new file mode 100644 index 0000000..35e9f2b --- /dev/null +++ b/notebooks/pandas/21. 
pandas-dataframe-sampling-rows-or-columns.ipynb @@ -0,0 +1,3681 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 21. Pandas - Random Sample of a subset of a dataframe" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Random sampling of rows, columns from DataFrame with sample()" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2525 Color Joel Schumacher 106.0 98.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2525 541.0 71.0 David Murray \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "2525 214.0 1569918.0 Biography|Crime|Drama|Thriller ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "2525 113.0 English Ireland R 17000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "2525 2003.0 96.0 6.9 2.35 \n", + "\n", + " movie_facebook_likes \n", + "2525 0 \n", + "\n", + "[1 rows x 28 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Default behavior of sample()\n", + "df.sample()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "535 Color Dennis Dugan 179.0 102.0 \n", + "2987 Color Fred Schepisi 61.0 109.0 \n", + "1475 Color David Koepp 248.0 91.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "535 221.0 4000.0 Adam Sandler \n", + "2987 40.0 794.0 Ray Winstone \n", + "1475 192.0 346.0 Dania Ramirez \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "535 12000.0 162001186.0 Comedy ... \n", + "2987 5000.0 2326407.0 Drama ... \n", + "1475 23000.0 20275446.0 Action|Crime|Thriller ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "535 311.0 English USA PG-13 80000000.0 \n", + "2987 99.0 English UK R 12000000.0 \n", + "1475 178.0 English USA PG-13 35000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "535 2010.0 11000.0 6.0 1.85 \n", + "2987 2001.0 1000.0 7.0 2.35 \n", + "1475 2012.0 1000.0 6.5 2.35 \n", + "\n", + " movie_facebook_likes \n", + "535 12000 \n", + "2987 305 \n", + "1475 20000 \n", + "\n", + "[3 rows x 28 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# return n rows\n", + "df.sample(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " color num_critic_for_reviews title_year\n", + "0 Color 723.0 2009.0\n", + "1 Color 302.0 2007.0\n", + "2 Color 602.0 2015.0\n", + "3 Color 813.0 2012.0\n", + "4 NaN NaN NaN" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sample 3 random columns (axis=1), then show the first 5 rows\n", + "df.sample(3, axis=1).head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2389 Color Terry Gilliam 156.0 118.0 \n", + "4233 Black and White Tay Garnett 7.0 119.0 \n", + "4737 Color Greg Harrison 46.0 86.0 \n", + "3717 Color Mike Flanagan 336.0 104.0 \n", + "1854 Color Garry Marshall 200.0 113.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2389 0.0 551.0 Michael Jeter \n", + "4233 10.0 275.0 Greer Garson \n", + "4737 7.0 17.0 Ari Gold \n", + "3717 59.0 202.0 Rory Cochrane \n", + "1854 0.0 307.0 Common \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "2389 40000.0 10562387.0 Adventure|Comedy|Drama ... \n", + "4233 509.0 NaN Drama ... \n", + "4737 328.0 1114943.0 Drama|Music ... \n", + "3717 972.0 27689474.0 Horror|Mystery ... \n", + "1854 22000.0 54540525.0 Comedy|Romance ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "2389 648.0 English USA R 18500000.0 \n", + "4233 29.0 English USA Passed 2160000.0 \n", + "4737 74.0 English USA R 500000.0 \n", + "3717 339.0 English USA R 5000000.0 \n", + "1854 134.0 English USA PG-13 56000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "2389 1998.0 693.0 7.7 2.35 \n", + "4233 1945.0 284.0 7.5 1.37 \n", + "4737 2000.0 27.0 6.5 1.85 \n", + "3717 2013.0 407.0 6.5 2.35 \n", + "1854 2011.0 988.0 5.7 1.85 \n", + "\n", + " movie_facebook_likes \n", + "2389 15000 \n", + "4233 68 \n", + "4737 0 \n", + "3717 23000 \n", + "1854 20000 \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sample a fraction of the rows: frac=0.001 (about 0.1% of the rows)\n", + "df.sample(frac=0.001)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "4232 Color Hayley Cloake 11.0 90.0 \n", + "4877 Color Tom Seidman 4.0 98.0 \n", + "3399 Color Mike Leigh 248.0 129.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "4232 0.0 306.0 Kayla Ewell \n", + "4877 3.0 104.0 Derek Brandon \n", + "3399 608.0 386.0 Imelda Staunton \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "4232 676.0 NaN Thriller ... \n", + "4877 337.0 NaN Drama|Family ... \n", + "3399 1000.0 3205244.0 Comedy|Drama ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "4232 9.0 English USA R 2200000.0 \n", + "4877 10.0 English USA PG 250000.0 \n", + "3399 141.0 English UK PG-13 10000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "4232 2008.0 399.0 4.3 NaN \n", + "4877 2010.0 168.0 6.2 NaN \n", + "3399 2010.0 579.0 7.3 2.35 \n", + "\n", + " movie_facebook_likes \n", + "4232 77 \n", + "4877 0 \n", + "3399 0 \n", + "\n", + "[3 rows x 28 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sample with seed\n", + "df.sample(n=3, random_state=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. np.random.choice" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
893ColorGabriele Muccino202.0123.0125.0835.0Rosario Dawson10000.069951824.0Drama|Romance...599.0EnglishUSAPG-1355000000.02008.03000.07.72.3526000
818ColorBrian Levant57.090.032.0809.0Mark Addy1000.035231365.0Comedy|Family|Romance|Sci-Fi...85.0EnglishUSAPG60000000.02000.0891.03.61.85500
460ColorSimon West139.0123.0165.0744.0Monica Potter12000.0101087161.0Action|Crime|Thriller...339.0EnglishUSAR75000000.01997.0878.06.82.350
772ColorClint Eastwood306.0134.016000.0204.0Morgan Freeman13000.037479778.0Biography|Drama|History|Sport...259.0EnglishUSAPG-1360000000.02009.011000.07.42.3523000
269ColorLen Wiseman354.0129.0235.0297.0Jonathan Sadowski13000.0134520804.0Action|Adventure|Thriller...782.0EnglishUSAPG-13110000000.02007.0300.07.22.350
\n", + "

5 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "893 Color Gabriele Muccino 202.0 123.0 \n", + "818 Color Brian Levant 57.0 90.0 \n", + "460 Color Simon West 139.0 123.0 \n", + "772 Color Clint Eastwood 306.0 134.0 \n", + "269 Color Len Wiseman 354.0 129.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "893 125.0 835.0 Rosario Dawson \n", + "818 32.0 809.0 Mark Addy \n", + "460 165.0 744.0 Monica Potter \n", + "772 16000.0 204.0 Morgan Freeman \n", + "269 235.0 297.0 Jonathan Sadowski \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "893 10000.0 69951824.0 Drama|Romance ... \n", + "818 1000.0 35231365.0 Comedy|Family|Romance|Sci-Fi ... \n", + "460 12000.0 101087161.0 Action|Crime|Thriller ... \n", + "772 13000.0 37479778.0 Biography|Drama|History|Sport ... \n", + "269 13000.0 134520804.0 Action|Adventure|Thriller ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "893 599.0 English USA PG-13 55000000.0 \n", + "818 85.0 English USA PG 60000000.0 \n", + "460 339.0 English USA R 75000000.0 \n", + "772 259.0 English USA PG-13 60000000.0 \n", + "269 782.0 English USA PG-13 110000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "893 2008.0 3000.0 7.7 2.35 \n", + "818 2000.0 891.0 3.6 1.85 \n", + "460 1997.0 878.0 6.8 2.35 \n", + "772 2009.0 11000.0 7.4 2.35 \n", + "269 2007.0 300.0 7.2 2.35 \n", + "\n", + " movie_facebook_likes \n", + "893 26000 \n", + "818 500 \n", + "460 0 \n", + "772 23000 \n", + "269 0 \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# numpy choice for DataFrame sampling: pick 5 distinct row positions, then select them with iloc\n", + "import numpy as np\n", + "chosen_idx = np.random.choice(len(df), replace=False, size=5)\n", + "df.iloc[chosen_idx]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. 
Random sample of rows based on column values" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Color - " + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
2693ColorAlbert Brooks97.097.0745.0745.0Bradley Whitford12000.011614236.0Comedy...140.0EnglishUSAPG-1315000000.01999.0821.05.61.37251
1613ColorAlexander Payne217.0125.0729.0322.0June Squibb442.065010106.0Comedy|Drama...612.0EnglishUSAR30000000.02002.0344.07.21.850
698ColorLawrence Kasdan40.0212.0759.0812.0Catherine O'Hara2000.025052000.0Adventure|Biography|Crime|Drama|Western...145.0EnglishUSAPG-1363000000.01994.0925.06.62.350
\n", + "

3 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2693 Color Albert Brooks 97.0 97.0 \n", + "1613 Color Alexander Payne 217.0 125.0 \n", + "698 Color Lawrence Kasdan 40.0 212.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2693 745.0 745.0 Bradley Whitford \n", + "1613 729.0 322.0 June Squibb \n", + "698 759.0 812.0 Catherine O'Hara \n", + "\n", + " actor_1_facebook_likes gross \\\n", + "2693 12000.0 11614236.0 \n", + "1613 442.0 65010106.0 \n", + "698 2000.0 25052000.0 \n", + "\n", + " genres ... num_user_for_reviews \\\n", + "2693 Comedy ... 140.0 \n", + "1613 Comedy|Drama ... 612.0 \n", + "698 Adventure|Biography|Crime|Drama|Western ... 145.0 \n", + "\n", + " language country content_rating budget title_year \\\n", + "2693 English USA PG-13 15000000.0 1999.0 \n", + "1613 English USA R 30000000.0 2002.0 \n", + "698 English USA PG-13 63000000.0 1994.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \n", + "2693 821.0 5.6 1.37 251 \n", + "1613 344.0 7.2 1.85 0 \n", + "698 925.0 6.6 2.35 0 \n", + "\n", + "[3 rows x 28 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " Black and White - " + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
4786Black and WhiteLloyd Bacon65.089.024.045.0Dick Powell610.02300000.0Comedy|Musical|Romance...97.0EnglishUSAUnrated439000.01933.0105.07.71.37439
3983Black and WhiteJohn Schlesinger88.0113.0154.077.0Barnard Hughes183.0NaNDrama...334.0EnglishUSAX3600000.01969.089.07.91.850
479Black and WhiteNaN31.025.0NaN474.0Agnes Moorehead1000.0NaNComedy|Family|Fantasy...71.0EnglishUSATV-GNaNNaN960.07.64.000
\n", + "

3 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "4786 Black and White Lloyd Bacon 65.0 89.0 \n", + "3983 Black and White John Schlesinger 88.0 113.0 \n", + "479 Black and White NaN 31.0 25.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "4786 24.0 45.0 Dick Powell \n", + "3983 154.0 77.0 Barnard Hughes \n", + "479 NaN 474.0 Agnes Moorehead \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "4786 610.0 2300000.0 Comedy|Musical|Romance ... \n", + "3983 183.0 NaN Drama ... \n", + "479 1000.0 NaN Comedy|Family|Fantasy ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "4786 97.0 English USA Unrated 439000.0 \n", + "3983 334.0 English USA X 3600000.0 \n", + "479 71.0 English USA TV-G NaN \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "4786 1933.0 105.0 7.7 1.37 \n", + "3983 1969.0 89.0 7.9 1.85 \n", + "479 NaN 960.0 7.6 4.00 \n", + "\n", + " movie_facebook_likes \n", + "4786 439 \n", + "3983 0 \n", + "479 0 \n", + "\n", + "[3 rows x 28 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# conditional DataFrame sampling few values - separate display \n", + "col = 'color'\n", + "for typ in list(df[col].dropna().unique()):\n", + " print(typ, end=' - ')\n", + " display(df[df[col] == typ].sample(3))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['PG-13', 'PG', 'G', 'R', 'TV-14', 'TV-PG', 'TV-MA', 'TV-G', 'Not Rated', 'Unrated', 'Approved', 'TV-Y', 'NC-17', 'X', 'TV-Y7', 'GP', 'Passed', 'M']\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
3758ColorBrian Dannelly121.092.012.0797.0Patrick Fugit3000.08786715.0Comedy|Drama...324.0EnglishUSAPG-135000000.02004.0835.06.91.850
64ColorAndrew Adamson284.0150.080.082.0Kiran Shah1000.0291709845.0Adventure|Family|Fantasy...1463.0EnglishUSAPG180000000.02005.0190.06.92.350
4725ColorJoe Camp5.086.024.0142.0Peter Breck407.039552600.0Adventure|Family|Romance...36.0EnglishUSAG500000.01974.0189.06.11.85816
2955ColorFranklin J. Schaffner55.0125.076.0801.0Anne Meara1000.0NaNDrama|Thriller...123.0EnglishUKR12000000.01978.0837.07.01.850
3509ColorNaN11.060.0NaN652.0Ashley Scott10000.0NaNAction|Drama|Mystery|Sci-Fi...160.0EnglishUSATV-14NaNNaN794.07.41.330
4803ColorNaN11.022.0NaN6.0Ron Lynch59.0NaNAnimation|Comedy|Drama...82.0EnglishUSATV-PGNaNNaN11.08.21.33526
826ColorNaN46.030.0NaN479.0Kristin Davis962.0NaNComedy|Romance...238.0EnglishUSATV-MANaNNaN722.07.01.330
3880ColorKenny Ortega57.098.0197.0578.0Corbin Bleu755.0NaNComedy|Drama|Family|Music|Musical|Romance...726.0EnglishUSATV-G4200000.02006.0632.05.21.330
4328Black and WhiteOrson Welles90.092.00.018.0Everett Sloane1000.07927.0Crime|Drama|Film-Noir|Mystery|Thriller...175.0EnglishUSANot Rated2300000.01947.029.07.71.370
4997ColorDavid Gordon Green75.090.0234.015.0Eddie Rouse552.0241816.0Drama...76.0EnglishUSAUnrated42000.02000.061.07.52.35451
4497Black and WhiteWalter Lang7.083.09.051.0Nigel Bruce94.0NaNDrama|Family|Fantasy...25.0EnglishUSAApprovedNaN1940.062.06.51.37548
1265ColorNaN3.030.0NaN12.0Melissa Altro51.0NaNAnimation|Comedy|Family...43.0EnglishCanadaTV-YNaNNaN21.07.41.33301
5025ColorJohn Waters73.0108.00.0105.0Mink Stole462.0180483.0Comedy|Crime|Horror...183.0EnglishUSANC-1710000.01972.0143.06.11.370
3559ColorBrian De Palma121.0104.00.0517.0David Margulies754.031899000.0Mystery|Romance|Thriller...201.0EnglishUSAX6500000.01980.0567.07.12.350
1972ColorNaN7.030.0NaN265.0Jennifer Hale971.0NaNAction|Animation|Comedy|Family|Fantasy|Sci-Fi...60.0EnglishUSATV-Y7NaNNaN918.07.24.00581
4529ColorDouglas Trumbull87.089.0136.042.0Ron Rifkin844.0NaNDrama|Sci-Fi...199.0EnglishUSAGP1000000.01972.0184.06.71.850
4812Black and WhiteHarry Beaumont36.0100.04.04.0Bessie Love77.02808000.0Musical|Romance...71.0EnglishUSAPassed379000.01929.028.06.31.37167
3584ColorGeorge Roy Hill130.0110.0131.0399.0Ted Cassidy640.0102308900.0Biography|Crime|Drama|Western...309.0EnglishUSAM6000000.01969.0566.08.12.350
\n", + "

18 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews \\\n", + "3758 Color Brian Dannelly 121.0 \n", + "64 Color Andrew Adamson 284.0 \n", + "4725 Color Joe Camp 5.0 \n", + "2955 Color Franklin J. Schaffner 55.0 \n", + "3509 Color NaN 11.0 \n", + "4803 Color NaN 11.0 \n", + "826 Color NaN 46.0 \n", + "3880 Color Kenny Ortega 57.0 \n", + "4328 Black and White Orson Welles 90.0 \n", + "4997 Color David Gordon Green 75.0 \n", + "4497 Black and White Walter Lang 7.0 \n", + "1265 Color NaN 3.0 \n", + "5025 Color John Waters 73.0 \n", + "3559 Color Brian De Palma 121.0 \n", + "1972 Color NaN 7.0 \n", + "4529 Color Douglas Trumbull 87.0 \n", + "4812 Black and White Harry Beaumont 36.0 \n", + "3584 Color George Roy Hill 130.0 \n", + "\n", + " duration director_facebook_likes actor_3_facebook_likes \\\n", + "3758 92.0 12.0 797.0 \n", + "64 150.0 80.0 82.0 \n", + "4725 86.0 24.0 142.0 \n", + "2955 125.0 76.0 801.0 \n", + "3509 60.0 NaN 652.0 \n", + "4803 22.0 NaN 6.0 \n", + "826 30.0 NaN 479.0 \n", + "3880 98.0 197.0 578.0 \n", + "4328 92.0 0.0 18.0 \n", + "4997 90.0 234.0 15.0 \n", + "4497 83.0 9.0 51.0 \n", + "1265 30.0 NaN 12.0 \n", + "5025 108.0 0.0 105.0 \n", + "3559 104.0 0.0 517.0 \n", + "1972 30.0 NaN 265.0 \n", + "4529 89.0 136.0 42.0 \n", + "4812 100.0 4.0 4.0 \n", + "3584 110.0 131.0 399.0 \n", + "\n", + " actor_2_name actor_1_facebook_likes gross \\\n", + "3758 Patrick Fugit 3000.0 8786715.0 \n", + "64 Kiran Shah 1000.0 291709845.0 \n", + "4725 Peter Breck 407.0 39552600.0 \n", + "2955 Anne Meara 1000.0 NaN \n", + "3509 Ashley Scott 10000.0 NaN \n", + "4803 Ron Lynch 59.0 NaN \n", + "826 Kristin Davis 962.0 NaN \n", + "3880 Corbin Bleu 755.0 NaN \n", + "4328 Everett Sloane 1000.0 7927.0 \n", + "4997 Eddie Rouse 552.0 241816.0 \n", + "4497 Nigel Bruce 94.0 NaN \n", + "1265 Melissa Altro 51.0 NaN \n", + "5025 Mink Stole 462.0 180483.0 \n", + "3559 David Margulies 754.0 31899000.0 \n", + "1972 Jennifer Hale 971.0 NaN \n", + "4529 Ron Rifkin 844.0 
NaN \n", + "4812 Bessie Love 77.0 2808000.0 \n", + "3584 Ted Cassidy 640.0 102308900.0 \n", + "\n", + " genres ... num_user_for_reviews \\\n", + "3758 Comedy|Drama ... 324.0 \n", + "64 Adventure|Family|Fantasy ... 1463.0 \n", + "4725 Adventure|Family|Romance ... 36.0 \n", + "2955 Drama|Thriller ... 123.0 \n", + "3509 Action|Drama|Mystery|Sci-Fi ... 160.0 \n", + "4803 Animation|Comedy|Drama ... 82.0 \n", + "826 Comedy|Romance ... 238.0 \n", + "3880 Comedy|Drama|Family|Music|Musical|Romance ... 726.0 \n", + "4328 Crime|Drama|Film-Noir|Mystery|Thriller ... 175.0 \n", + "4997 Drama ... 76.0 \n", + "4497 Drama|Family|Fantasy ... 25.0 \n", + "1265 Animation|Comedy|Family ... 43.0 \n", + "5025 Comedy|Crime|Horror ... 183.0 \n", + "3559 Mystery|Romance|Thriller ... 201.0 \n", + "1972 Action|Animation|Comedy|Family|Fantasy|Sci-Fi ... 60.0 \n", + "4529 Drama|Sci-Fi ... 199.0 \n", + "4812 Musical|Romance ... 71.0 \n", + "3584 Biography|Crime|Drama|Western ... 309.0 \n", + "\n", + " language country content_rating budget title_year \\\n", + "3758 English USA PG-13 5000000.0 2004.0 \n", + "64 English USA PG 180000000.0 2005.0 \n", + "4725 English USA G 500000.0 1974.0 \n", + "2955 English UK R 12000000.0 1978.0 \n", + "3509 English USA TV-14 NaN NaN \n", + "4803 English USA TV-PG NaN NaN \n", + "826 English USA TV-MA NaN NaN \n", + "3880 English USA TV-G 4200000.0 2006.0 \n", + "4328 English USA Not Rated 2300000.0 1947.0 \n", + "4997 English USA Unrated 42000.0 2000.0 \n", + "4497 English USA Approved NaN 1940.0 \n", + "1265 English Canada TV-Y NaN NaN \n", + "5025 English USA NC-17 10000.0 1972.0 \n", + "3559 English USA X 6500000.0 1980.0 \n", + "1972 English USA TV-Y7 NaN NaN \n", + "4529 English USA GP 1000000.0 1972.0 \n", + "4812 English USA Passed 379000.0 1929.0 \n", + "3584 English USA M 6000000.0 1969.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \n", + "3758 835.0 6.9 1.85 0 \n", + "64 190.0 6.9 2.35 0 \n", + "4725 189.0 6.1 
1.85 816 \n", + "2955 837.0 7.0 1.85 0 \n", + "3509 794.0 7.4 1.33 0 \n", + "4803 11.0 8.2 1.33 526 \n", + "826 722.0 7.0 1.33 0 \n", + "3880 632.0 5.2 1.33 0 \n", + "4328 29.0 7.7 1.37 0 \n", + "4997 61.0 7.5 2.35 451 \n", + "4497 62.0 6.5 1.37 548 \n", + "1265 21.0 7.4 1.33 301 \n", + "5025 143.0 6.1 1.37 0 \n", + "3559 567.0 7.1 2.35 0 \n", + "1972 918.0 7.2 4.00 581 \n", + "4529 184.0 6.7 1.85 0 \n", + "4812 28.0 6.3 1.37 167 \n", + "3584 566.0 8.1 2.35 0 \n", + "\n", + "[18 rows x 28 columns]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# conditional DataFrame sampling many values - grouped display \n", + "col = 'content_rating'\n", + "sample = []\n", + "\n", + "variants = list(df[col].dropna().unique())\n", + "print(variants)\n", + "\n", + "for typ in variants:\n", + " sample.append(df[df[col] == typ].sample())\n", + "pd.concat(sample)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Dataframe sampling with numpy and weights" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...languagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likesweights
2425Black and WhiteMartin Scorsese151.0121.017000.0356.0Cathy Moriarty22000.045250.0Biography|Drama|Sport...EnglishUSAR18000000.01980.0394.08.31.8501.0
1125Black and WhiteOliver Stone83.0212.00.0805.0Bob Hoskins12000.013560960.0Biography|Drama|History...EnglishUSAR50000000.01995.05000.07.12.359151.0
3539NaNRichard Rich2.045.024.029.0Kate Higgins122.0NaNAction|Adventure|Animation|Comedy|Drama|Family......NaNUSANaN7000000.02014.035.06.0NaN411.0
2944Black and WhiteMartin Campbell400.0144.0258.0834.0Tobias Menzies6000.0167007184.0Action|Adventure|Thriller...EnglishUKPG-13150000000.02006.01000.08.02.3501.0
4359Black and WhiteStanley Kubrick192.095.00.0277.0Slim Pickens654.0NaNComedy...EnglishUSAPG1800000.01964.0575.08.51.66180001.0
\n", + "

5 rows × 29 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2425 Black and White Martin Scorsese 151.0 121.0 \n", + "1125 Black and White Oliver Stone 83.0 212.0 \n", + "3539 NaN Richard Rich 2.0 45.0 \n", + "2944 Black and White Martin Campbell 400.0 144.0 \n", + "4359 Black and White Stanley Kubrick 192.0 95.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2425 17000.0 356.0 Cathy Moriarty \n", + "1125 0.0 805.0 Bob Hoskins \n", + "3539 24.0 29.0 Kate Higgins \n", + "2944 258.0 834.0 Tobias Menzies \n", + "4359 0.0 277.0 Slim Pickens \n", + "\n", + " actor_1_facebook_likes gross \\\n", + "2425 22000.0 45250.0 \n", + "1125 12000.0 13560960.0 \n", + "3539 122.0 NaN \n", + "2944 6000.0 167007184.0 \n", + "4359 654.0 NaN \n", + "\n", + " genres ... language country \\\n", + "2425 Biography|Drama|Sport ... English USA \n", + "1125 Biography|Drama|History ... English USA \n", + "3539 Action|Adventure|Animation|Comedy|Drama|Family... ... NaN USA \n", + "2944 Action|Adventure|Thriller ... English UK \n", + "4359 Comedy ... 
English USA \n", + "\n", + " content_rating budget title_year actor_2_facebook_likes \\\n", + "2425 R 18000000.0 1980.0 394.0 \n", + "1125 R 50000000.0 1995.0 5000.0 \n", + "3539 NaN 7000000.0 2014.0 35.0 \n", + "2944 PG-13 150000000.0 2006.0 1000.0 \n", + "4359 PG 1800000.0 1964.0 575.0 \n", + "\n", + " imdb_score aspect_ratio movie_facebook_likes weights \n", + "2425 8.3 1.85 0 1.0 \n", + "1125 7.1 2.35 915 1.0 \n", + "3539 6.0 NaN 41 1.0 \n", + "2944 8.0 2.35 0 1.0 \n", + "4359 8.5 1.66 18000 1.0 \n", + "\n", + "[5 rows x 29 columns]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# excluding 'Color' values by applying weights 0 - Color and 1 - rest\n", + "df['weights'] = np.where(df['color'] == 'Color', .0, 1)\n", + "df.sample(frac=.001, weights='weights')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...languagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likesweights
3083ColorFloyd Mutrux5.099.011.0327.0Kelli Williams665.0125169.0Comedy|Drama...EnglishUSAR10500000.01994.0446.06.4NaN1191.0
3435ColorGeorge Tillman Jr.34.0115.088.0890.0Mekhi Phifer1000.043490057.0Comedy|Drama...EnglishUSAR7500000.01997.01000.06.91.855081.0
3940ColorRenny Harlin68.0102.0212.0195.0Lane Smith10000.0354704.0Crime|Drama|Horror|Thriller...EnglishUSAR1300000.01987.0633.05.91.853141.0
5006ColorDamir CaticNaN89.02.00.0Ron Gelner5.0NaNHorror...EnglishUSANot Rated60000.02013.00.05.4NaN481.0
1872ColorMichael Hoffman85.0118.097.0437.0Gerald McRaney775.026761283.0Drama|Romance...EnglishUSAPG-1326000000.02014.0523.06.72.35190001.0
\n", + "

5 rows × 29 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "3083 Color Floyd Mutrux 5.0 99.0 \n", + "3435 Color George Tillman Jr. 34.0 115.0 \n", + "3940 Color Renny Harlin 68.0 102.0 \n", + "5006 Color Damir Catic NaN 89.0 \n", + "1872 Color Michael Hoffman 85.0 118.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "3083 11.0 327.0 Kelli Williams \n", + "3435 88.0 890.0 Mekhi Phifer \n", + "3940 212.0 195.0 Lane Smith \n", + "5006 2.0 0.0 Ron Gelner \n", + "1872 97.0 437.0 Gerald McRaney \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "3083 665.0 125169.0 Comedy|Drama ... \n", + "3435 1000.0 43490057.0 Comedy|Drama ... \n", + "3940 10000.0 354704.0 Crime|Drama|Horror|Thriller ... \n", + "5006 5.0 NaN Horror ... \n", + "1872 775.0 26761283.0 Drama|Romance ... \n", + "\n", + " language country content_rating budget title_year \\\n", + "3083 English USA R 10500000.0 1994.0 \n", + "3435 English USA R 7500000.0 1997.0 \n", + "3940 English USA R 1300000.0 1987.0 \n", + "5006 English USA Not Rated 60000.0 2013.0 \n", + "1872 English USA PG-13 26000000.0 2014.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \\\n", + "3083 446.0 6.4 NaN 119 \n", + "3435 1000.0 6.9 1.85 508 \n", + "3940 633.0 5.9 1.85 314 \n", + "5006 0.0 5.4 NaN 48 \n", + "1872 523.0 6.7 2.35 19000 \n", + "\n", + " weights \n", + "3083 1.0 \n", + "3435 1.0 \n", + "3940 1.0 \n", + "5006 1.0 \n", + "1872 1.0 \n", + "\n", + "[5 rows x 29 columns]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Including only 'Color' values by applying weights 1 - Color and 0 - rest\n", + "df['weights'] = np.where(df['color'] == 'Color', 1, 0.0)\n", + "df.sample(frac=.001, weights='weights')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 1, 0, 
1, 1, 1, 1, 1, 1, 0])" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.where(df['color'] == 'Color', 1, 0)[270:280]" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Color', 'Color', ' Black and White', 'Color', 'Color', 'Color', 'Color', 'Color', 'Color', nan]\n" + ] + } + ], + "source": [ + "print(list(df['color'][270:280]))" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2153 PG\n", + "2238 PG-13\n", + "836 PG-13\n", + "4725 G\n", + "3205 PG\n", + "3388 PG-13\n", + "152 PG-13\n", + "2574 PG-13\n", + "2724 PG-13\n", + "1854 PG-13\n", + "Name: content_rating, dtype: object" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Including/excluding list of values - with equal probability\n", + "df['weights'] = np.where(df['content_rating'].isin(['PG-13', 'PG', 'G']), 1, 0)\n", + "df.sample(frac=.002, weights='weights')['content_rating']" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PG-13 1461\n", + "PG 701\n", + "G 112\n", + "Name: content_rating, dtype: int64" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['content_rating'].isin(['PG-13', 'PG', 'G'])]['content_rating'].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Pandas sample rows by group" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " color director_name \\\n", + "color \n", + " Black and White 2734 Black and White Fritz Lang \n", + " 4741 Black and White Morgan J. Freeman \n", + " 1979 Black and White Neil Jordan \n", + "Color 3436 Color Stanley Tong \n", + " 734 Color Oliver Stone \n", + " 2336 Color Andrew Jarecki \n", + "\n", + " num_critic_for_reviews duration \\\n", + "color \n", + " Black and White 2734 260.0 145.0 \n", + " 4741 17.0 86.0 \n", + " 1979 44.0 133.0 \n", + "Color 3436 62.0 89.0 \n", + " 734 171.0 156.0 \n", + " 2336 140.0 101.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes \\\n", + "color \n", + " Black and White 2734 756.0 18.0 \n", + " 4741 204.0 474.0 \n", + " 1979 277.0 8000.0 \n", + "Color 3436 7.0 36.0 \n", + " 734 0.0 1000.0 \n", + " 2336 46.0 902.0 \n", + "\n", + " actor_2_name actor_1_facebook_likes gross \\\n", + "color \n", + " Black and White 2734 Gustav Fröhlich 136.0 26435.0 \n", + " 4741 Heather Matarazzo 659.0 334041.0 \n", + " 1979 Liam Neeson 25000.0 11030963.0 \n", + "Color 3436 Anita Mui 186.0 32333860.0 \n", + " 734 Dennis Quaid 14000.0 75530832.0 \n", + " 2336 Kirsten Dunst 33000.0 578382.0 \n", + "\n", + " genres ... language \\\n", + "color ... \n", + " Black and White 2734 Drama|Sci-Fi ... German \n", + " 4741 Crime|Drama|Romance ... English \n", + " 1979 Biography|Drama|Thriller|War ... English \n", + "Color 3436 Action|Comedy ... Cantonese \n", + " 734 Drama|Sport ... English \n", + " 2336 Crime|Drama|Mystery|Romance|Thriller ... 
English \n", + "\n", + " country content_rating budget title_year \\\n", + "color \n", + " Black and White 2734 Germany Not Rated 6000000.0 1927.0 \n", + " 4741 USA R 500000.0 1997.0 \n", + " 1979 UK R 28000000.0 1996.0 \n", + "Color 3436 Hong Kong R 7500000.0 1995.0 \n", + " 734 USA R 55000000.0 1999.0 \n", + " 2336 USA R NaN 2010.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "color \n", + " Black and White 2734 23.0 8.3 1.33 \n", + " 4741 529.0 6.5 1.85 \n", + " 1979 14000.0 7.1 1.85 \n", + "Color 3436 147.0 6.7 2.35 \n", + " 734 2000.0 6.8 2.35 \n", + " 2336 4000.0 6.3 1.85 \n", + "\n", + " movie_facebook_likes weights \n", + "color \n", + " Black and White 2734 12000 0 \n", + " 4741 51 0 \n", + " 1979 0 0 \n", + "Color 3436 0 0 \n", + " 734 0 0 \n", + " 2336 0 0 \n", + "\n", + "[6 rows x 29 columns]" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('color').apply(lambda x: x.sample(n=3))" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Black and White Steven Zaillian 127.0 128.0 \n", + "1 Black and White Randy Moore 143.0 90.0 \n", + "2 Black and White Todd Haynes 231.0 135.0 \n", + "3 Color Martin Brest 94.0 105.0 \n", + "4 Color Dario Argento 76.0 120.0 \n", + "5 Color Tyler Perry 36.0 113.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 234.0 581.0 Anthony Hopkins \n", + "1 13.0 432.0 Lee Armstrong \n", + "2 162.0 228.0 Heath Ledger \n", + "3 102.0 383.0 Ronny Cox \n", + "4 930.0 433.0 Adrienne Barbeau \n", + "5 0.0 256.0 Mary J. Blige \n", + "\n", + " actor_1_facebook_likes gross genres ... language \\\n", + "0 14000.0 7221458.0 Drama|Thriller ... English \n", + "1 977.0 169719.0 Fantasy|Horror ... English \n", + "2 23000.0 4001121.0 Biography|Drama|Music ... English \n", + "3 901.0 234760500.0 Action|Comedy|Crime ... English \n", + "4 982.0 349618.0 Horror ... English \n", + "5 607.0 51697449.0 Comedy|Drama ... 
English \n", + "\n", + " country content_rating budget title_year actor_2_facebook_likes \\\n", + "0 Germany PG-13 55000000.0 2006.0 12000.0 \n", + "1 USA Not Rated NaN 2013.0 511.0 \n", + "2 USA R 20000000.0 2007.0 13000.0 \n", + "3 USA R 14000000.0 1984.0 605.0 \n", + "4 Italy R 9000000.0 1990.0 602.0 \n", + "5 USA PG-13 13000000.0 2009.0 269.0 \n", + "\n", + " imdb_score aspect_ratio movie_facebook_likes weights \n", + "0 6.2 1.85 0 1 \n", + "1 5.2 1.85 0 0 \n", + "2 7.0 2.35 0 0 \n", + "3 7.3 1.85 0 0 \n", + "4 6.1 1.85 375 0 \n", + "5 4.1 1.85 1000 1 \n", + "\n", + "[6 rows x 29 columns]" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('color').apply(lambda x: x.sample(n=3)).reset_index(drop = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Bonus: get first and last rows of DataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "\n", + " language country content_rating budget title_year \\\n", + "0 English USA PG-13 237000000.0 2009.0 \n", + "1 English USA PG-13 300000000.0 2007.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \\\n", + "0 936.0 7.9 1.78 33000 \n", + "1 5000.0 7.1 2.35 0 \n", + "\n", + " weights \n", + "0 1 \n", + "1 1 \n", + "\n", + "[2 rows x 29 columns]" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "5041 Color Daniel Hsia 14.0 100.0 \n", + "5042 Color Jon Gunn 43.0 90.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "5041 0.0 489.0 Daniel Henney \n", + "5042 16.0 16.0 Brian Herzlinger \n", + "\n", + " actor_1_facebook_likes gross genres ... language \\\n", + "5041 946.0 10443.0 Comedy|Drama|Romance ... English \n", + "5042 86.0 85222.0 Documentary ... English \n", + "\n", + " country content_rating budget title_year actor_2_facebook_likes \\\n", + "5041 USA PG-13 NaN 2012.0 719.0 \n", + "5042 USA PG 1100.0 2004.0 23.0 \n", + "\n", + " imdb_score aspect_ratio movie_facebook_likes weights \n", + "5041 6.3 2.35 660 1 \n", + "5042 6.6 1.85 456 1 \n", + "\n", + "[2 rows x 29 columns]" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.tail(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "5041 Color Daniel Hsia 14.0 100.0 \n", + "5042 Color Jon Gunn 43.0 90.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "5041 0.0 489.0 Daniel Henney \n", + "5042 16.0 16.0 Brian Herzlinger \n", + "\n", + " actor_1_facebook_likes gross genres \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy \n", + "5041 946.0 10443.0 Comedy|Drama|Romance \n", + "5042 86.0 85222.0 Documentary \n", + "\n", + " ... language country content_rating budget title_year \\\n", + "0 ... English USA PG-13 237000000.0 2009.0 \n", + "1 ... English USA PG-13 300000000.0 2007.0 \n", + "5041 ... English USA PG-13 NaN 2012.0 \n", + "5042 ... English USA PG 1100.0 2004.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \\\n", + "0 936.0 7.9 1.78 33000 \n", + "1 5000.0 7.1 2.35 0 \n", + "5041 719.0 6.3 2.35 660 \n", + "5042 23.0 6.6 1.85 456 \n", + "\n", + " weights \n", + "0 1 \n", + "1 1 \n", + "5041 1 \n", + "5042 1 \n", + "\n", + "[4 rows x 29 columns]" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# combine head and tail variant 1\n", + "rows = 2\n", + "pd.concat([df.head(rows), df.tail(rows)])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", 
+ "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/22.pandas-how-to-filter-results-of-value_counts.ipynb b/notebooks/pandas/22.pandas-how-to-filter-results-of-value_counts.ipynb new file mode 100644 index 0000000..4fb69be --- /dev/null +++ b/notebooks/pandas/22.pandas-how-to-filter-results-of-value_counts.ipynb @@ -0,0 +1,1019 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 22. Pandas How to filter results of value_counts?" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "0 3054.0 English USA PG-13 237000000.0 \n", + "1 1238.0 English USA PG-13 300000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "\n", + "[2 rows x 28 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sample of the data\n", + "df.head(2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. How value_counts() works" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 English\n", + "1 English\n", + "2 English\n", + "3 English\n", + "4 NaN\n", + " ... 
\n", + "5038 English\n", + "5039 English\n", + "5040 English\n", + "5041 English\n", + "5042 English\n", + "Name: language, Length: 5043, dtype: object" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "col = 'language'\n", + "df[col]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "English 4704\n", + "French 73\n", + "Spanish 40\n", + "Hindi 28\n", + "Mandarin 26\n", + "German 19\n", + "Japanese 18\n", + "Russian 11\n", + "Cantonese 11\n", + "Italian 11\n", + "Portuguese 8\n", + "Korean 8\n", + "Arabic 5\n", + "Hebrew 5\n", + "Swedish 5\n", + "Danish 5\n", + "Persian 4\n", + "Norwegian 4\n", + "Polish 4\n", + "Dutch 4\n", + "Chinese 3\n", + "Thai 3\n", + "Icelandic 2\n", + "Dari 2\n", + "Zulu 2\n", + "None 2\n", + "Romanian 2\n", + "Aboriginal 2\n", + "Indonesian 2\n", + "Panjabi 1\n", + "Kazakh 1\n", + "Kannada 1\n", + "Aramaic 1\n", + "Urdu 1\n", + "Dzongkha 1\n", + "Czech 1\n", + "Tamil 1\n", + "Bosnian 1\n", + "Telugu 1\n", + "Hungarian 1\n", + "Filipino 1\n", + "Mongolian 1\n", + "Slovenian 1\n", + "Greek 1\n", + "Vietnamese 1\n", + "Maya 1\n", + "Swahili 1\n", + "Name: language, dtype: int64" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dtype('int64')" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].value_counts().dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([4704, 73, 40, 28, 26, 19, 18, 11, 11, 11, 8,\n", + " 8, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3,\n", + " 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1,\n", + " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", + " 1, 
1, 1])" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].value_counts().values" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['English', 'French', 'Spanish', 'Hindi', 'Mandarin', 'German',\n", + " 'Japanese', 'Russian', 'Cantonese', 'Italian', 'Portuguese', 'Korean',\n", + " 'Arabic', 'Hebrew', 'Swedish', 'Danish', 'Persian', 'Norwegian',\n", + " 'Polish', 'Dutch', 'Chinese', 'Thai', 'Icelandic', 'Dari', 'Zulu',\n", + " 'None', 'Romanian', 'Aboriginal', 'Indonesian', 'Panjabi', 'Kazakh',\n", + " 'Kannada', 'Aramaic', 'Urdu', 'Dzongkha', 'Czech', 'Tamil', 'Bosnian',\n", + " 'Telugu', 'Hungarian', 'Filipino', 'Mongolian', 'Slovenian', 'Greek',\n", + " 'Vietnamese', 'Maya', 'Swahili'],\n", + " dtype='object')" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].value_counts().index" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Filter value_counts with isin" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2388 Chinese\n", + "2740 Thai\n", + "3022 Chinese\n", + "3311 Thai\n", + "3427 Chinese\n", + "3659 Thai\n", + "Name: language, dtype: object" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts()==3].index)].language" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 True\n", + "1 True\n", + "2 True\n", + "3 True\n", + "4 False\n", + " ... 
\n", + "5038 True\n", + "5039 True\n", + "5040 True\n", + "5041 True\n", + "5042 True\n", + "Name: language, Length: 5043, dtype: bool" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].isin(df[col].value_counts().index)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "English False\n", + "French False\n", + "Spanish False\n", + "Hindi False\n", + "Mandarin False\n", + "German False\n", + "Japanese False\n", + "Russian False\n", + "Cantonese False\n", + "Italian False\n", + "Portuguese False\n", + "Korean False\n", + "Arabic False\n", + "Hebrew False\n", + "Swedish False\n", + "Danish False\n", + "Persian False\n", + "Norwegian False\n", + "Polish False\n", + "Dutch False\n", + "Chinese True\n", + "Thai True\n", + "Icelandic False\n", + "Dari False\n", + "Zulu False\n", + "None False\n", + "Romanian False\n", + "Aboriginal False\n", + "Indonesian False\n", + "Panjabi False\n", + "Kazakh False\n", + "Kannada False\n", + "Aramaic False\n", + "Urdu False\n", + "Dzongkha False\n", + "Czech False\n", + "Tamil False\n", + "Bosnian False\n", + "Telugu False\n", + "Hungarian False\n", + "Filipino False\n", + "Mongolian False\n", + "Slovenian False\n", + "Greek False\n", + "Vietnamese False\n", + "Maya False\n", + "Swahili False\n", + "Name: language, dtype: bool" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['language'].value_counts()==3" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Chinese 3\n", + "Thai 3\n", + "Name: language, dtype: int64" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['language'].value_counts()[df['language'].value_counts()==3]" + ] + }, + { + "cell_type": "code", + 
"execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 False\n", + "3 False\n", + "4 False\n", + " ... \n", + "5038 False\n", + "5039 False\n", + "5040 False\n", + "5041 False\n", + "5042 False\n", + "Name: language, Length: 5043, dtype: bool" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['language'].isin(df['language'].value_counts()[df['language'].value_counts()==1].index)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "2388 Chinese\n", + "2740 Thai\n", + "3022 Chinese\n", + "3311 Thai\n", + "3427 Chinese\n", + "3659 Thai\n", + "Name: language, dtype: object" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts()==3].index)]['language']" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "English 4704\n", + "French 73\n", + "Spanish 40\n", + "Hindi 28\n", + "Mandarin 26\n", + "German 19\n", + "Japanese 18\n", + "Russian 11\n", + "Cantonese 11\n", + "Italian 11\n", + "Name: language, dtype: int64" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['language'].value_counts()[df['language'].value_counts()> 10]" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 English\n", + "1 English\n", + "2 English\n", + "3 English\n", + "5 English\n", + " ... 
\n", + "5038 English\n", + "5039 English\n", + "5040 English\n", + "5041 English\n", + "5042 English\n", + "Name: language, Length: 4941, dtype: object" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts() > 10].index)].language" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Chinese', 'Thai'], dtype=object)" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts() == 3].index)].language.unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Use groupby and a lambda to emulate filtering value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2388 Color Danny Pang 23.0 107.0 \n", + "2740 Color Tony Jaa 110.0 110.0 \n", + "3022 Color Mabel Cheung 6.0 130.0 \n", + "3311 Color Chatrichalerm Yukol 31.0 300.0 \n", + "3427 Color Dennie Gordon 11.0 114.0 \n", + "3659 Color Prachya Pinkaew 112.0 111.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2388 15.0 27.0 Angelica Lee \n", + "2740 0.0 7.0 Petchtai Wongkamlao \n", + "3022 3.0 2.0 Ching Wan Lau \n", + "3311 6.0 6.0 Chatchai Plengpanich \n", + "3427 29.0 11.0 Ruby Lin \n", + "3659 64.0 380.0 Nathan Jones \n", + "\n", + " actor_1_facebook_likes gross genres \\\n", + "2388 82.0 NaN Action \n", + "2740 64.0 102055.0 Action \n", + "3022 215.0 NaN Drama \n", + "3311 7.0 454255.0 Action|Adventure|Drama|History|War \n", + "3427 163.0 50000.0 Action|Adventure|Comedy|Romance \n", + "3659 778.0 11905519.0 Action|Crime|Drama|Thriller \n", + "\n", + " ... num_user_for_reviews language country content_rating \\\n", + "2388 ... 4.0 Chinese China PG-13 \n", + "2740 ... 72.0 Thai Thailand R \n", + "3022 ... 6.0 Chinese China NaN \n", + "3311 ... 47.0 Thai Thailand R \n", + "3427 ... 2.0 Chinese China NaN \n", + "3659 ... 
214.0 Thai Thailand R \n", + "\n", + " budget title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "2388 NaN 2013.0 39.0 5.7 1.85 \n", + "2740 300000000.0 2008.0 45.0 6.2 2.35 \n", + "3022 12000000.0 2015.0 27.0 6.2 2.35 \n", + "3311 400000000.0 2001.0 6.0 6.6 1.85 \n", + "3427 NaN 2013.0 20.0 5.1 2.35 \n", + "3659 200000000.0 2005.0 635.0 7.1 1.85 \n", + "\n", + " movie_facebook_likes \n", + "2388 124 \n", + "2740 0 \n", + "3022 4 \n", + "3311 124 \n", + "3427 81 \n", + "3659 0 \n", + "\n", + "[6 rows x 28 columns]" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('language').filter(lambda x: len(x) == 3)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2388 Chinese\n", + "2740 Thai\n", + "3022 Chinese\n", + "3311 Thai\n", + "3427 Chinese\n", + "3659 Thai\n", + "Name: language, dtype: object" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('language').filter(lambda x: len(x) == 3)['language']" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Chinese', 'Thai'], dtype=object)" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('language').filter(lambda x: len(x) == 3)['language'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Bonus: Which is faster?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "100 loops, best of 3: 10.9 ms per loop\n" + ] + } + ], + "source": [ + "%timeit df.groupby('language').filter(lambda x: len(x) == 3)['language']" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "100 loops, best of 3: 3.19 ms per loop\n" + ] + } + ], + "source": [ + "%timeit df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts()==3].index)]['language']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/23.pandas-typeerror-unhashable-type-list-dict.ipynb b/notebooks/pandas/23.pandas-typeerror-unhashable-type-list-dict.ipynb new file mode 100644 index 0000000..281a81d --- /dev/null +++ b/notebooks/pandas/23.pandas-typeerror-unhashable-type-list-dict.ipynb @@ -0,0 +1,1056 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 23. 
Pandas TypeError: unhashable type: 'list'/'dict'\n", + "\n", + "Topics\n", + "\n", + "* apply value_counts for list/dict column\n", + "* value_counts for list column\n", + "* identify list/dict columns\n", + "* `TypeError: unhashable type: 'dict'`\n", + "* `TypeError: unhashable type: 'list'`\n", + "* Correct way to expand list column" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "# show full cell contents (None replaces the deprecated -1)\n", + "pd.set_option('display.max_colwidth', None)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame({'col1': [1, 2], 'col2': [[0.5, 0.1], [0.75, 0.25]], 'col3': [{0:'a', 1:'b'}, {0:'c', 1:'d'}]})" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 1 [0.5, 0.1] {0: 'a', 1: 'b'}\n", + "1 2 [0.75, 0.25] {0: 'c', 1: 'd'}" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. TypeError: unhashable type: 'list'/'dict'" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "unhashable type: 'list'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# TypeError: unhashable type: 'list'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcol2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/base.py\u001b[0m in \u001b[0;36mvalue_counts\u001b[0;34m(self, normalize, sort, ascending, bins, dropna)\u001b[0m\n\u001b[1;32m 1390\u001b[0m \u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1391\u001b[0m \u001b[0mbins\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mbins\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1392\u001b[0;31m 
\u001b[0mdropna\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdropna\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1393\u001b[0m )\n\u001b[1;32m 1394\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36mvalue_counts\u001b[0;34m(values, sort, ascending, normalize, bins, dropna)\u001b[0m\n\u001b[1;32m 755\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 756\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 757\u001b[0;31m \u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcounts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_value_counts_arraylike\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 758\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 759\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mIndex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36m_value_counts_arraylike\u001b[0;34m(values, dropna)\u001b[0m\n\u001b[1;32m 800\u001b[0m \u001b[0;31m# TODO: handle uint8\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 801\u001b[0m \u001b[0mf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhtable\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m\"value_count_{dtype}\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mndtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 802\u001b[0;31m \u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcounts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 803\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 804\u001b[0m \u001b[0mmask\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0misna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_func_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.value_count_object\u001b[0;34m()\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_func_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.value_count_object\u001b[0;34m()\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'" + ] + } + ], + "source": [ + "# TypeError: unhashable type: 'list'\n", + "df.col2.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "unhashable type: 'dict'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# TypeError: unhashable type: 'dict'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m 
\u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcol3\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/base.py\u001b[0m in \u001b[0;36mvalue_counts\u001b[0;34m(self, normalize, sort, ascending, bins, dropna)\u001b[0m\n\u001b[1;32m 1390\u001b[0m \u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1391\u001b[0m \u001b[0mbins\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mbins\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1392\u001b[0;31m \u001b[0mdropna\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdropna\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1393\u001b[0m )\n\u001b[1;32m 1394\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36mvalue_counts\u001b[0;34m(values, sort, ascending, normalize, bins, dropna)\u001b[0m\n\u001b[1;32m 755\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 756\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 757\u001b[0;31m \u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcounts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_value_counts_arraylike\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 758\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 759\u001b[0m 
\u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mIndex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36m_value_counts_arraylike\u001b[0;34m(values, dropna)\u001b[0m\n\u001b[1;32m 800\u001b[0m \u001b[0;31m# TODO: handle uint8\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 801\u001b[0m \u001b[0mf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhtable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"value_count_{dtype}\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mndtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 802\u001b[0;31m \u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcounts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 803\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 804\u001b[0m \u001b[0mmask\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0misna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_func_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.value_count_object\u001b[0;34m()\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_func_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.value_count_object\u001b[0;34m()\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: unhashable type: 
'dict'" + ] + } + ], + "source": [ + "# TypeError: unhashable type: 'dict'\n", + "df.col3.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# also raises TypeError: a dict column cannot be used as a group key\n", + "df.groupby('col3').agg({'col1': 'min', 'col2': 'max'})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. How to detect if a column contains lists or dicts" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "col1 int64 \n", + "col2 object\n", + "col3 object\n", + "dtype: object" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "col1 False\n", + "col2 True \n", + "col3 False\n", + "dtype: bool" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# detect columns where every value is a list\n", + "df.applymap(lambda x: isinstance(x, list)).all()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "col1 False\n", + "col2 False\n", + "col3 True \n", + "dtype: bool" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# detect columns where every value is a dict\n", + "df.applymap(lambda x: isinstance(x, dict)).all()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "col1 False\n", + "col2 True \n", + "col3 True \n", + "dtype: bool" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# detect columns where every value is a dict or a list\n", + "df.applymap(lambda x: isinstance(x, (dict, list))).all()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ +
"## 3.1 Convert the column to string and apply value_counts" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0.75, 0.25] 1\n", + "[0.5, 0.1] 1\n", + "Name: col2, dtype: int64" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['col2'].astype('str').value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{0: 'c', 1: 'd'} 1\n", + "{0: 'a', 1: 'b'} 1\n", + "Name: col3, dtype: int64" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['col3'].astype('str').value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.2 Convert the column to string and use group by" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "unhashable type: 'dict'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# TypeError: unhashable type: 'dict'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m 
\u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcol3\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnotna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroupby\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'col3'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcount\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/generic.py\u001b[0m in \u001b[0;36mcount\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1594\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1595\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_data_to_aggregate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1596\u001b[0;31m \u001b[0mids\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mngroups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrouper\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroup_info\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1597\u001b[0m \u001b[0mmask\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mids\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1598\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32mpandas/_libs/properties.pyx\u001b[0m in \u001b[0;36mpandas._libs.properties.CachedProperty.__get__\u001b[0;34m()\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/ops.py\u001b[0m in 
\u001b[0;36mgroup_info\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 294\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mcache_readonly\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 295\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mgroup_info\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 296\u001b[0;31m \u001b[0mcomp_ids\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mobs_group_ids\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_compressed_labels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 297\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 298\u001b[0m \u001b[0mngroups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobs_group_ids\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/ops.py\u001b[0m in \u001b[0;36m_get_compressed_labels\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 310\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 311\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_compressed_labels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 312\u001b[0;31m \u001b[0mall_labels\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mping\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlabels\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mping\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroupings\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 313\u001b[0m \u001b[0;32mif\u001b[0m 
\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_labels\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 314\u001b[0m \u001b[0mgroup_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_group_index\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_labels\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxnull\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/ops.py\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 310\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 311\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_compressed_labels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 312\u001b[0;31m \u001b[0mall_labels\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mping\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlabels\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mping\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroupings\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 313\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_labels\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 314\u001b[0m \u001b[0mgroup_index\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mget_group_index\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_labels\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxnull\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/grouper.py\u001b[0m in \u001b[0;36mlabels\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 395\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mlabels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 396\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_labels\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 397\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_labels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 398\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_labels\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 399\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/grouper.py\u001b[0m in \u001b[0;36m_make_labels\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 419\u001b[0m \u001b[0muniques\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrouper\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult_index\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 420\u001b[0m 
\u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 421\u001b[0;31m \u001b[0mlabels\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muniques\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0malgorithms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfactorize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrouper\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 422\u001b[0m \u001b[0muniques\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mIndex\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0muniques\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 423\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_labels\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlabels\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/util/_decorators.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 206\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 207\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnew_arg_name\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnew_arg_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 208\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 209\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 210\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36mfactorize\u001b[0;34m(values, sort, order, na_sentinel, size_hint)\u001b[0m\n\u001b[1;32m 670\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 671\u001b[0m labels, uniques = _factorize_array(\n\u001b[0;32m--> 672\u001b[0;31m \u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mna_sentinel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mna_sentinel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msize_hint\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msize_hint\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mna_value\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mna_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 673\u001b[0m )\n\u001b[1;32m 674\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36m_factorize_array\u001b[0;34m(values, na_sentinel, size_hint, na_value)\u001b[0m\n\u001b[1;32m 506\u001b[0m \u001b[0mtable\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mhash_klass\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msize_hint\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 507\u001b[0m uniques, labels = table.factorize(\n\u001b[0;32m--> 508\u001b[0;31m \u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mna_sentinel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mna_sentinel\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mna_value\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mna_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 509\u001b[0m )\n\u001b[1;32m 510\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.factorize\u001b[0;34m()\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable._unique\u001b[0;34m()\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'dict'" + ] + } + ], + "source": [ + "# TypeError: unhashable type: 'dict'\n", + "df[df.col3.notna()].groupby(['col3']).count()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " col1 col3\n", + "col2 \n", + "[0.5, 0.1] 1 1 \n", + "[0.75, 0.25] 1 1 " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.col2.notna()].astype('str').groupby(['col2']).count()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " col1 col2\n", + "col3 \n", + "{0: 'a', 1: 'b'} 1 1 \n", + "{0: 'c', 1: 'd'} 1 1 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.col3.notna()].astype('str').groupby(['col3']).count()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Convert list/dict column to tuple" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0.5, 0.1) 1\n", + "(0.75, 0.25) 1\n", + "Name: col2, dtype: int64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# for list\n", + "df['col2'].apply(tuple).value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0, 1) 2\n", + "Name: col3, dtype: int64" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# for dict\n", + "df['col3'].apply(tuple).value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. 
Expand the list column" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.75 1\n", + "0.50 1\n", + "Name: 0, dtype: int64" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.apply(pd.Series)[0].value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.10 1\n", + "0.25 1\n", + "Name: 1, dtype: int64" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.apply(pd.Series)[1].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. List column mixed: numbers and list items" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame({'col1': [1, 2], 'col2': [[0.5], 3],'col3': [{0:'a', 1:'b'}, {0:'c', 1:'d'}]})" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 1 [0.5] {0: 'a', 1: 'b'}\n", + "1 2 3 {0: 'c', 1: 'd'}" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3.0 1\n", + "0.5 1\n", + "Name: col2, dtype: int64" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.applymap(lambda x: x[0] if isinstance(x, list) else x)['col2'].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus Step #1: Correct way to expand list column" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame({'col1': [1, 2], 'col2': [[0.5, 0.1], [0.75, 0.25]],'col3': [{0:'a', 1:'b'}, {0:'c', 1:'d'}]})" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 1 [0.5, 0.1] {0: 'a', 1: 'b'}\n", + "1 2 [0.75, 0.25] {0: 'c', 1: 'd'}" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 NaN\n", + "1 NaN\n", + "Name: col2, dtype: float64" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.str.split(',', expand=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1\n", + "0 [0.5 0.1] \n", + "1 [0.75 0.25]" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.astype('str').str.split(',', expand=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1\n", + "0 0.50 0.10\n", + "1 0.75 0.25" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.apply(pd.Series)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "df[['l1', 'l2']] = df.col2.apply(pd.Series)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " col1 col2 col3\n", + "l1 l2 \n", + "0.50 0.10 1 [0.5, 0.1] {0: 'a', 1: 'b'}\n", + "0.75 0.25 2 [0.75, 0.25] {0: 'c', 1: 'd'}" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.set_index(['l1', 'l2'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/24-pandas-check-value-column-contained-another-column-same-row.ipynb b/notebooks/pandas/24-pandas-check-value-column-contained-another-column-same-row.ipynb new file mode 100644 index 0000000..d0c667c --- /dev/null +++ b/notebooks/pandas/24-pandas-check-value-column-contained-another-column-same-row.ipynb @@ -0,0 +1,896 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 24. Pandas: Check If Value of Column Is Contained in Another Column in the Same Row" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title \\\n", + "0 Avatar  \n", + "1 Pirates of the Caribbean: At World's End  \n", + "2 Spectre  \n", + "3 The Dark Knight Rises  \n", + "4 Star Wars: Episode VII - The Force Awakens  ... \n", + "5 John Carter  \n", + "6 Spider-Man 3  \n", + "7 Tangled  \n", + "8 Avengers: Age of Ultron  \n", + "9 Harry Potter and the Half-Blood Prince  \n", + "\n", + " plot_keywords country \n", + "0 avatar|future|marine|native|paraplegic USA \n", + "1 goddess|marriage ceremony|marriage proposal|pi... USA \n", + "2 bomb|espionage|sequel|spy|terrorist UK \n", + "3 deception|imprisonment|lawlessness|police offi... USA \n", + "4 NaN NaN \n", + "5 alien|american civil war|male nipple|mars|prin... USA \n", + "6 sandman|spider man|symbiote|venom|villain USA \n", + "7 17th century|based on fairy tale|disney|flower... USA \n", + "8 artificial intelligence|based on comic book|ca... USA \n", + "9 blood|book|love|potion|professor UK " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[['movie_title', 'plot_keywords', 'country']].head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Check If String Column Contains Substring of Another with Function" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title country\n", + "196 Australia  Australia\n", + "2504 McFarland, USA  USA" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def find_value_column(row):\n", + " return row.country in row.movie_title\n", + "\n", + "df.country.fillna('_', inplace=True)\n", + "df[df.apply(find_value_column, axis=1)][['movie_title', 'country']].head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "for row in df.loc[df.plot_keywords.isnull(), 'plot_keywords'].index:\n", + " df.at[row, 'plot_keywords'] = []" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title plot_keywords\n", + "0 Avatar  avatar|future|marine|native|paraplegic\n", + "22 Robin Hood  1190s|archer|england|king of england|robin hood\n", + "25 King Kong  animal name in title|ape abducts a woman|goril...\n", + "26 Titanic  artist|love|ship|titanic|wet\n", + "33 Alice in Wonderland  alice in wonderland|mistaking reality for drea...\n", + "130 Thor  battle|marvel cinematic universe|scientist|tho...\n", + "145 Pan  1940s|child hero|fantasy world|orphan|referenc...\n", + "147 Troy  greek|mythology|prince|trojan|troy\n", + "150 Ghostbusters  ghost|ghostbuster|ghostbusters|male objectific...\n", + "160 Star Trek  box office hit|future|lifted by the throat|sta..." + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def find_value_column(row):\n", + " return row.movie_title.lower().strip() in row.plot_keywords\n", + "\n", + "df[df.apply(find_value_column, axis=1)][['movie_title', 'plot_keywords']].head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Check If Column contains another column with lambda" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title country\n", + "196 Australia  Australia\n", + "2504 McFarland, USA  USA" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.apply(lambda x: x.country in x.movie_title, axis=1)][['movie_title', 'country']].head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "(\"'Series' objects are mutable, thus they cannot be hashed\", 'occurred at index 0')", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Warning for common error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mrow\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcountry\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmovie_title\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36mapply\u001b[0;34m(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)\u001b[0m\n\u001b[1;32m 6904\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6905\u001b[0m )\n\u001b[0;32m-> 6906\u001b[0;31m 
\u001b[0;32mreturn\u001b[0m \u001b[0mop\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6907\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6908\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mapplymap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/apply.py\u001b[0m in \u001b[0;36mget_result\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 184\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_raw\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 186\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_standard\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 187\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 188\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mapply_empty_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/apply.py\u001b[0m in \u001b[0;36mapply_standard\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 290\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 291\u001b[0m \u001b[0;31m# compute the result using the series generator\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 292\u001b[0;31m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_series_generator\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 293\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 294\u001b[0m \u001b[0;31m# wrap results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/apply.py\u001b[0m in \u001b[0;36mapply_series_generator\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 319\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 320\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mv\u001b[0m \u001b[0;32min\u001b[0m \u001b[0menumerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseries_gen\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 321\u001b[0;31m \u001b[0mresults\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 322\u001b[0m \u001b[0mkeys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 323\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m(row)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Warning for common 
error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mrow\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcountry\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmovie_title\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36m__contains__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1935\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__contains__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1936\u001b[0m \u001b[0;34m\"\"\"True if the key is in the info axis\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1937\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_info_axis\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1938\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1939\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mproperty\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/indexes/range.py\u001b[0m in \u001b[0;36m__contains__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 362\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 363\u001b[0m \u001b[0;32mdef\u001b[0m 
\u001b[0m__contains__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minteger\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0mbool\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 364\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 365\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 366\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mensure_python_int\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36m__hash__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1885\u001b[0m raise TypeError(\n\u001b[1;32m 1886\u001b[0m \u001b[0;34m\"{0!r} objects are mutable, thus they cannot be\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1887\u001b[0;31m \u001b[0;34m\" hashed\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__class__\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1888\u001b[0m )\n\u001b[1;32m 1889\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: (\"'Series' objects are mutable, thus they cannot be hashed\", 'occurred at index 0')" + ] + } + ], + "source": [ + "# Warning for common error\n", + "df.apply(lambda 
row: df.country in df.movie_title, axis=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Fastest Way to Check If One Column Contains Another" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "df['country'].fillna('Unknown', inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title country\n", + "196 Australia  Australia\n", + "2504 McFarland, USA  USA" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[[x[0] in x[1] for x in zip(df['country'], df['movie_title'])]][['movie_title', 'country']]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: For Loop and df.iterrows() Version" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Australia Australia \n", + "USA McFarland, USA \n" + ] + } + ], + "source": [ + "for i, row in df.iterrows():\n", + " if row.country in row.movie_title:\n", + " print(row.country, row.movie_title)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus Step: Check If List Column Contains Substring of Another with Function" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "df['keywords'] = df.plot_keywords.str.split('|')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 [avatar, future, marine, native, paraplegic]\n", + "1 [goddess, marriage ceremony, marriage proposal...\n", + "2 [bomb, espionage, sequel, spy, terrorist]\n", + "3 [deception, imprisonment, lawlessness, police ...\n", + "4 NaN\n", + " ... 
\n", + "5038 [fraud, postal worker, prison, theft, trial]\n", + "5039 [cult, fbi, hideout, prison escape, serial kil...\n", + "5040 NaN\n", + "5041 NaN\n", + "5042 [actress name in title, crush, date, four word...\n", + "Name: keywords, Length: 5043, dtype: object" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['keywords']" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title \\\n", + "0 Avatar  \n", + "9 Harry Potter and the Half-Blood Prince  \n", + "33 Alice in Wonderland  \n", + "68 Monsters vs. Aliens  \n", + "77 G.I. Joe: The Rise of Cobra  \n", + "\n", + " keywords \n", + "0 [avatar, future, marine, native, paraplegic] \n", + "9 [blood, book, love, potion, professor] \n", + "33 [alice in wonderland, mistaking reality for dr... \n", + "68 [alien, alien invasion, alien space craft, gia... \n", + "77 [cobra, gi joe, snake, train, warhead] " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def find_value_column(row):\n", + " if isinstance(row['keywords'], list):\n", + " for keyword in row['keywords']:\n", + " return keyword in row.movie_title.lower()\n", + " else:\n", + " return False\n", + "\n", + "df[df.apply(find_value_column, axis=1)][['movie_title', 'keywords']].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "df['keywords'] = df['keywords'].apply(lambda d: d if isinstance(d, list) else [])" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title \\\n", + "0 Avatar  \n", + "9 Harry Potter and the Half-Blood Prince  \n", + "33 Alice in Wonderland  \n", + "68 Monsters vs. Aliens  \n", + "77 G.I. Joe: The Rise of Cobra  \n", + "\n", + " keywords \n", + "0 [avatar, future, marine, native, paraplegic] \n", + "9 [blood, book, love, potion, professor] \n", + "33 [alice in wonderland, mistaking reality for dr... \n", + "68 [alien, alien invasion, alien space craft, gia... \n", + "77 [cobra, gi joe, snake, train, warhead] " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def find_value_column(row):\n", + " for keyword in row['keywords']:\n", + " return keyword in row.movie_title.lower()\n", + " return False\n", + "\n", + "df[df.apply(find_value_column, axis=1)][['movie_title', 'keywords']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Performance" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10 loops, best of 3: 154 ms per loop\n" + ] + } + ], + "source": [ + "%%timeit\n", + "def find_value_column(row):\n", + " return row.country in row.movie_title\n", + "\n", + "df[df.apply(find_value_column, axis=1)]" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10 loops, best of 3: 155 ms per loop\n" + ] + } + ], + "source": [ + "%%timeit\n", + "df[df.apply(lambda x: x.country in x.movie_title, axis=1)]" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1000 loops, best of 3: 1.76 ms per loop\n" + ] + } + ], + "source": [ + "%%timeit\n", + "df[[x[0] in x[1] for x in zip(df['country'], df['movie_title'])]]" + ] + }, + { + "cell_type": "code", + 
"execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1 loop, best of 3: 599 ms per loop\n" + ] + } + ], + "source": [ + "%%timeit\n", + "for i, row in df.iterrows():\n", + " if row.country in row.movie_title:\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe.ipynb b/notebooks/pandas/25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe.ipynb new file mode 100644 index 0000000..8e593db --- /dev/null +++ b/notebooks/pandas/25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe.ipynb @@ -0,0 +1,964 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 25. 
Pandas: Create A Matplotlib Scatterplot From A Dataframe " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Datasets:\n", + "* https://www.kaggle.com/statchaitya/country-to-continent\n", + "* https://www.kaggle.com/erikbruin/countries-of-the-world-iso-codes-and-population\n", + "* https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset\n", + "\n", + "\"Drawing\"\n", + "\"Drawing\"" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import chardet\n", + "import pandas as pd\n", + "\n", + "df = pd.read_csv(\"../csv/covid/covid_19_clean_complete.csv\")\n", + "population = pd.read_csv(\"../csv/covid/countries_by_population_2019.csv\")\n", + "\n", + "with open('../csv/covid/countryContinent.csv', 'rb') as f:\n", + " result = chardet.detect(f.read()) # or readline if the file is large\n", + "\n", + "continent = pd.read_csv(\"../csv/covid/countryContinent.csv\" , encoding=result['encoding'])" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Province/State Country/Region Lat Long Date Confirmed \\\n", + "15061 NaN Barbados 13.1939 -59.5432 3/17/20 2 \n", + "15062 NaN Montenegro 42.5000 19.3000 3/17/20 2 \n", + "15063 NaN The Gambia 13.4667 -16.6000 3/17/20 1 \n", + "\n", + " Deaths Recovered \n", + "15061 0 0 \n", + "15062 0 0 \n", + "15063 0 0 " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.tail(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Rank name pop2019 pop2018 GrowthRate area Density\n", + "0 1 China 1433783.686 NaN 1.0039 9706961.0 147.7068\n", + "1 2 India 1366417.754 NaN 1.0099 3287590.0 415.6290\n", + "2 3 United States 329064.917 NaN 1.0059 9372610.0 35.1092" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "population.head(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " country code_2 code_3 country_code iso_3166_2 continent \\\n", + "0 Afghanistan AF AFG 4 ISO 3166-2:AF Asia \n", + "1 Åland Islands AX ALA 248 ISO 3166-2:AX Europe \n", + "2 Albania AL ALB 8 ISO 3166-2:AL Europe \n", + "\n", + " sub_region region_code sub_region_code \n", + "0 Southern Asia 142.0 34.0 \n", + "1 Northern Europe 150.0 154.0 \n", + "2 Southern Europe 150.0 39.0 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "continent.head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #1: Combine covid and continent data" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "df = df.merge(continent, left_on='Country/Region', right_on='country', how='inner')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Province/State Country/Region Lat Long Date Confirmed Deaths \\\n", + "0 NaN Thailand 15.0 101.0 1/22/20 2 0 \n", + "1 NaN Thailand 15.0 101.0 1/23/20 3 0 \n", + "2 NaN Thailand 15.0 101.0 1/24/20 5 0 \n", + "\n", + " Recovered country code_2 code_3 country_code iso_3166_2 continent \\\n", + "0 0 Thailand TH THA 764 ISO 3166-2:TH Asia \n", + "1 0 Thailand TH THA 764 ISO 3166-2:TH Asia \n", + "2 0 Thailand TH THA 764 ISO 3166-2:TH Asia \n", + "\n", + " sub_region region_code sub_region_code \n", + "0 South-Eastern Asia 142.0 35.0 \n", + "1 South-Eastern Asia 142.0 35.0 \n", + "2 South-Eastern Asia 142.0 35.0 " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #2: Get last value for Confirmed per country" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Confirmed Country/Region Recovered\n", + "0 22 Afghanistan 1\n", + "1 55 Albania 0\n", + "2 60 Algeria 12\n", + "3 39 Andorra 1\n", + "4 1 Antigua and Barbuda 0" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "last_confirmed_number = df[df.Confirmed > 0].groupby('Country/Region', as_index = False).last()[['Confirmed', 'Country/Region', 'Recovered']]\n", + "last_confirmed_number.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #3: Get first date of Confirmed per country" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Date Country/Region continent\n", + "0 2/24/20 Afghanistan Asia\n", + "1 3/9/20 Albania Europe\n", + "2 2/25/20 Algeria Africa\n", + "3 3/2/20 Andorra Europe\n", + "4 3/13/20 Antigua and Barbuda Americas" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "first_date = df[df.Confirmed > 0].groupby('Country/Region', as_index = False).first()[['Date', 'Country/Region', 'continent']]\n", + "first_date.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #4: Combine last values and first date" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Confirmed Country/Region Recovered Date continent\n", + "92 236 Pakistan 2 2/26/20 Asia\n", + "97 238 Poland 13 3/4/20 Europe\n", + "109 266 Singapore 114 1/23/20 Asia\n", + "111 275 Slovenia 0 3/5/20 Europe\n", + "40 321 Finland 10 1/29/20 Europe" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df3 = last_confirmed_number.merge(first_date, on='Country/Region', how='inner')\n", + "df_final = df3.sort_values(by=['Confirmed', 'Date']).tail(20)\n", + "df_final.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #5: Convert dates to datetime and sort" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "df_final['Date'] = pd.to_datetime(df_final['Date'])" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "63 2020-01-22\n", + "109 2020-01-23\n", + "75 2020-01-25\n", + "44 2020-01-27\n", + "40 2020-01-29\n", + "61 2020-01-31\n", + "118 2020-01-31\n", + "114 2020-02-01\n", + "15 2020-02-04\n", + "60 2020-02-21\n", + "119 2020-02-25\n", + "9 2020-02-25\n", + "90 2020-02-26\n", + "92 2020-02-26\n", + "46 2020-02-26\n", + "19 2020-02-26\n", + "99 2020-02-29\n", + "98 2020-03-02\n", + "97 2020-03-04\n", + "111 2020-03-05\n", + "Name: Date, dtype: datetime64[ns]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_final['Date'].sort_values()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #6: Plot Data as Scatterplot\n", + "* x axis - Current Active Cases\n", + "* y axis - First Date Confirmed\n", + "* size of points - Current Recovered " + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from matplotlib.pyplot import figure\n", + "\n", + "figure(num=None, figsize=(12, 8), dpi=100, facecolor='w', edgecolor='k')\n", + "\n", + "plt.scatter(df_final.Confirmed, df_final.Date, s=df_final.Recovered, alpha = 0.25)\n", + "\n", + "[plt.text( x=row['Confirmed'], y=row['Date'], s=row['Country/Region']) for k,row in df_final.iterrows()]\n", + "\n", + "plt.xlabel('Current Active Cases')\n", + "plt.ylabel('First Date Confirmed')\n", + "\n", + "axes = plt.gca()\n", + "axes.set_ylim(['2020-01-20','2020-03-10'])\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #7: Plot Data with continent colors" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "continent_colors = {'Europe':'red',\n", + " 'Africa':'green',\n", + " 'Americas':'blue',\n", + " 'Asia':'cyan',\n", + " 'Australia and New Zealand':'purple'}" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from matplotlib.pyplot import figure\n", + "\n", + "figure(num=None, figsize=(12, 8), dpi=100, facecolor='w', edgecolor='k')\n", + "\n", + "plt.xlabel('Current Active Cases')\n", + "plt.ylabel('First Date Confirmed')\n", + "\n", + "for i,j in df_final.iterrows():\n", + " reg_color = continent_colors.get(j['continent'], 'black')\n", + " plt.scatter(df_final['Confirmed'][i], df_final['Date'][i], s=200, alpha = 0.25, color=reg_color)\n", + "\n", + " \n", + "[plt.text( x=row['Confirmed'], y=row['Date'], s=row['Country/Region']) for k,row in df_final.iterrows()] \n", + "axes = plt.gca()\n", + "axes.set_ylim(['2020-01-20','2020-03-10'])\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/26.pandas-display-all-columns-and-show-more-rows.ipynb b/notebooks/pandas/26.pandas-display-all-columns-and-show-more-rows.ipynb new file mode 100644 index 0000000..b495c54 --- /dev/null +++ b/notebooks/pandas/26.pandas-display-all-columns-and-show-more-rows.ipynb @@ -0,0 +1,2177 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 26. 
Pandas Display All Columns and Show More Rows" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5043, 28)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

5043 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "2 Color Sam Mendes 602.0 148.0 \n", + "3 Color Christopher Nolan 813.0 164.0 \n", + "4 NaN Doug Walker NaN NaN \n", + "... ... ... ... ... \n", + "5038 Color Scott Smith 1.0 87.0 \n", + "5039 Color NaN 43.0 43.0 \n", + "5040 Color Benjamin Roberds 13.0 76.0 \n", + "5041 Color Daniel Hsia 14.0 100.0 \n", + "5042 Color Jon Gunn 43.0 90.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "2 0.0 161.0 Rory Kinnear \n", + "3 22000.0 23000.0 Christian Bale \n", + "4 131.0 NaN Rob Walker \n", + "... ... ... ... \n", + "5038 2.0 318.0 Daphne Zuniga \n", + "5039 NaN 319.0 Valorie Curry \n", + "5040 0.0 0.0 Maxwell Moody \n", + "5041 0.0 489.0 Daniel Henney \n", + "5042 16.0 16.0 Brian Herzlinger \n", + "\n", + " actor_1_facebook_likes gross genres \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy \n", + "2 11000.0 200074175.0 Action|Adventure|Thriller \n", + "3 27000.0 448130642.0 Action|Thriller \n", + "4 131.0 NaN Documentary \n", + "... ... ... ... \n", + "5038 637.0 NaN Comedy|Drama \n", + "5039 841.0 NaN Crime|Drama|Mystery|Thriller \n", + "5040 0.0 NaN Drama|Horror|Thriller \n", + "5041 946.0 10443.0 Comedy|Drama|Romance \n", + "5042 86.0 85222.0 Documentary \n", + "\n", + " ... num_user_for_reviews language country content_rating budget \\\n", + "0 ... 3054.0 English USA PG-13 237000000.0 \n", + "1 ... 1238.0 English USA PG-13 300000000.0 \n", + "2 ... 994.0 English UK PG-13 245000000.0 \n", + "3 ... 2701.0 English USA PG-13 250000000.0 \n", + "4 ... NaN NaN NaN NaN NaN \n", + "... ... ... ... ... ... ... \n", + "5038 ... 6.0 English Canada NaN NaN \n", + "5039 ... 359.0 English USA TV-14 NaN \n", + "5040 ... 
3.0 English USA NaN 1400.0 \n", + "5041 ... 9.0 English USA PG-13 NaN \n", + "5042 ... 84.0 English USA PG 1100.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "2 2015.0 393.0 6.8 2.35 \n", + "3 2012.0 23000.0 8.5 2.35 \n", + "4 NaN 12.0 7.1 NaN \n", + "... ... ... ... ... \n", + "5038 2013.0 470.0 7.7 NaN \n", + "5039 NaN 593.0 7.5 16.00 \n", + "5040 2013.0 0.0 6.3 NaN \n", + "5041 2012.0 719.0 6.3 2.35 \n", + "5042 2004.0 23.0 6.6 1.85 \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "2 85000 \n", + "3 164000 \n", + "4 0 \n", + "... ... \n", + "5038 84 \n", + "5039 32000 \n", + "5040 16 \n", + "5041 660 \n", + "5042 456 \n", + "\n", + "[5043 rows x 28 columns]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " actor_1_facebook_likes gross \\\n", + "5 640.0 73058679.0 \n", + "6 24000.0 336530303.0 \n", + "7 799.0 200807262.0 \n", + "8 26000.0 458991599.0 \n", + "9 25000.0 301956980.0 \n", + "\n", + " genres actor_1_name \n", + "5 Action|Adventure|Sci-Fi Daryl Sabara \n", + "6 Action|Adventure|Romance J.K. Simmons \n", + "7 Adventure|Animation|Comedy|Family|Fantasy|Musi... Brad Garrett \n", + "8 Action|Adventure|Sci-Fi Chris Hemsworth \n", + "9 Adventure|Family|Fantasy|Mystery Alan Rickman " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[5:10,7:11]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #1: Display all columns and rows with Pandas options" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_rows', None)\n", + "pd.set_option('display.max_columns', None)\n", + "pd.set_option('display.width', None)\n", + "pd.set_option('display.max_colwidth', None)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. 
Instead, use None to not limit the column width.\n", + " \"\"\"Entry point for launching an IPython kernel.\n" + ] + } + ], + "source": [ + "pd.set_option('display.max_colwidth', -1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #2: Display more or all rows " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "pd.reset_option('display.max_rows')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

914 rows × 1 columns

\n", + "
" + ], + "text/plain": [ + " genres\n", + "Drama 236 \n", + "Comedy 209 \n", + "Comedy|Drama 191 \n", + "Comedy|Drama|Romance 187 \n", + "Comedy|Romance 158 \n", + "... ... \n", + "Adventure|Animation|Comedy|Fantasy|Music|Romance 1 \n", + "Family|Fantasy|Music 1 \n", + "Action|Adventure|Drama|History|Romance|War 1 \n", + "Biography|Comedy|Crime|Drama|Romance 1 \n", + "Adventure|Comedy|Musical|Romance 1 \n", + "\n", + "[914 rows x 1 columns]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.genres.value_counts(dropna=False).to_frame()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_rows', 100)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_rows', None)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Drama 236\n", + "Comedy 209\n", + "Comedy|Drama 191\n", + "Comedy|Drama|Romance 187\n", + "Comedy|Romance 158\n", + "Drama|Romance 152\n", + "Crime|Drama|Thriller 101\n", + "Horror 71 \n", + "Action|Crime|Drama|Thriller 68 \n", + "Action|Crime|Thriller 65 \n", + "Drama|Thriller 64 \n", + "Crime|Drama 63 \n", + "Horror|Thriller 56 \n", + "Crime|Drama|Mystery|Thriller 55 \n", + "Action|Adventure|Sci-Fi 51 \n", + "Comedy|Crime 51 \n", + "Documentary 51 \n", + "Action|Adventure|Thriller 46 \n", + "Drama|Mystery|Thriller 37 \n", + "Biography|Drama 35 \n", + "Action|Adventure|Sci-Fi|Thriller 35 \n", + "Horror|Mystery|Thriller 35 \n", + "Action|Comedy|Crime 30 \n", + "Action|Thriller 30 \n", + "Action|Adventure|Fantasy 30 \n", + "Horror|Mystery 29 \n", + "Adventure|Animation|Comedy|Family|Fantasy 29 \n", + "Drama|Music 27 \n", + "Drama|Sport 26 \n", + "Comedy|Family 26 \n", + "Biography|Drama|History 26 \n", + "Biography|Drama|Sport 26 \n", + 
"Adventure|Animation|Comedy|Family 24 \n", + "Comedy|Crime|Drama 24 \n", + "Drama|War 23 \n", + "Action|Comedy|Crime|Thriller 23 \n", + "Action|Sci-Fi 23 \n", + "Action|Drama|Thriller 22 \n", + "Mystery|Thriller 22 \n", + "Action|Crime|Drama|Mystery|Thriller 22 \n", + "Drama|History|War 21 \n", + "Drama|Horror|Mystery|Thriller 21 \n", + "Comedy|Family|Fantasy 20 \n", + "Adventure|Family|Fantasy 20 \n", + "Thriller 20 \n", + "Drama|Music|Romance 19 \n", + "Crime|Thriller 18 \n", + "Horror|Sci-Fi|Thriller 18 \n", + "Comedy|Drama|Music 18 \n", + "Comedy|Horror 18 \n", + "Fantasy|Horror 18 \n", + "Drama|Family 18 \n", + "Biography|Drama|Romance 17 \n", + "Comedy|Music 17 \n", + "Action|Sci-Fi|Thriller 16 \n", + "Crime|Drama|Romance|Thriller 16 \n", + "Comedy|Sport 16 \n", + "Biography|Crime|Drama 16 \n", + "Comedy|Fantasy 16 \n", + "Crime|Mystery|Thriller 15 \n", + "Comedy|Drama|Family 15 \n", + "Action|Comedy 15 \n", + "Comedy|Crime|Thriller 15 \n", + "Action|Horror|Sci-Fi|Thriller 15 \n", + "Drama|Sci-Fi|Thriller 15 \n", + "Drama|Fantasy|Romance 14 \n", + "Comedy|Drama|Romance|Sport 14 \n", + "Drama|Romance|War 14 \n", + "Adventure|Comedy 14 \n", + "Action|Adventure|Fantasy|Sci-Fi 13 \n", + "Crime|Drama|Romance 13 \n", + "Comedy|Fantasy|Romance 13 \n", + "Comedy|Family|Romance 13 \n", + "Drama|Horror|Thriller 13 \n", + "Adventure|Drama 13 \n", + "Action|Adventure|Drama|Thriller 13 \n", + "Drama|Mystery|Sci-Fi|Thriller 13 \n", + "Biography|Drama|Music 13 \n", + "Drama|Mystery|Romance|Thriller 12 \n", + "Action|Adventure|Comedy 12 \n", + "Action|Adventure|Fantasy|Sci-Fi|Thriller 12 \n", + "Adventure|Comedy|Family|Fantasy 12 \n", + "Adventure|Animation|Family|Fantasy 12 \n", + "Western 12 \n", + "Action|Crime|Sci-Fi|Thriller 11 \n", + "Adventure|Comedy|Sci-Fi 11 \n", + "Biography|Drama|History|Romance 11 \n", + "Drama|Horror|Sci-Fi|Thriller 11 \n", + "Comedy|Fantasy|Horror 11 \n", + "Action|Adventure 11 \n", + "Action 11 \n", + "Comedy|Crime|Romance 11 \n", + 
"Comedy|Drama|Fantasy|Romance 11 \n", + "Documentary|Music 10 \n", + "Action|Drama 10 \n", + "Drama|Mystery 10 \n", + "Action|Mystery|Thriller 10 \n", + "Drama|History 10 \n", + "Action|Horror|Thriller 10 \n", + "Drama|Sci-Fi 10 \n", + "Action|Drama|War 10 \n", + "Crime|Drama|Mystery 10 \n", + "Drama|Romance|Sci-Fi 10 \n", + "Drama|Fantasy 10 \n", + "Fantasy|Horror|Thriller 10 \n", + "Sci-Fi|Thriller 10 \n", + "Action|Horror|Sci-Fi 9 \n", + "Animation|Comedy|Family 9 \n", + "Comedy|Music|Romance 9 \n", + "Horror|Sci-Fi 9 \n", + "Drama|Musical|Romance 9 \n", + "Action|Adventure|Crime|Thriller 9 \n", + "Action|Drama|Sci-Fi|Thriller 9 \n", + "Fantasy|Horror|Mystery|Thriller 9 \n", + "Comedy|Sci-Fi 9 \n", + "Adventure|Fantasy 9 \n", + "Comedy|Drama|Music|Romance 9 \n", + "Drama|History|Thriller 9 \n", + "Comedy|Horror|Sci-Fi 8 \n", + "Action|Comedy|Sci-Fi 8 \n", + "Drama|Western 8 \n", + "Adventure|Drama|Romance 8 \n", + "Comedy|Crime|Drama|Thriller 8 \n", + "Action|Drama|History|War 8 \n", + "Adventure|Comedy|Drama 8 \n", + "Action|Adventure|Comedy|Family|Sci-Fi 8 \n", + "Adventure|Comedy|Family 8 \n", + "Drama|Romance|Western 8 \n", + "Comedy|Fantasy|Horror|Thriller 8 \n", + "Action|Adventure|Drama 7 \n", + "Biography|Drama|Thriller 7 \n", + "Action|Adventure|Drama|Romance 7 \n", + "Adventure|Drama|Thriller 7 \n", + "Adventure|Sci-Fi|Thriller 7 \n", + "Mystery|Sci-Fi|Thriller 7 \n", + "Action|Adventure|Drama|History|War 7 \n", + "Comedy|Documentary 7 \n", + "Comedy|Musical|Romance 7 \n", + "Adventure|Animation|Comedy|Family|Sci-Fi 7 \n", + "Action|Drama|Thriller|War 7 \n", + "Action|Crime|Mystery|Thriller 7 \n", + "Comedy|Romance|Sport 7 \n", + "Adventure|Drama|History 7 \n", + "Action|Comedy|Crime|Drama|Thriller 7 \n", + "Action|Adventure|Animation|Comedy|Family 6 \n", + "Action|Adventure|Drama|Fantasy 6 \n", + "Comedy|Drama|Sport 6 \n", + "Biography|Drama|History|War 6 \n", + "Action|Comedy|Romance 6 \n", + "Crime|Horror|Thriller 6 \n", + 
"Action|Crime|Drama|Romance|Thriller 6 \n", + "Animation|Comedy|Family|Fantasy 6 \n", + "Comedy|Horror|Thriller 6 \n", + "Action|Adventure|Fantasy|Thriller 6 \n", + "Drama|Romance|Thriller 6 \n", + "Comedy|Drama|Family|Romance 6 \n", + "Action|Adventure|Western 6 \n", + "Action|Mystery|Sci-Fi|Thriller 6 \n", + "Biography|Crime|Drama|Thriller 6 \n", + "Action|Adventure|Comedy|Crime 6 \n", + "Comedy|Crime|Mystery 6 \n", + "Drama|History|Romance|War 6 \n", + "Drama|Fantasy|Horror|Thriller 6 \n", + "Horror|Mystery|Sci-Fi|Thriller 6 \n", + "Drama|Romance|Sport 6 \n", + "Crime|Drama|Horror|Thriller 5 \n", + "Action|Horror 5 \n", + "Comedy|Drama|Musical|Romance 5 \n", + "Adventure|Drama|Family|Fantasy 5 \n", + "Action|Adventure|Comedy|Thriller 5 \n", + "Comedy|Family|Sci-Fi 5 \n", + "Action|Adventure|Comedy|Sci-Fi 5 \n", + "Documentary|Sport 5 \n", + "Comedy|Romance|Sci-Fi 5 \n", + "Action|Adventure|Drama|Sci-Fi 5 \n", + "Crime|Horror|Mystery|Thriller 5 \n", + "Biography|Comedy|Drama 5 \n", + "Drama|Horror 5 \n", + "Action|Adventure|Family|Fantasy 5 \n", + "Biography|Drama|Music|Musical 5 \n", + "Action|Adventure|Romance 5 \n", + "Adventure|Drama|Romance|War 5 \n", + "Family 5 \n", + "Adventure|Animation|Family 5 \n", + "Comedy|Musical 5 \n", + "Drama|History|Thriller|War 5 \n", + "Action|Fantasy|Horror|Thriller 5 \n", + "Comedy|Drama|Family|Sport 5 \n", + "Comedy|Family|Fantasy|Romance 5 \n", + "Adventure|Family 5 \n", + "Adventure|Horror|Thriller 5 \n", + "Action|Crime|Drama 5 \n", + "Drama|Mystery|Sci-Fi 5 \n", + "Action|Drama|Sport 5 \n", + "Adventure|Family|Fantasy|Mystery 5 \n", + "Comedy|Drama|Fantasy 4 \n", + "Action|Adventure|Romance|Sci-Fi 4 \n", + "Drama|Family|Fantasy|Romance 4 \n", + "Comedy|Western 4 \n", + "Comedy|Drama|War 4 \n", + "Action|Comedy|Horror 4 \n", + "Adventure|Biography|Drama|History|War 4 \n", + "Documentary|War 4 \n", + "Adventure|Mystery|Sci-Fi 4 \n", + "Drama|Mystery|Romance 4 \n", + "Action|Adventure|Animation|Comedy|Family|Fantasy 4 \n", 
+ "Romance 4 \n", + "Drama|Musical 4 \n", + "Comedy|Drama|Family|Music|Musical|Romance 4 \n", + "Drama|Family|Romance 4 \n", + "Action|Adventure|Romance|Sci-Fi|Thriller 4 \n", + "Action|Comedy|Fantasy|Sci-Fi 4 \n", + "Biography|Drama|War 4 \n", + "Adventure|Drama|Fantasy|Romance 4 \n", + "Drama|Fantasy|Horror 4 \n", + "Drama|Family|Sport 4 \n", + "Action|Crime 4 \n", + "Adventure|Drama|Family 4 \n", + "Adventure|Drama|Western 4 \n", + "Action|Fantasy|Thriller 4 \n", + "Action|Fantasy|Horror 4 \n", + "Adventure|Drama|Sci-Fi|Thriller 4 \n", + "Drama|History|Sport 4 \n", + "Action|Comedy|Thriller 4 \n", + "Action|Adventure|History 4 \n", + "Comedy|Drama|Sci-Fi 4 \n", + "Action|Adventure|Comedy|Family|Fantasy|Sci-Fi 4 \n", + "Biography|Drama|Music|Romance 4 \n", + "Adventure|Drama|Sci-Fi 4 \n", + "Comedy|Crime|Drama|Romance 4 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi 4 \n", + "Biography|Documentary|Music 4 \n", + "Adventure|Comedy|Drama|Romance 4 \n", + "Drama|Horror|Sci-Fi 4 \n", + "Action|Adventure|Horror|Sci-Fi 4 \n", + "Adventure|Biography|Drama 3 \n", + "Adventure|Animation|Comedy|Family|Sport 3 \n", + "Crime|Drama|Horror|Mystery|Thriller 3 \n", + "Action|Adventure|Mystery|Sci-Fi|Thriller 3 \n", + "Action|Comedy|Fantasy 3 \n", + "Adventure 3 \n", + "Adventure|Comedy|Family|Romance 3 \n", + "Action|Crime|Fantasy|Thriller 3 \n", + "Drama|Family|Music 3 \n", + "Action|Crime|Romance|Thriller 3 \n", + "Musical|Romance 3 \n", + "Drama|Fantasy|Horror|Mystery|Thriller 3 \n", + "Drama|Music|Musical|Romance 3 \n", + "Action|Biography|Crime|Drama 3 \n", + "Biography|Comedy|Crime|Drama 3 \n", + "Drama|Fantasy|Thriller 3 \n", + "Comedy|Crime|Romance|Thriller 3 \n", + "Action|Animation|Comedy|Family|Sci-Fi 3 \n", + "Adventure|Animation|Comedy|Family|Romance 3 \n", + "Adventure|Family|Fantasy|Musical 3 \n", + "Adventure|Comedy|Fantasy 3 \n", + "Adventure|Animation|Comedy|Family|Musical 3 \n", + "Action|Adventure|Fantasy|Romance 3 \n", + 
"Action|Crime|Drama|Thriller|Western 3 \n", + "Fantasy|Horror|Mystery 3 \n", + "Action|Comedy|Sport 3 \n", + "Adventure|Comedy|Drama|Family|Fantasy 3 \n", + "Action|Adventure|Comedy|Fantasy 3 \n", + "Drama|Family|Musical|Romance 3 \n", + "Action|Biography|Drama|Sport 3 \n", + "Biography|Comedy|Drama|History 3 \n", + "Adventure|Drama|Fantasy 3 \n", + "Sci-Fi 3 \n", + "Horror|Mystery|Sci-Fi 3 \n", + "Comedy|Fantasy|Sci-Fi 3 \n", + "Drama|Horror|Mystery 3 \n", + "Drama|Fantasy|Mystery|Thriller 3 \n", + "Drama|Family|Fantasy 3 \n", + "Drama|Fantasy|Romance|Sci-Fi 3 \n", + "Comedy|Crime|Music 3 \n", + "Action|Adventure|Drama|History|Romance 3 \n", + "Comedy|Drama|Family|Fantasy 3 \n", + "Biography|Drama|Romance|Sport 3 \n", + "Documentary|Drama 3 \n", + "Adventure|Biography|Drama|Thriller 3 \n", + "Fantasy|Romance 3 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Musical 3 \n", + "Action|Romance|Thriller 3 \n", + "Comedy|War 3 \n", + "Action|Drama|Romance 3 \n", + "Comedy|Crime|Drama|Romance|Thriller 3 \n", + "Action|Adventure|Comedy|Romance|Sci-Fi 3 \n", + "Action|Adventure|Animation|Comedy|Family|Sci-Fi 3 \n", + "Action|Adventure|Horror|Sci-Fi|Thriller 3 \n", + "Comedy|Crime|Drama|Mystery|Romance 3 \n", + "Drama|Thriller|War 3 \n", + "Action|Comedy|Crime|Romance|Thriller 3 \n", + "Comedy|Mystery|Romance 2 \n", + "Animation|Comedy 2 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Romance 2 \n", + "Drama|Romance|Sci-Fi|Thriller 2 \n", + "Adventure|Comedy|Family|Fantasy|Sci-Fi 2 \n", + "Adventure|Animation|Comedy|Drama|Family|Musical 2 \n", + "Comedy|Horror|Musical 2 \n", + "Action|Adventure|Comedy|Western 2 \n", + "Comedy|Mystery 2 \n", + "Action|Comedy|Family 2 \n", + "Action|Drama|Western 2 \n", + "Action|Comedy|War 2 \n", + "Biography|Drama|History|Sport 2 \n", + "Action|Adventure|Crime|Mystery|Thriller 2 \n", + "Biography|Comedy|Drama|Sport 2 \n", + "Action|Drama|History|Romance|War 2 \n", + "Crime|Drama|Western 2 \n", + "Adventure|Comedy|Family|Sport 2 \n", + 
"Adventure|Mystery|Thriller 2 \n", + "Adventure|Comedy|Drama|Fantasy 2 \n", + "Comedy|Drama|Family|Music|Romance 2 \n", + "Adventure|Comedy|Mystery 2 \n", + "Animation 2 \n", + "Comedy|Horror|Mystery 2 \n", + "Biography|Drama|Romance|War 2 \n", + "Action|Comedy|Horror|Sci-Fi 2 \n", + "Adventure|Drama|Family|Fantasy|Sci-Fi 2 \n", + "Animation|Comedy|Family|Sci-Fi 2 \n", + "Comedy|Drama|Music|Musical 2 \n", + "Crime|Drama|Sport 2 \n", + "Adventure|Comedy|Romance 2 \n", + "Comedy|Drama|Romance|Sci-Fi 2 \n", + "Adventure|Fantasy|Mystery|Thriller 2 \n", + "Adventure|Family|Fantasy|Romance 2 \n", + "Animation|Comedy|Family|Fantasy|Music 2 \n", + "Biography|Drama|Sport|War 2 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Musical|Romance 2 \n", + "Action|Drama|Fantasy|War 2 \n", + "Adventure|Animation|Family|Sci-Fi 2 \n", + "Action|Adventure|Thriller|War 2 \n", + "Action|Comedy|Crime|Drama 2 \n", + "Comedy|Horror|Romance 2 \n", + "Drama|Horror|Romance|Thriller 2 \n", + "Animation|Family|Fantasy|Music 2 \n", + "Biography|Comedy|Drama|Romance 2 \n", + "Documentary|History|Music 2 \n", + "Drama|Horror|Mystery|Sci-Fi|Thriller 2 \n", + "Action|Adventure|Drama|Romance|War 2 \n", + "Comedy|Drama|Horror|Romance 2 \n", + "Biography|Comedy|Romance 2 \n", + "Action|Biography|Drama|History 2 \n", + "Adventure|Drama|Mystery 2 \n", + "Action|Adventure|Crime|Drama|Thriller 2 \n", + "Action|Adventure|Drama|Sci-Fi|Thriller 2 \n", + "Biography|Drama|Thriller|War 2 \n", + "Comedy|Family|Sport 2 \n", + "Fantasy 2 \n", + "Action|Drama|Fantasy|Romance 2 \n", + "Action|Adventure|Animation|Family|Fantasy|Sci-Fi 2 \n", + "Adventure|Comedy|Fantasy|Sci-Fi 2 \n", + "Action|Adventure|Animation|Family|Sci-Fi 2 \n", + "Animation|Comedy|Family|Mystery|Sci-Fi 2 \n", + "Action|Comedy|Crime|Romance 2 \n", + "Adventure|Animation|Fantasy 2 \n", + "Comedy|Drama|Thriller 2 \n", + "Action|Drama|Sci-Fi 2 \n", + "Action|Fantasy|Horror|Sci-Fi|Thriller 2 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Music 2 
\n", + "Fantasy|Horror|Sci-Fi 2 \n", + "Action|Comedy|Crime|Fantasy 2 \n", + "Animation|Family 2 \n", + "Action|Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi 2 \n", + "Biography|Drama|History|Thriller 2 \n", + "Action|Drama|Fantasy|Mystery|Thriller 2 \n", + "Comedy|Drama|Family|Fantasy|Romance 2 \n", + "Animation|Comedy|Drama 2 \n", + "Action|Comedy|Documentary 2 \n", + "Action|Adventure|Drama|History 2 \n", + "Crime|Drama|Music 2 \n", + "Adventure|Drama|War 2 \n", + "Action|Comedy|Romance|Thriller 2 \n", + "Comedy|Fantasy|Horror|Romance 2 \n", + "Biography|Crime|Drama|Romance 2 \n", + "Crime|Romance|Thriller 2 \n", + "Adventure|Animation|Comedy 2 \n", + "Action|Adventure|Animation|Comedy|Drama|Family|Sci-Fi 2 \n", + "Action|Fantasy 2 \n", + "Comedy|Romance|Thriller 2 \n", + "Crime|Documentary 2 \n", + "Action|Adventure|Family|Sci-Fi 2 \n", + "Adventure|Horror 2 \n", + "Comedy|Drama|Musical 2 \n", + "Action|Adventure|Comedy|Romance 2 \n", + "Action|Comedy|Family|Fantasy 2 \n", + "Action|Drama|Mystery|Sci-Fi 2 \n", + "Drama|Fantasy|Musical|Romance 2 \n", + "Comedy|Drama|Horror|Sci-Fi 2 \n", + "Action|Adventure|Mystery|Sci-Fi 2 \n", + "Action|Crime|Mystery|Romance|Thriller 2 \n", + "Comedy|Crime|Family 2 \n", + "Mystery|Romance|Thriller 2 \n", + "Drama|Fantasy|Horror|Mystery 2 \n", + "Action|Drama|Fantasy 2 \n", + "Crime|Documentary|War 2 \n", + "Action|Crime|Drama|Sci-Fi|Thriller 2 \n", + "Comedy|Drama|Romance|Thriller 2 \n", + "Documentary|History 2 \n", + "Animation|Family|Fantasy|Musical 2 \n", + "Action|Drama|Family|Sport 2 \n", + "Adventure|Comedy|Family|Musical 2 \n", + "Action|Drama|Horror|Thriller 2 \n", + "Biography|Crime|Drama|History 2 \n", + "Action|Adventure|Horror|Thriller 2 \n", + "Family|Sci-Fi 2 \n", + "Animation|Comedy|Family|Fantasy|Musical 2 \n", + "Action|Sci-Fi|Sport 2 \n", + "Action|Adventure|Drama|Horror|Sci-Fi 2 \n", + "Action|Adventure|Animation|Family|Fantasy 2 \n", + "Adventure|Animation|Comedy|Drama|Family 2 \n", + 
"Biography|Documentary 2 \n", + "Action|Comedy|Sci-Fi|Thriller 2 \n", + "Action|Crime|Sport|Thriller 2 \n", + "Action|Comedy|Drama|Thriller 2 \n", + "Drama|Mystery|Romance|War 2 \n", + "Drama|History|War|Western 2 \n", + "Drama|Romance|War|Western 2 \n", + "Adventure|Comedy|Family|Fantasy|Horror 2 \n", + "Adventure|Animation|Comedy|Drama|Family|Fantasy|Musical 1 \n", + "History 1 \n", + "Adventure|Animation|Comedy|Fantasy|Romance 1 \n", + "Animation|Comedy|Family|Musical 1 \n", + "Game-Show|Reality-TV|Romance 1 \n", + "Adventure|Comedy|Drama|Fantasy|Romance 1 \n", + "Adventure|Fantasy|Horror|Mystery|Thriller 1 \n", + "Comedy|Romance|Sci-Fi|Thriller 1 \n", + "Comedy|Horror|Mystery|Thriller 1 \n", + "Adventure|Comedy|History|Romance 1 \n", + "Biography|Comedy|Drama|Music 1 \n", + "Comedy|Drama|Music|Musical|Romance 1 \n", + "Action|Adventure|Animation|Comedy|Fantasy|Sci-Fi 1 \n", + "Adventure|Biography|Drama|Romance 1 \n", + "Adventure|Animation|Drama|Family|Musical 1 \n", + "Drama|Fantasy|Horror|Romance 1 \n", + "Biography|Crime|Drama|Western 1 \n", + "Adventure|Family|Fantasy|Horror|Mystery 1 \n", + "Comedy|Mystery|Sci-Fi|Thriller 1 \n", + "Adventure|Animation|Fantasy|Horror|Sci-Fi 1 \n", + "Comedy|Crime|Drama|Horror|Mystery|Thriller 1 \n", + "Action|Drama|Fantasy|Sci-Fi 1 \n", + "Action|Biography|Drama|History|War 1 \n", + "Comedy|Drama|Mystery|Romance|Thriller 1 \n", + "Drama|Mystery|Romance|Thriller|War 1 \n", + "Adventure|Animation|Family|Musical 1 \n", + "Action|Crime|Drama|Western 1 \n", + "Adventure|Drama|Thriller|Western 1 \n", + "Action|Animation|Comedy|Sci-Fi 1 \n", + "Adventure|Drama|Family|Romance|Western 1 \n", + "Romance|Short 1 \n", + "Adventure|Animation|Comedy|Crime|Family 1 \n", + "Adventure|Fantasy|Mystery 1 \n", + "Drama|Family|Music|Musical 1 \n", + "Romance|Sci-Fi|Thriller 1 \n", + "Drama|Music|Mystery|Romance 1 \n", + "Adventure|Drama|History|War 1 \n", + "Comedy|Fantasy|Thriller 1 \n", + "Adventure|Comedy|Family|Fantasy|Horror|Mystery 1 \n", 
+ "Action|Drama|History|Thriller 1 \n", + "Animation|Comedy|Family|Horror|Sci-Fi 1 \n", + "Biography|Crime|Documentary|History 1 \n", + "Adventure|Animation|Drama|Family|History|Musical|Romance 1 \n", + "Thriller|Western 1 \n", + "Comedy|Drama|Family|Musical 1 \n", + "Comedy|Crime|Drama|Thriller|War 1 \n", + "Animation|Comedy|Family|Romance 1 \n", + "Comedy|Family|Fantasy|Musical 1 \n", + "Comedy|Documentary|Drama 1 \n", + "Adventure|Comedy|Crime|Family|Mystery 1 \n", + "Action|Drama|History|Thriller|War 1 \n", + "Comedy|Crime|Musical 1 \n", + "Animation|Drama|Family|Fantasy 1 \n", + "Action|Adventure|Drama|Thriller|Western 1 \n", + "Crime|Drama|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Drama|Romance|Thriller 1 \n", + "Action|Comedy|Drama|Family|Thriller 1 \n", + "Action|Adventure|Drama|Romance|Western 1 \n", + "Comedy|Drama|Romance|War 1 \n", + "Biography|Crime|Drama|Romance|Thriller 1 \n", + "Adventure|Comedy|Crime|Drama|Family 1 \n", + "Comedy|Crime|Family|Sci-Fi 1 \n", + "Drama|Mystery|War 1 \n", + "Action|Adventure|Biography|Drama|History 1 \n", + "Action|Adventure|Family|Thriller 1 \n", + "Drama|Music|Musical 1 \n", + "Comedy|Crime|Musical|Romance 1 \n", + "Crime|Drama|Fantasy|Romance 1 \n", + "Action|Adventure|Crime|Fantasy|Mystery|Thriller 1 \n", + "Drama|Fantasy|War 1 \n", + "Action|Animation|Fantasy|Horror|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Fantasy|War 1 \n", + "Comedy|Drama|History|Romance 1 \n", + "Action|Adventure|Romance|War 1 \n", + "Fantasy|Mystery|Romance|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Comedy|Family|Romance|Sci-Fi 1 \n", + "Comedy|History 1 \n", + "Adventure|Comedy|Family|Romance|Sci-Fi 1 \n", + "Adventure|Animation|Family|Fantasy|Musical 1 \n", + "Comedy|Crime|Sport 1 \n", + "Thriller|War 1 \n", + "Drama|Music|Romance|War 1 \n", + "Biography|Crime|Drama|History|Romance 1 \n", + "Comedy|Mystery|Thriller 1 \n", + "Biography|Crime|Drama|Music 1 \n", + "Action|Crime|Drama|Sport 1 \n", + "Drama|Fantasy|Romance|Thriller 1 
\n", + "Drama|Film-Noir|Mystery|Thriller 1 \n", + "Action|Comedy|Drama 1 \n", + "Drama|War|Western 1 \n", + "Film-Noir|Mystery|Romance|Thriller 1 \n", + "Action|Horror|Mystery|Sci-Fi|Thriller 1 \n", + "Adventure|Crime|Drama|Romance 1 \n", + "Biography|Comedy|Drama|Music|Romance 1 \n", + "Drama|Music|Mystery|Romance|Sci-Fi 1 \n", + "Biography|Documentary|Sport 1 \n", + "Adventure|Animation|Comedy|Family|Western 1 \n", + "Action|Comedy|Crime|Music 1 \n", + "Action|Adventure|Comedy|Crime|Family|Romance|Thriller 1 \n", + "Action|Comedy|Drama|Music 1 \n", + "Animation|Biography|Documentary|Drama|History|War 1 \n", + "Fantasy|Horror|Romance|Thriller 1 \n", + "Action|Drama|Romance|Sci-Fi|Thriller 1 \n", + "Action|Comedy|Crime|Fantasy|Horror|Mystery|Sci-Fi|Thriller 1 \n", + "Comedy|Crime|Family|Mystery|Romance|Thriller 1 \n", + "Adventure|Comedy|Romance|Sci-Fi 1 \n", + "Comedy|Drama|Horror 1 \n", + "Crime|Drama|History|Romance 1 \n", + "Action|Crime|Drama|Thriller|War 1 \n", + "Action|Crime|Drama|History|Western 1 \n", + "Adventure|Biography|Drama|History|Sport|Thriller 1 \n", + "Comedy|Drama|Fantasy|Horror 1 \n", + "Adventure|Animation|Family|Sport 1 \n", + "Action|Adventure|Drama|Mystery 1 \n", + "Animation|Comedy|Fantasy 1 \n", + "Crime|Film-Noir|Thriller 1 \n", + "Documentary|Drama|War 1 \n", + "Adventure|Crime|Drama|Mystery|Western 1 \n", + "Animation|Comedy|Fantasy|Musical 1 \n", + "Action|Adventure|Comedy|Family|Fantasy|Mystery|Sci-Fi 1 \n", + "Biography|Comedy|Drama|History|Music|Musical 1 \n", + "Action|Adventure|Drama|Thriller|War 1 \n", + "Adventure|Comedy|Sport 1 \n", + "Biography|Drama|History|Music 1 \n", + "Comedy|Family|Music|Musical 1 \n", + "Animation|Comedy|Family|Music|Western 1 \n", + "Drama|Fantasy|Sci-Fi 1 \n", + "Action|Biography|Drama|History|Romance|Western 1 \n", + "Biography|Crime|Drama|History|Thriller 1 \n", + "Action|Adventure|Comedy|Music|Thriller 1 \n", + "Biography|Drama|Fantasy|History 1 \n", + "Animation|Family|Fantasy 1 \n", + 
"Drama|Fantasy|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Comedy|Family|Romance 1 \n", + "Action|Drama|Fantasy|Horror|War 1 \n", + "Comedy|Drama|Romance|Western 1 \n", + "Animation|Drama|Family|Fantasy|Musical|Romance 1 \n", + "Action|Fantasy|Romance|Sci-Fi 1 \n", + "Adventure|Drama|History|Romance 1 \n", + "Action|Biography|Drama 1 \n", + "Action|Adventure|Comedy|Drama|Thriller 1 \n", + "Comedy|Short 1 \n", + "Action|Adventure|Comedy|Crime|Mystery|Thriller 1 \n", + "Adventure|Comedy|Drama|Romance|Sci-Fi 1 \n", + "Adventure|Comedy|Family|Mystery|Sci-Fi 1 \n", + "Action|Adventure|Comedy|Sci-Fi|Thriller 1 \n", + "Action|Drama|Fantasy|Thriller|Western 1 \n", + "Biography|Comedy|Drama|Family|Sport 1 \n", + "Action|Adventure|Crime|Drama|Mystery|Thriller 1 \n", + "Action|Animation|Comedy|Family|Fantasy|Sci-Fi 1 \n", + "Action|Adventure|Comedy|Family|Mystery 1 \n", + "Adventure|Family|Romance 1 \n", + "Adventure|Comedy|Fantasy|Music|Sci-Fi 1 \n", + "Drama|Musical|Romance|Thriller 1 \n", + "Crime|Documentary|News 1 \n", + "Comedy|Drama|Reality-TV|Romance 1 \n", + "Action|Drama|Fantasy|Horror|Thriller 1 \n", + "Drama|History|Music|Romance|War 1 \n", + "Action|Crime|Horror|Sci-Fi|Thriller 1 \n", + "Comedy|Family|Musical|Romance 1 \n", + "Action|Comedy|Horror|Thriller 1 \n", + "Comedy|Family|Romance|Sci-Fi 1 \n", + "Action|Adventure|Romance|Thriller 1 \n", + "Animation|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Family|Fantasy|Musical 1 \n", + "Adventure|Crime|Drama 1 \n", + "Action|Adventure|Animation|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Comedy|Drama|Mystery|Romance|Thriller|War 1 \n", + "Drama|Horror|Romance 1 \n", + "Action|Sci-Fi|War 1 \n", + "Action|Drama|Romance|Thriller 1 \n", + "Action|Comedy|Drama|Western 1 \n", + "Crime|Horror|Music|Thriller 1 \n", + "Documentary|Drama|Sport 1 \n", + "Family|Fantasy|Musical 1 \n", + "Biography|Crime|Documentary|History|Thriller 1 \n", + "Adventure|Drama|History|Romance|War 1 \n", + "Horror|Musical 1 \n", + 
"Horror|Musical|Sci-Fi 1 \n", + "Animation|Biography|Drama|War 1 \n", + "Action|Adventure|Fantasy|Horror|Sci-Fi|Thriller 1 \n", + "Comedy|Crime|Drama|Horror|Thriller 1 \n", + "Comedy|Sci-Fi|Thriller 1 \n", + "Comedy|Drama|Music|War 1 \n", + "Crime|Drama|Horror 1 \n", + "Drama|History|Horror 1 \n", + "Crime|Drama|Mystery|Romance|Thriller 1 \n", + "Drama|Fantasy|Romance|War 1 \n", + "Adventure|Animation|Family|Thriller 1 \n", + "Adventure|Horror|Mystery 1 \n", + "Mystery|Romance|Sci-Fi|Thriller 1 \n", + "Documentary|History|Sport 1 \n", + "Crime|Documentary|Drama 1 \n", + "Comedy|Thriller 1 \n", + "Action|Fantasy|Horror|Sci-Fi 1 \n", + "Adventure|Drama|Romance|Western 1 \n", + "Action|Adventure|Fantasy|Horror|Sci-Fi 1 \n", + "Action|Animation|Comedy|Family|Fantasy 1 \n", + "Adventure|Comedy|Western 1 \n", + "Action|Thriller|Western 1 \n", + "Action|Crime|Horror|Thriller 1 \n", + "Comedy|Crime|Family|Romance 1 \n", + "Crime|Drama|Music|Romance 1 \n", + "Drama|Family|Music|Romance 1 \n", + "Adventure|Comedy|Drama|Family|Sport 1 \n", + "Adventure|Documentary 1 \n", + "Biography|Comedy|Drama|Family|Romance 1 \n", + "Action|Adventure|Animation|Comedy|Sci-Fi 1 \n", + "Horror|Romance|Sci-Fi 1 \n", + "Action|Adventure|Romance|Western 1 \n", + "Action|Adventure|Animation|Comedy|Crime|Family|Fantasy 1 \n", + "Adventure|Comedy|Family|Fantasy|Musical 1 \n", + "Comedy|Drama|Musical|Romance|War 1 \n", + "Action|Adventure|Comedy|Family 1 \n", + "Biography|Crime|Drama|History|Music 1 \n", + "Action|Adventure|Comedy|Drama|War 1 \n", + "Action|Adventure|Fantasy|Horror|Thriller 1 \n", + "Action|Adventure|Drama|Fantasy|Sci-Fi 1 \n", + "Action|Adventure|Comedy|Fantasy|Romance 1 \n", + "Adventure|Comedy|Family|Fantasy|Music|Sci-Fi 1 \n", + "Comedy|Documentary|Music 1 \n", + "Adventure|Animation|Drama|Family|Fantasy|Musical|Mystery|Romance 1 \n", + "Musical 1 \n", + "Adventure|Comedy|Family|Sci-Fi 1 \n", + "Adventure|Comedy|Music|Sci-Fi 1 \n", + "Family|Music|Romance 1 \n", + 
"Action|Crime|Fantasy|Romance|Thriller 1 \n", + "Comedy|Family|Fantasy|Sci-Fi 1 \n", + "Family|Musical 1 \n", + "Action|Comedy|Sci-Fi|Western 1 \n", + "Adventure|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Adventure|Animation|Comedy|Fantasy 1 \n", + "Comedy|Fantasy|Horror|Musical 1 \n", + "Action|Animation|Comedy|Crime|Family 1 \n", + "Comedy|Drama|Musical|Romance|Western 1 \n", + "Comedy|Crime|Mystery|Romance 1 \n", + "Action|Comedy|Crime|Family 1 \n", + "Action|Horror|Romance|Sci-Fi|Thriller 1 \n", + "Action|Western 1 \n", + "Biography|Crime|Drama|War 1 \n", + "Crime|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Adventure|Comedy|History 1 \n", + "Comedy|Family|Music 1 \n", + "Comedy|Crime|Drama|Mystery|Thriller 1 \n", + "Adventure|Crime|Thriller 1 \n", + "Crime|Horror 1 \n", + "Action|Adventure|Comedy|Fantasy|Sci-Fi 1 \n", + "Comedy|Horror|Musical|Sci-Fi 1 \n", + "Adventure|Drama|Fantasy|Thriller|Western 1 \n", + "Drama|Family|History|Musical 1 \n", + "Action|Crime|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Drama|Music|Romance 1 \n", + "Adventure|Biography 1 \n", + "Action|Adventure|Comedy|Fantasy|Thriller 1 \n", + "Adventure|Biography|Drama|Horror|Thriller 1 \n", + "Action|Adventure|Crime|Drama 1 \n", + "Action|Adventure|Crime|Drama|Romance 1 \n", + "Comedy|Crime|Horror|Thriller 1 \n", + "Action|Drama|History 1 \n", + "Action|Adventure|Comedy|Crime|Music|Mystery 1 \n", + "Fantasy|Horror|Mystery|Romance 1 \n", + "Drama|History|Romance 1 \n", + "Crime|Drama|Mystery|Thriller|Western 1 \n", + "Action|Comedy|Drama|Sci-Fi 1 \n", + "Action|Adventure|Drama|War 1 \n", + "Adventure|Comedy|Crime 1 \n", + "Crime|Drama|Film-Noir|Mystery|Thriller 1 \n", + "Action|Animation|Sci-Fi|Thriller 1 \n", + "Comedy|Crime|Drama|Sci-Fi 1 \n", + "Action|Comedy|Crime|Music|Romance|Thriller 1 \n", + "Adventure|Comedy|Drama|Family|Mystery 1 \n", + "Adventure|Animation 1 \n", + "Action|Adventure|Comedy|Romance|Thriller|Western 1 \n", + "Drama|Family|Musical 1 \n", + "Drama|Musical|Sci-Fi 1 \n", + 
"Action|Adventure|Family|Fantasy|Sci-Fi 1 \n", + "Adventure|Crime|Drama|Thriller 1 \n", + "Action|Adventure|Animation|Drama|Fantasy|Sci-Fi 1 \n", + "Comedy|Family|Fantasy|Sport 1 \n", + "Biography|Drama|History|Thriller|War 1 \n", + "Action|Fantasy|Sci-Fi|Thriller 1 \n", + "Adventure|Animation|Drama|Family|Fantasy 1 \n", + "Adventure|Animation|Comedy|Drama|Family|Fantasy|Sci-Fi 1 \n", + "Drama|Film-Noir 1 \n", + "Drama|History|Romance|Western 1 \n", + "Comedy|Crime|Sci-Fi|Thriller 1 \n", + "Comedy|Family|Musical|Romance|Short 1 \n", + "Crime|Drama|Musical|Romance|Thriller 1 \n", + "Action|Crime|Drama|Mystery 1 \n", + "Action|Adventure|Drama|Western 1 \n", + "Animation|Drama|Family 1 \n", + "Comedy|Family|Music|Romance 1 \n", + "Action|Drama|Sci-Fi|Sport 1 \n", + "Adventure|Biography|Documentary|Drama 1 \n", + "Horror|Sci-Fi|Short|Thriller 1 \n", + "Action|Adventure|Crime|Drama|Family|Fantasy|Romance|Thriller 1 \n", + "Adventure|Animation|Comedy|Drama|Family|Fantasy|Romance 1 \n", + "Comedy|Documentary|War 1 \n", + "Action|Biography|Crime|Drama|Thriller 1 \n", + "Drama|Fantasy|Music|Romance 1 \n", + "Action|Adventure|Animation|Fantasy 1 \n", + "Adventure|Biography|Drama|War 1 \n", + "Comedy|Drama|Fantasy|Music|Romance 1 \n", + "Biography|Comedy|Crime|Drama|Romance|Thriller 1 \n", + "Drama|Mystery|Romance|Sci-Fi|Thriller 1 \n", + "Comedy|Drama|Horror|Sci-Fi|Thriller 1 \n", + "Action|Drama|Mystery|Thriller|War 1 \n", + "Action|Adventure|Family|Mystery|Sci-Fi 1 \n", + "Action|Adventure|Family|Sci-Fi|Thriller 1 \n", + "Comedy|Fantasy|Musical|Sci-Fi 1 \n", + "Drama|Fantasy|Mystery|Romance 1 \n", + "Action|Adventure|Crime 1 \n", + "Animation|Family|Fantasy|Musical|Romance 1 \n", + "Action|Comedy|Drama|War 1 \n", + "Adventure|Comedy|Drama|Family|Romance 1 \n", + "Comedy|Family|Fantasy|Music|Romance 1 \n", + "Comedy|Family|Fantasy|Horror|Mystery 1 \n", + "Fantasy|Mystery|Thriller 1 \n", + "Adventure|Documentary|Short 1 \n", + "Action|Biography|Documentary|Sport 1 \n", + 
"Crime|Drama|History 1 \n", + "Comedy|Crime|Drama|Music|Romance 1 \n", + "Adventure|Comedy|Crime|Family|Musical 1 \n", + "Action|Adventure|Comedy|Romance|Thriller 1 \n", + "Action|Adventure|Comedy|Musical 1 \n", + "Adventure|Crime|Drama|Mystery|Thriller 1 \n", + "Adventure|Drama|Fantasy|Mystery 1 \n", + "Crime|Drama|Musical 1 \n", + "Crime|Drama|Film-Noir 1 \n", + "Action|Adventure|Comedy|Fantasy|Mystery 1 \n", + "Adventure|Drama|History|Thriller|War 1 \n", + "Drama|Family|Western 1 \n", + "Documentary|Family 1 \n", + "Biography|Drama|Family|Musical|Romance 1 \n", + "Action|Fantasy|Western 1 \n", + "Animation|Drama 1 \n", + "Action|Drama|Mystery|Thriller 1 \n", + "Biography|Drama|Romance|Western 1 \n", + "Action|Crime|Sci-Fi 1 \n", + "Action|Comedy|Fantasy|Horror 1 \n", + "Action|Comedy|Crime|Sci-Fi|Thriller 1 \n", + "Animation|Comedy|Crime|Drama|Family 1 \n", + "Comedy|Drama|History|Musical|Romance 1 \n", + "Adventure|Comedy|Drama|Fantasy|Musical 1 \n", + "Action|Animation|Crime|Sci-Fi|Thriller 1 \n", + "Biography|Comedy|Drama|War 1 \n", + "Adventure|Documentary|Drama|Sport 1 \n", + "Adventure|Animation|Sci-Fi 1 \n", + "Adventure|Animation|Biography|Drama|Family|Fantasy|Musical 1 \n", + "Adventure|Comedy|Crime|Drama 1 \n", + "Biography|Crime|Drama|History|Western 1 \n", + "Action|Drama|History|Romance|War|Western 1 \n", + "Adventure|Animation|Family|Western 1 \n", + "Adventure|Sci-Fi 1 \n", + "Adventure|Comedy|Drama|Family 1 \n", + "Crime|Drama|Fantasy 1 \n", + "Animation|Comedy|Drama|Fantasy|Sci-Fi 1 \n", + "Crime|Thriller|War 1 \n", + "Crime|Fantasy|Horror 1 \n", + "Action|Crime|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Comedy|Crime|Drama|Romance|Thriller 1 \n", + "Adventure|Family|Fantasy|Sci-Fi 1 \n", + "Action|Adventure|Animation|Comedy|Fantasy 1 \n", + "Animation|Comedy|Family|Sport 1 \n", + "Action|Comedy|Fantasy|Romance 1 \n", + "Adventure|Family|Fantasy|Music|Musical 1 \n", + "Adventure|Drama|Horror|Mystery|Thriller 1 \n", + 
"Action|Horror|Mystery|Thriller 1 \n", + "Action|Family|Sport 1 \n", + "Biography|Drama|Family|History|Sport 1 \n", + "Action|Biography|Drama|Thriller|War 1 \n", + "Comedy|Family|Romance|Sport 1 \n", + "Action|Adventure|Drama|Romance|Sci-Fi 1 \n", + "Adventure|Family|Sci-Fi 1 \n", + "Adventure|Horror|Sci-Fi 1 \n", + "Drama|Fantasy|Sport 1 \n", + "Biography|Documentary|History 1 \n", + "Action|Adventure|Comedy|Drama|Music|Sci-Fi 1 \n", + "Crime|Drama|Mystery|Romance 1 \n", + "Drama|Fantasy|Mystery|Sci-Fi 1 \n", + "Adventure|Drama|History|Romance|Thriller|War 1 \n", + "Action|War 1 \n", + "Action|Drama|Fantasy|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Family|Fantasy|Romance 1 \n", + "Action|Adventure|Family|Mystery 1 \n", + "Action|Adventure|Drama|Family 1 \n", + "Biography 1 \n", + "Action|Drama|Romance|War 1 \n", + "Fantasy|Thriller 1 \n", + "Action|Adventure|Fantasy|Horror 1 \n", + "Documentary|News 1 \n", + "Action|Comedy|Sci-Fi|Sport 1 \n", + "Action|Adventure|Family|Fantasy|Sci-Fi|Thriller 1 \n", + "Crime|Drama|Music|Mystery|Thriller 1 \n", + "Adventure|War|Western 1 \n", + "Drama|Horror|Mystery|Sci-Fi 1 \n", + "Drama|Music|Mystery|Romance|Thriller 1 \n", + "Action|Adventure|Drama|Fantasy|War 1 \n", + "Action|Adventure|Animation|Family 1 \n", + "Adventure|Comedy|Horror|Sci-Fi 1 \n", + "Action|Fantasy|Horror|Mystery 1 \n", + "Adventure|Comedy|Family|Fantasy|Romance|Sport 1 \n", + "Action|Adventure|Crime|Drama|Sci-Fi|Thriller 1 \n", + "Action|Biography|Drama|History|Thriller|War 1 \n", + "Adventure|Biography|Crime|Drama|Western 1 \n", + "Action|Sport 1 \n", + "Comedy|Fantasy|Mystery 1 \n", + "Biography|Drama|Family|Sport 1 \n", + "Comedy|Music|Sci-Fi 1 \n", + "Documentary|Drama|History|News 1 \n", + "Mystery|Western 1 \n", + "Action|Adventure|Mystery|Romance|Thriller 1 \n", + "Comedy|Horror|Sci-Fi|Thriller 1 \n", + "Comedy|Crime|Drama|Mystery 1 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi|Sport 1 \n", + "Action|Drama|Romance|Sport 1 \n", + 
"Animation|Family|Fantasy|Mystery 1 \n", + "Action|Animation|Sci-Fi 1 \n", + "Action|Adventure|History|Western 1 \n", + "Adventure|Drama|Horror|Thriller 1 \n", + "Documentary|Family|Music 1 \n", + "Biography|Documentary|Drama 1 \n", + "Adventure|Animation|Comedy|Drama|Family|Fantasy 1 \n", + "Biography|Drama|History|Musical 1 \n", + "Action|Fantasy|Horror|Mystery|Thriller 1 \n", + "Action|Adventure|Animation|Family|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Animation|Fantasy|Romance|Sci-Fi 1 \n", + "Action|Biography|Drama|History|Romance|War 1 \n", + "Adventure|Animation|Comedy|Family|War 1 \n", + "Comedy|Documentary|Drama|Fantasy|Mystery|Sci-Fi 1 \n", + "Adventure|Drama|Fantasy|Mystery|Thriller 1 \n", + "Animation|Comedy|Family|Music|Romance 1 \n", + "Action|Comedy|Mystery 1 \n", + "Animation|Comedy|Drama|Family|Musical 1 \n", + "Adventure|Biography|Drama|History 1 \n", + "Drama|Fantasy|Mystery|Romance|Thriller 1 \n", + "Crime|Drama|Music|Thriller 1 \n", + "Adventure|Comedy|Crime|Romance 1 \n", + "Action|Biography|Crime|Drama|Family|Fantasy 1 \n", + "Action|Romance|Sport 1 \n", + "Biography|Comedy|Drama|History|Music 1 \n", + "Animation|Drama|Family|Musical|Romance 1 \n", + "Action|Adventure|Family|Fantasy|Thriller 1 \n", + "Biography|Drama|Family 1 \n", + "Fantasy|Horror|Romance 1 \n", + "Action|Adventure|Comedy|Family|Fantasy 1 \n", + "Horror|Romance|Thriller 1 \n", + "Comedy|Drama|Family|Fantasy|Musical 1 \n", + "Biography|Comedy|Musical|Romance|Western 1 \n", + "Animation|Comedy|Family|Fantasy|Musical|Romance 1 \n", + "Animation|Comedy|Family|Fantasy|Sci-Fi 1 \n", + "Adventure|Fantasy|Thriller 1 \n", + "Adventure|Family|Sport 1 \n", + "Adventure|Crime|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Adventure|History|Romance 1 \n", + "Animation|Comedy|Family|Fantasy|Mystery 1 \n", + "Action|Animation|Comedy|Family 1 \n", + "Action|Adventure|Animation|Comedy|Drama|Family|Fantasy|Thriller 1 \n", + "Adventure|Crime|Drama|Western 1 \n", + 
"Action|Adventure|Comedy|Crime|Thriller 1 \n", + "Music 1 \n", + "Action|Comedy|Music 1 \n", + "Adventure|Drama|Family|Mystery 1 \n", + "Biography|Comedy|Musical 1 \n", + "Adventure|Comedy|Horror 1 \n", + "Adventure|Animation|Comedy|Crime 1 \n", + "Biography|Comedy|Documentary 1 \n", + "Action|Comedy|Mystery|Romance 1 \n", + "Action|Drama|Sport|Thriller 1 \n", + "Animation|Comedy|Drama|Romance 1 \n", + "Comedy|Fantasy|Horror|Mystery 1 \n", + "Crime|Drama|History|Mystery|Thriller 1 \n", + "Action|Horror|Romance 1 \n", + "Adventure|Comedy|Crime|Music 1 \n", + "Crime|Drama|Musical|Romance 1 \n", + "Adventure|Comedy|Sci-Fi|Western 1 \n", + "Crime|Drama|Fantasy|Mystery 1 \n", + "Action|Adventure|Drama|History|Thriller|War 1 \n", + "Action|Adventure|Biography|Drama|History|Thriller 1 \n", + "Comedy|Crime|Horror 1 \n", + "Adventure|Animation|Family|Fantasy|Musical|War 1 \n", + "Action|Adventure|Biography|Drama|History|Romance|War 1 \n", + "Comedy|Drama|Family|Fantasy|Sci-Fi 1 \n", + "Comedy|Crime|Musical|Mystery 1 \n", + "Adventure|Comedy|Drama|Romance|Thriller|War 1 \n", + "Adventure|Comedy|Family|Music|Romance 1 \n", + "Action|Comedy|Crime|Western 1 \n", + "Adventure|Drama|Thriller|War 1 \n", + "Biography|Crime|Drama|Mystery|Thriller 1 \n", + "Adventure|Comedy|Drama|Music 1 \n", + "Adventure|Animation|Comedy|Fantasy|Music|Romance 1 \n", + "Family|Fantasy|Music 1 \n", + "Action|Adventure|Drama|History|Romance|War 1 \n", + "Biography|Comedy|Crime|Drama|Romance 1 \n", + "Adventure|Comedy|Musical|Romance 1 \n", + "Name: genres, dtype: int64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.genres.value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #3: Show all columns and column width" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "pd.reset_option('display.width')\n", + 
"pd.reset_option('display.max_columns')\n", + "pd.reset_option('display.max_colwidth')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "0 3054.0 English USA PG-13 237000000.0 \n", + "1 1238.0 English USA PG-13 300000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "\n", + "[2 rows x 28 columns]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "# show all columns on wider area\n", + "import pandas as pd\n", + "pd.set_option('display.width', None)\n", + "pd.set_option('display.max_columns', None)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "\n", + " actor_1_facebook_likes gross genres \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy \n", + "\n", + " actor_1_name movie_title num_voted_users \\\n", + "0 CCH Pounder Avatar  886204 \n", + "1 Johnny Depp Pirates of the Caribbean: At World's End  471220 \n", + "\n", + " cast_total_facebook_likes actor_3_name facenumber_in_poster \\\n", + "0 4834 Wes Studi 0.0 \n", + "1 48350 Jack Davenport 0.0 \n", + "\n", + " plot_keywords \\\n", + "0 avatar|future|marine|native|paraplegic \n", + "1 goddess|marriage ceremony|marriage proposal|pi... \n", + "\n", + " movie_imdb_link num_user_for_reviews \\\n", + "0 http://www.imdb.com/title/tt0499549/?ref_=fn_t... 3054.0 \n", + "1 http://www.imdb.com/title/tt0449088/?ref_=fn_t... 
1238.0 \n", + "\n", + " language country content_rating budget title_year \\\n", + "0 English USA PG-13 237000000.0 2009.0 \n", + "1 English USA PG-13 300000000.0 2007.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \n", + "0 936.0 7.9 1.78 33000 \n", + "1 5000.0 7.1 2.35 0 " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5 Action|Adventure|Sci-Fi\n", + "6 Action|Adventure|Romance\n", + "7 Adventure|Animation|Comedy|Family|Fantasy|Musi...\n", + "8 Action|Adventure|Sci-Fi\n", + "9 Adventure|Family|Fantasy|Mystery\n", + "Name: genres, dtype: object" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[5:10,9]" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "# display column values without truncation\n", + "import pandas as pd\n", + "pd.set_option('display.max_colwidth', None)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5 Action|Adventure|Sci-Fi\n", + "6 Action|Adventure|Romance\n", + "7 Adventure|Animation|Comedy|Family|Fantasy|Musical|Romance\n", + "8 Action|Adventure|Sci-Fi\n", + "9 Adventure|Family|Fantasy|Mystery\n", + "Name: genres, dtype: object" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[5:10,9]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #5: Increase Jupyter Notebook cell width" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": 
"display_data" + } + ], + "source": [ + "from IPython.display import display, HTML\n", + "display(HTML(\"\"))" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from IPython.display import display, HTML\n", + "display(HTML(\"\"))\n", + "display(HTML(\"\"))\n", + "display(HTML(\"\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples.ipynb b/notebooks/pandas/Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples.ipynb new file mode 100644 index 0000000..dfe2ce4 --- /dev/null +++ b/notebooks/pandas/Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples.ipynb @@ -0,0 +1,2380 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas : Crosstab - cross tabulation of two (or more) factors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Resources\n", + "\n", + "* [pandas.crosstab](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.crosstab.html)\n", + "* [Pivot
table](https://en.wikipedia.org/wiki/Pivot_table)\n", + "* [imdb 5000 movie dataset](https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset)\n", + "\n", + "## Official Pandas doc\n", + "\n", + ">Compute a simple cross tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed.\n", + "\n", + "## Pivot Table\n", + "\n", + "> A pivot table is a table of statistics that summarizes the data of a more extensive table (such as from a database, spreadsheet, or business intelligence program). This summary might include sums, averages, or other statistics, which the pivot table groups together in a meaningful way.\n", + "\n", + "> Pivot tables are a technique in data processing." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use cases\n", + "\n", + "* Data summary\n", + "* Data aggregation\n", + "* Grouping\n", + "* Quick Reports\n", + "* Data patterns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Import Pandas and read data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Select data for the crosstab" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "2 Color Sam Mendes 602.0 148.0 \n", + "3 Color Christopher Nolan 813.0 164.0 \n", + "4 NaN Doug Walker NaN NaN \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "2 0.0 161.0 Rory Kinnear \n", + "3 22000.0 23000.0 Christian Bale \n", + "4 131.0 NaN Rob Walker \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "2 11000.0 200074175.0 Action|Adventure|Thriller ... \n", + "3 27000.0 448130642.0 Action|Thriller ... \n", + "4 131.0 NaN Documentary ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "0 3054.0 English USA PG-13 237000000.0 \n", + "1 1238.0 English USA PG-13 300000000.0 \n", + "2 994.0 English UK PG-13 245000000.0 \n", + "3 2701.0 English USA PG-13 250000000.0 \n", + "4 NaN NaN NaN NaN NaN \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "2 2015.0 393.0 6.8 2.35 \n", + "3 2012.0 23000.0 8.5 2.35 \n", + "4 NaN 12.0 7.1 NaN \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "2 85000 \n", + "3 164000 \n", + "4 0 \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " 0 \\\n", + "color Color \n", + "director_name James Cameron \n", + "num_critic_for_reviews 723 \n", + "duration 178 \n", + "director_facebook_likes 0 \n", + "actor_3_facebook_likes 855 \n", + "actor_2_name Joel David Moore \n", + "actor_1_facebook_likes 1000 \n", + "gross 7.60506e+08 \n", + "genres Action|Adventure|Fantasy|Sci-Fi \n", + "actor_1_name CCH Pounder \n", + "movie_title Avatar  \n", + "num_voted_users 886204 \n", + "cast_total_facebook_likes 4834 \n", + "actor_3_name Wes Studi \n", + "facenumber_in_poster 0 \n", + "plot_keywords avatar|future|marine|native|paraplegic \n", + "movie_imdb_link http://www.imdb.com/title/tt0499549/?ref_=fn_t... \n", + "num_user_for_reviews 3054 \n", + "language English \n", + "country USA \n", + "content_rating PG-13 \n", + "budget 2.37e+08 \n", + "title_year 2009 \n", + "actor_2_facebook_likes 936 \n", + "imdb_score 7.9 \n", + "aspect_ratio 1.78 \n", + "movie_facebook_likes 33000 \n", + "\n", + " 1 \\\n", + "color Color \n", + "director_name Gore Verbinski \n", + "num_critic_for_reviews 302 \n", + "duration 169 \n", + "director_facebook_likes 563 \n", + "actor_3_facebook_likes 1000 \n", + "actor_2_name Orlando Bloom \n", + "actor_1_facebook_likes 40000 \n", + "gross 3.09404e+08 \n", + "genres Action|Adventure|Fantasy \n", + "actor_1_name Johnny Depp \n", + "movie_title Pirates of the Caribbean: At World's End  \n", + "num_voted_users 471220 \n", + "cast_total_facebook_likes 48350 \n", + "actor_3_name Jack Davenport \n", + "facenumber_in_poster 0 \n", + "plot_keywords goddess|marriage ceremony|marriage proposal|pi... \n", + "movie_imdb_link http://www.imdb.com/title/tt0449088/?ref_=fn_t... 
\n", + "num_user_for_reviews 1238 \n", + "language English \n", + "country USA \n", + "content_rating PG-13 \n", + "budget 3e+08 \n", + "title_year 2007 \n", + "actor_2_facebook_likes 5000 \n", + "imdb_score 7.1 \n", + "aspect_ratio 2.35 \n", + "movie_facebook_likes 0 \n", + "\n", + " 2 \\\n", + "color Color \n", + "director_name Sam Mendes \n", + "num_critic_for_reviews 602 \n", + "duration 148 \n", + "director_facebook_likes 0 \n", + "actor_3_facebook_likes 161 \n", + "actor_2_name Rory Kinnear \n", + "actor_1_facebook_likes 11000 \n", + "gross 2.00074e+08 \n", + "genres Action|Adventure|Thriller \n", + "actor_1_name Christoph Waltz \n", + "movie_title Spectre  \n", + "num_voted_users 275868 \n", + "cast_total_facebook_likes 11700 \n", + "actor_3_name Stephanie Sigman \n", + "facenumber_in_poster 1 \n", + "plot_keywords bomb|espionage|sequel|spy|terrorist \n", + "movie_imdb_link http://www.imdb.com/title/tt2379713/?ref_=fn_t... \n", + "num_user_for_reviews 994 \n", + "language English \n", + "country UK \n", + "content_rating PG-13 \n", + "budget 2.45e+08 \n", + "title_year 2015 \n", + "actor_2_facebook_likes 393 \n", + "imdb_score 6.8 \n", + "aspect_ratio 2.35 \n", + "movie_facebook_likes 85000 \n", + "\n", + " 3 \\\n", + "color Color \n", + "director_name Christopher Nolan \n", + "num_critic_for_reviews 813 \n", + "duration 164 \n", + "director_facebook_likes 22000 \n", + "actor_3_facebook_likes 23000 \n", + "actor_2_name Christian Bale \n", + "actor_1_facebook_likes 27000 \n", + "gross 4.48131e+08 \n", + "genres Action|Thriller \n", + "actor_1_name Tom Hardy \n", + "movie_title The Dark Knight Rises  \n", + "num_voted_users 1144337 \n", + "cast_total_facebook_likes 106759 \n", + "actor_3_name Joseph Gordon-Levitt \n", + "facenumber_in_poster 0 \n", + "plot_keywords deception|imprisonment|lawlessness|police offi... \n", + "movie_imdb_link http://www.imdb.com/title/tt1345836/?ref_=fn_t... 
\n", + "num_user_for_reviews 2701 \n", + "language English \n", + "country USA \n", + "content_rating PG-13 \n", + "budget 2.5e+08 \n", + "title_year 2012 \n", + "actor_2_facebook_likes 23000 \n", + "imdb_score 8.5 \n", + "aspect_ratio 2.35 \n", + "movie_facebook_likes 164000 \n", + "\n", + " 4 \n", + "color NaN \n", + "director_name Doug Walker \n", + "num_critic_for_reviews NaN \n", + "duration NaN \n", + "director_facebook_likes 131 \n", + "actor_3_facebook_likes NaN \n", + "actor_2_name Rob Walker \n", + "actor_1_facebook_likes 131 \n", + "gross NaN \n", + "genres Documentary \n", + "actor_1_name Doug Walker \n", + "movie_title Star Wars: Episode VII - The Force Awakens  ... \n", + "num_voted_users 8 \n", + "cast_total_facebook_likes 143 \n", + "actor_3_name NaN \n", + "facenumber_in_poster 0 \n", + "plot_keywords NaN \n", + "movie_imdb_link http://www.imdb.com/title/tt5289954/?ref_=fn_t... \n", + "num_user_for_reviews NaN \n", + "language NaN \n", + "country NaN \n", + "content_rating NaN \n", + "budget NaN \n", + "title_year NaN \n", + "actor_2_facebook_likes 12 \n", + "imdb_score 7.1 \n", + "aspect_ratio NaN \n", + "movie_facebook_likes 0 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head().T" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',\n", + " 'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',\n", + " 'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',\n", + " 'movie_title', 'num_voted_users', 'cast_total_facebook_likes',\n", + " 'actor_3_name', 'facenumber_in_poster', 'plot_keywords',\n", + " 'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',\n", + " 'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',\n", + " 'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],\n", + " 
dtype='object')" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "df2 = df.iloc[[2, 4, 9, 12, 13, 14, 20, 23, 25, 30, 34, 50, 79], :]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Step 3: Create crosstab table" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 1 0 0 0 0\n", + "Brett Ratner 0 1 0 0 0\n", + "David Yates 0 0 0 1 0\n", + "Gore Verbinski 0 0 0 0 2\n", + "Jon Favreau 0 0 0 1 0\n", + "Marc Forster 0 0 0 1 0\n", + "Peter Jackson 0 0 2 0 1\n", + "Sam Mendes 0 0 0 2 0" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# simple usage\n", + "pd.crosstab(df2['director_name'], df2['country'])" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director \n", + "Baz Luhrmann 1 0 0 0 0\n", + "Brett Ratner 0 1 0 0 0\n", + "David Yates 0 0 0 1 0\n", + "Gore Verbinski 0 0 0 0 2\n", + "Jon Favreau 0 0 0 1 0\n", + "Marc Forster 0 0 0 1 0\n", + "Peter Jackson 0 0 2 0 1\n", + "Sam Mendes 0 0 0 2 0" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# change row and column names\n", + "pd.crosstab(df2['director_name'], df2['country'], rownames=['director'], colnames=['country'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Crosstab: normalize or show percentages per row or of the total" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 0.083333 0.000000 0.000000 0.000000 0.000000\n", + "Brett Ratner 0.000000 0.083333 0.000000 0.000000 0.000000\n", + "David Yates 0.000000 0.000000 0.000000 0.083333 0.000000\n", + "Gore Verbinski 0.000000 0.000000 0.000000 0.000000 0.166667\n", + "Jon Favreau 0.000000 0.000000 0.000000 0.083333 0.000000\n", + "Marc Forster 0.000000 0.000000 0.000000 0.083333 0.000000\n", + "Peter Jackson 0.000000 0.000000 0.166667 0.000000 0.083333\n", + "Sam Mendes 0.000000 0.000000 0.000000 0.166667 0.000000" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show percentage - global - normalize=True\n", + "pd.crosstab(df2['director_name'], df2['country'], normalize=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 1.0 0.0 0.000000 0.0 0.000000\n", + "Brett Ratner 0.0 1.0 0.000000 0.0 0.000000\n", + "David Yates 0.0 0.0 0.000000 1.0 0.000000\n", + "Gore Verbinski 0.0 0.0 0.000000 0.0 1.000000\n", + "Jon Favreau 0.0 0.0 0.000000 1.0 0.000000\n", + "Marc Forster 0.0 0.0 0.000000 1.0 0.000000\n", + "Peter Jackson 0.0 0.0 0.666667 0.0 0.333333\n", + "Sam Mendes 0.0 0.0 0.000000 1.0 0.000000" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show percentage - per index - normalize='index'\n", + "pd.crosstab(df2['director_name'], df2['country'], normalize='index')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA All\n", + "director_name \n", + "Baz Luhrmann 1 0 0 0 0 1\n", + "Brett Ratner 0 1 0 0 0 1\n", + "David Yates 0 0 0 1 0 1\n", + "Gore Verbinski 0 0 0 0 2 2\n", + "Jon Favreau 0 0 0 1 0 1\n", + "Marc Forster 0 0 0 1 0 1\n", + "Peter Jackson 0 0 2 0 1 3\n", + "Sam Mendes 0 0 0 2 0 2\n", + "All 1 1 2 5 3 12" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show total - margins=True\n", + "pd.crosstab(df2['director_name'], df2['country'], margins=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA All\n", + "director_name \n", + "Baz Luhrmann 0.083333 0.000000 0.000000 0.000000 0.000000 0.083333\n", + "Brett Ratner 0.000000 0.083333 0.000000 0.000000 0.000000 0.083333\n", + "David Yates 0.000000 0.000000 0.000000 0.083333 0.000000 0.083333\n", + "Gore Verbinski 0.000000 0.000000 0.000000 0.000000 0.166667 0.166667\n", + "Jon Favreau 0.000000 0.000000 0.000000 0.083333 0.000000 0.083333\n", + "Marc Forster 0.000000 0.000000 0.000000 0.083333 0.000000 0.083333\n", + "Peter Jackson 0.000000 0.000000 0.166667 0.000000 0.083333 0.250000\n", + "Sam Mendes 0.000000 0.000000 0.000000 0.166667 0.000000 0.166667\n", + "All 0.083333 0.083333 0.166667 0.416667 0.250000 1.000000" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Combining totals and percentage\n", + "pd.crosstab(df2['director_name'], df2['country'], margins=True, normalize=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 1.000000 0.000000 0.000000 0.000000 0.000000\n", + "Brett Ratner 0.000000 1.000000 0.000000 0.000000 0.000000\n", + "David Yates 0.000000 0.000000 0.000000 1.000000 0.000000\n", + "Gore Verbinski 0.000000 0.000000 0.000000 0.000000 1.000000\n", + "Jon Favreau 0.000000 0.000000 0.000000 1.000000 0.000000\n", + "Marc Forster 0.000000 0.000000 0.000000 1.000000 0.000000\n", + "Peter Jackson 0.000000 0.000000 0.666667 0.000000 0.333333\n", + "Sam Mendes 0.000000 0.000000 0.000000 1.000000 0.000000\n", + "All 0.083333 0.083333 0.166667 0.416667 0.250000" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Combining totals and percentage per row\n", + "pd.crosstab(df2['director_name'], df2['country'], margins=True, normalize='index')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas crosstab multiple columns" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada \\\n", + "director_name genres \n", + "Baz Luhrmann Drama|Romance 1 0 \n", + "Brett Ratner Action|Adventure|Fantasy|Sci-Fi|Thriller 0 1 \n", + "David Yates Adventure|Family|Fantasy|Mystery 0 0 \n", + "Gore Verbinski Action|Adventure|Fantasy 0 0 \n", + " Action|Adventure|Western 0 0 \n", + "Jon Favreau Adventure|Drama|Family|Fantasy 0 0 \n", + "Marc Forster Action|Adventure 0 0 \n", + "Peter Jackson Action|Adventure|Drama|Romance 0 0 \n", + " Adventure|Fantasy 0 0 \n", + "Sam Mendes Action|Adventure|Thriller 0 0 \n", + "\n", + "country New Zealand UK USA \n", + "director_name genres \n", + "Baz Luhrmann Drama|Romance 0 0 0 \n", + "Brett Ratner Action|Adventure|Fantasy|Sci-Fi|Thriller 0 0 0 \n", + "David Yates Adventure|Family|Fantasy|Mystery 0 1 0 \n", + "Gore Verbinski Action|Adventure|Fantasy 0 0 1 \n", + " Action|Adventure|Western 0 0 1 \n", + "Jon Favreau Adventure|Drama|Family|Fantasy 0 1 0 \n", + "Marc Forster Action|Adventure 0 1 0 \n", + "Peter Jackson Action|Adventure|Drama|Romance 1 0 0 \n", + " Adventure|Fantasy 1 0 1 \n", + "Sam Mendes Action|Adventure|Thriller 0 2 0 " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.crosstab([df2['director_name'], df2['genres']], df2['country'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Simulate pandas crosstab with Group By" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " director_name country\n", + "director_name country \n", + "Baz Luhrmann Australia 1 1\n", + "Brett Ratner Canada 1 1\n", + "David Yates UK 1 1\n", + "Gore Verbinski USA 2 2\n", + "Jon Favreau UK 1 1\n", + "Marc Forster UK 1 1\n", + "Peter Jackson New Zealand 2 2\n", + " USA 1 1\n", + "Sam Mendes UK 2 2" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cols = ['director_name', 'country']\n", + "df2.groupby(cols)[cols].count()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas crosstab use values from another column" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 7.3 NaN NaN NaN NaN\n", + "Brett Ratner NaN 6.8 NaN NaN NaN\n", + "David Yates NaN NaN NaN 7.5 NaN\n", + "Gore Verbinski NaN NaN NaN NaN 6.9\n", + "Jon Favreau NaN NaN NaN 7.8 NaN\n", + "Marc Forster NaN NaN NaN 6.7 NaN\n", + "Peter Jackson NaN NaN 7.35 NaN 7.9\n", + "Sam Mendes NaN NaN NaN 7.3 NaN" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "pd.crosstab(df2['director_name'], df2['country'], values=df2.imdb_score, aggfunc=np.average)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA All\n", + "director_name \n", + "Baz Luhrmann 7.3 NaN NaN NaN NaN 7.300000\n", + "Brett Ratner NaN 6.8 NaN NaN NaN 6.800000\n", + "David Yates NaN NaN NaN 7.50 NaN 7.500000\n", + "Gore Verbinski NaN NaN NaN NaN 6.900000 6.900000\n", + "Jon Favreau NaN NaN NaN 7.80 NaN 7.800000\n", + "Marc Forster NaN NaN NaN 6.70 NaN 6.700000\n", + "Peter Jackson NaN NaN 7.35 NaN 7.900000 7.533333\n", + "Sam Mendes NaN NaN NaN 7.30 NaN 7.300000\n", + "All 7.3 6.8 7.35 7.32 7.233333 7.258333" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "pd.crosstab(df2['director_name'], df2['country'], values=df2.imdb_score, aggfunc=np.average, margins=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb b/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb new file mode 100644 index 0000000..d0d81d7 --- /dev/null +++ b/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb @@ -0,0 +1,592 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas : Select rows between two dates - DataFrame or CSV file" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Resources\n", + "\n", + "* 
[pandas.to_datetime](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)\n", + "* [pandas.DataFrame.between_time](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.between_time.html)\n", + "* [pandas.DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use cases\n", + "\n", + "* Pandas: Verify columns containing dates\n", + "* Convert string to datetime in DataFrame\n", + "* Select rows between two dates\n", + " * 1. Select rows based on dates with loc\n", + " * 2. Series method between\n", + " * 3. Select rows between two times\n", + " * 4. Select rows based on dates without loc\n", + " * 5. Use mask to mark the records\n", + " * 6. Select records from last month/30 days " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Import Pandas and read data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
loading_datetime pages title datetime_col
0 2019-10-28 19:56:03 main <GET https://www.wikipedia.org/> (The Free En... 2019-10-29 9:06:03
1 2019-10-29 19:56:03 english <GET https://en.wikipedia.org/wiki/Main_Page>... 2019-10-31 11:16:43
2 2019-10-29 19:56:03 italiano <GET https://it.wikipedia.org/wiki/Pagina_pri... 2019-10-30 21:15:23
3 2019-10-30 19:56:03 português <GET https://pt.wikipedia.org/wiki/Wikip%C3%A... 2019-10-30 20:26:35
\n", + "
" + ], + "text/plain": [ + " loading_datetime pages \\\n", + "0 2019-10-28 19:56:03 main \n", + "1 2019-10-29 19:56:03 english \n", + "2 2019-10-29 19:56:03 italiano \n", + "3 2019-10-30 19:56:03 português \n", + "\n", + " title datetime_col \n", + "0 (The Free En... 2019-10-29 9:06:03 \n", + "1 ... 2019-10-31 11:16:43 \n", + "2 \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
loading_datetime pages title datetime_col
1 2019-10-29 19:56:03 english <GET https://en.wikipedia.org/wiki/Main_Page>... 2019-10-31 11:16:43+00:00
2 2019-10-29 19:56:03 italiano <GET https://it.wikipedia.org/wiki/Pagina_pri... 2019-10-30 21:15:23+00:00
\n", + "" + ], + "text/plain": [ + " loading_datetime pages \\\n", + "1 2019-10-29 19:56:03 english \n", + "2 2019-10-29 19:56:03 italiano \n", + "\n", + " title datetime_col \n", + "1 ... 2019-10-31 11:16:43+00:00 \n", + "2 start_date) & (df['datetime_col'] < end_date)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 2. Series method between" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "start_date = pd.to_datetime('2019-10-30 20:41', utc= True)\n", + "end_date = pd.to_datetime('5/13/2020 8:55', utc= True)\n", + "\n", + "df[df.datetime_col.between(start_date, end_date)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 3. Select rows between two times" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
loading_datetime pages title
datetime_col
2019-10-30 21:15:23+00:00 2019-10-29 19:56:03 italiano <GET https://it.wikipedia.org/wiki/Pagina_pri...
\n", + "
" + ], + "text/plain": [ + " loading_datetime pages \\\n", + "datetime_col \n", + "2019-10-30 21:15:23+00:00 2019-10-29 19:56:03 italiano \n", + "\n", + " title \n", + "datetime_col \n", + "2019-10-30 21:15:23+00:00 '2018-12-02') & (df['datetime_col'] <= '2018-12-03 23:26:10+00:00')]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6. Select records from last month/30 days " + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
loading_datetime pages title datetime_col
1 2019-10-29 19:56:03 english <GET https://en.wikipedia.org/wiki/Main_Page>... 2019-10-31 11:16:43+00:00
\n", + "
" + ], + "text/plain": [ + " loading_datetime pages \\\n", + "1 2019-10-29 19:56:03 english \n", + "\n", + " title datetime_col \n", + "1 ... 2019-10-31 11:16:43+00:00 " + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df[\"datetime_col\"] >= (pd.to_datetime('11/30/2019', utc=True) - pd.Timedelta(days=30))]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/Pandas_compare_columns_in_two_Dataframes.ipynb b/notebooks/pandas/Pandas_compare_columns_in_two_Dataframes.ipynb new file mode 100644 index 0000000..b060b47 --- /dev/null +++ b/notebooks/pandas/Pandas_compare_columns_in_two_Dataframes.ipynb @@ -0,0 +1,894 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df1 = pd.read_csv('../csv/file1.csv',sep=\"\\s+\")\n", + "df2 = pd.read_csv('../csv/file2.csv',sep=\"\\s+\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " name type value\n", + "0 Mike a+ 98\n", + "1 Jery a- 144\n", + "2 Tomy b 108" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " type low high\n", + "0 a+ 78 97\n", + "1 a- 108 143\n", + "2 b 108 150" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Similar sized dataframes" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "df1['low_value'] = np.where(df1.type == df2.type, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 True\n", + "1 True\n", + "2 True\n", + "Name: low_value, dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1['low_value']" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# use np.where to flag, row by row, whether 'value' in the first dataframe is lower than 'high' in the second\n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 True\n", + "Name: low_high, dtype: object" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1['low_high']" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "# Compare one column of the first dataframe against two columns of the second\n", + "df1['low_high_value'] = np.where((df1.value >= df2.low) & (df1.value <= df2.high), 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 True\n", + "Name: low_high_value, dtype: 
object" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1['low_high_value']" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['False', 'False', 'True'], dtype='\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "" + ], + "text/plain": [ + " name type value 0\n", + "0 Mike a+ 98 False\n", + "1 Jery a- 144 False\n", + "2 Tomy b 108 True" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# compare the data as a Boolean Series and join the result to the first dataframe\n", + "df3 = [(df2.type.isin(df1.type)) & (df1.value.between(df2.low,df2.high,inclusive='both'))]\n", + "df1.join(df3)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "# compare the data and assign the result as a new column of the first dataframe\n", + "df1['enh1'] = pd.Series((df2.type.isin(df1.type)) & (df1.value >= df2.low) & (df1.value <= df2.high))" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " name type value enh1\n", + "0 Mike a+ 98 False\n", + "1 Jery a- 144 False\n", + "2 Tomy b 108 True" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# combine 3 conditions with & (and) and | (or); & binds tighter than |, so this is (A & B) | C. Any valid boolean expression works\n", + "df1['enh2'] = pd.Series((df2.type.isin(df1.type)) & (df1.value != df2.low) | (df1.value + 1 == df2.high))" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " name type value enh1 enh2\n", + "0 Mike a+ 98 False True\n", + "1 Jery a- 144 False True\n", + "2 Tomy b 108 True False" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Different sized dataframes" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "# add a new row to dataframe 2 (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)\n", + "df2 = pd.concat([df2, pd.DataFrame([{'type':'0', 'low':143, 'high':108}])], ignore_index=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "merged = df1.merge(df2,how='outer',left_on=['type'],right_on=[\"type\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nametypevalueenh1enh2lowhigh
0Mikea+98.0FalseTrue7897
1Jerya-144.0FalseTrue108143
2Tomyb108.0TrueFalse108150
3NaN0NaNNaNNaN143108
\n", + "
" + ], + "text/plain": [ + " name type value enh1 enh2 low high\n", + "0 Mike a+ 98.0 False True 78 97\n", + "1 Jery a- 144.0 False True 108 143\n", + "2 Tomy b 108.0 True False 108 150\n", + "3 NaN 0 NaN NaN NaN 143 108" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "merged" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nametypevalueenh1enh2lowhigh
2Tomyb108.0TrueFalse108150
\n", + "
" + ], + "text/plain": [ + " name type value enh1 enh2 low high\n", + "2 Tomy b 108.0 True False 108 150" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "merged[(merged.value >= merged.low) & (merged.value <= merged.high)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Error ValueError: Can only compare identically-labeled Series objects" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "Can only compare identically-labeled Series objects", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# demo of error - ValueError: Can only compare identically-labeled Series objects\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mdf1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'low_high'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwhere\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mdf2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhigh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'True'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'False'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/ops/__init__.py\u001b[0m in 
\u001b[0;36mwrapper\u001b[0;34m(self, other, axis)\u001b[0m\n\u001b[1;32m 1140\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1141\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mABCSeries\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_indexed_same\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1142\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Can only compare identically-labeled \"\u001b[0m \u001b[0;34m\"Series objects\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1143\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1144\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mis_categorical_dtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mValueError\u001b[0m: Can only compare identically-labeled Series objects" + ] + } + ], + "source": [ + "# demo of error - ValueError: Can only compare identically-labeled Series objects \n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "df2.drop(3, inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "# demo of error - Now is working because of equal rows\n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "# 
how to cause it on first dataframes\n", + "df1.set_index([pd.Index([1, 2, 3])], inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "Can only compare identically-labeled Series objects", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# demo of error - ValueError: Can only compare identically-labeled Series objects because of mismatching indexes\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mdf1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'low_high'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwhere\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mdf2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhigh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'True'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'False'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/ops/__init__.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(self, other, axis)\u001b[0m\n\u001b[1;32m 1140\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1141\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mABCSeries\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_indexed_same\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1142\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Can only compare identically-labeled \"\u001b[0m \u001b[0;34m\"Series objects\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1143\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1144\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mis_categorical_dtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mValueError\u001b[0m: Can only compare identically-labeled Series objects" + ] + } + ], + "source": [ + "# demo of error - ValueError: Can only compare identically-labeled Series objects because of mismatching indexes\n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "# possible solution for - ValueError: Can only compare identically-labeled Series objects\n", + "df1.sort_index(inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "# possible solution for - ValueError: Can only compare identically-labeled Series objects\n", + "df1.reset_index(inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "# demo of error - ValueError: Can only compare identically-labeled Series objects\n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 
'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 True\n", + "Name: low_high, dtype: object" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1['low_high']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb b/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb index 91e9752..42f168a 100644 --- a/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb +++ b/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb @@ -34,7 +34,8 @@ "outputs": [], "source": [ "import pandas as pd\n", - "pd.set_option('display.max_colwidth', -1)" + "import numpy as np\n", + "pd.set_option('display.max_colwidth', None)" ] }, { @@ -161,39 +162,39 @@ ], "text/plain": [ " Respondent Hobby OpenSource Country Student Employment \\\n", - "0 1 Yes No Kenya No Employed part-time \n", - "1 3 Yes Yes United Kingdom No Employed full-time \n", + "0 1 Yes No Kenya No Employed part-time \n", + "1 3 Yes Yes United Kingdom No Employed full-time \n", "\n", " FormalEducation \\\n", "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", "\n", " UndergradMajor \\\n", - "0 Mathematics or statistics \n", + "0 Mathematics or statistics \n", "1 A natural science (ex. 
biology, chemistry, physics) \n", "\n", " CompanySize \\\n", - "0 20 to 99 employees \n", + "0 20 to 99 employees \n", "1 10,000 or more employees \n", "\n", " DevType \\\n", - "0 Full-stack developer \n", + "0 Full-stack developer \n", "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", "\n", - " ... Exercise Gender SexualOrientation \\\n", - "0 ... 3 - 4 times per week Male Straight or heterosexual \n", - "1 ... Daily or almost every day Male Straight or heterosexual \n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", "\n", " EducationParents RaceEthnicity \\\n", - "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", "1 Bachelor’s degree (BA, BS, B.Eng., etc.) White or of European descent \n", "\n", " Age Dependents MilitaryUS \\\n", - "0 25 - 34 years old Yes NaN \n", - "1 35 - 44 years old Yes NaN \n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", "\n", " SurveyTooLong SurveyEasy \n", - "0 The survey was an appropriate length Very easy \n", + "0 The survey was an appropriate length Very easy \n", "1 The survey was an appropriate length Somewhat easy \n", "\n", "[2 rows x 129 columns]" @@ -361,64 +362,64 @@ ], "text/plain": [ " Respondent Hobby OpenSource Country Student \\\n", - "0 1 Yes No Kenya No \n", - "1 3 Yes Yes United Kingdom No \n", - "98853 101544 Yes No Russian Federation No \n", - "98854 101548 Yes Yes Cambodia NaN \n", + "0 1 Yes No Kenya No \n", + "1 3 Yes Yes United Kingdom No \n", + "98853 101544 Yes No Russian Federation No \n", + "98854 101548 Yes Yes Cambodia NaN \n", "\n", " Employment \\\n", - "0 Employed part-time \n", - "1 Employed full-time \n", + "0 Employed part-time \n", + "1 Employed full-time \n", "98853 Independent contractor, 
freelancer, or self-employed \n", - "98854 NaN \n", + "98854 NaN \n", "\n", " FormalEducation \\\n", - "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", "98853 Some college/university study without earning a degree \n", - "98854 NaN \n", + "98854 NaN \n", "\n", " UndergradMajor \\\n", - "0 Mathematics or statistics \n", + "0 Mathematics or statistics \n", "1 A natural science (ex. biology, chemistry, physics) \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", " CompanySize \\\n", - "0 20 to 99 employees \n", + "0 20 to 99 employees \n", "1 10,000 or more employees \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", " DevType \\\n", - "0 Full-stack developer \n", + "0 Full-stack developer \n", "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", - " ... Exercise Gender \\\n", - "0 ... 3 - 4 times per week Male \n", - "1 ... Daily or almost every day Male \n", - "98853 ... NaN NaN \n", - "98854 ... NaN NaN \n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", + "98853 ... NaN NaN NaN \n", + "98854 ... NaN NaN NaN \n", "\n", - " SexualOrientation EducationParents \\\n", - "0 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "1 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "98853 NaN NaN \n", - "98854 NaN NaN \n", + " EducationParents RaceEthnicity \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) 
White or of European descent \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", "\n", - " RaceEthnicity Age Dependents MilitaryUS \\\n", - "0 Black or of African descent 25 - 34 years old Yes NaN \n", - "1 White or of European descent 35 - 44 years old Yes NaN \n", - "98853 NaN NaN NaN NaN \n", - "98854 NaN NaN NaN NaN \n", + " Age Dependents MilitaryUS \\\n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", + "98853 NaN NaN NaN \n", + "98854 NaN NaN NaN \n", "\n", " SurveyTooLong SurveyEasy \n", - "0 The survey was an appropriate length Very easy \n", + "0 The survey was an appropriate length Very easy \n", "1 The survey was an appropriate length Somewhat easy \n", - "98853 NaN NaN \n", - "98854 NaN NaN \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", "\n", "[4 rows x 129 columns]" ] @@ -587,64 +588,64 @@ ], "text/plain": [ " Respondent Hobby OpenSource Country Student \\\n", - "0 1 Yes No Kenya No \n", - "1 3 Yes Yes United Kingdom No \n", - "98853 101544 Yes No Russian Federation No \n", - "98854 101548 Yes Yes Cambodia NaN \n", + "0 1 Yes No Kenya No \n", + "1 3 Yes Yes United Kingdom No \n", + "98853 101544 Yes No Russian Federation No \n", + "98854 101548 Yes Yes Cambodia NaN \n", "\n", " Employment \\\n", - "0 Employed part-time \n", - "1 Employed full-time \n", + "0 Employed part-time \n", + "1 Employed full-time \n", "98853 Independent contractor, freelancer, or self-employed \n", - "98854 NaN \n", + "98854 NaN \n", "\n", " FormalEducation \\\n", - "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", "98853 Some college/university study without earning a degree \n", - "98854 NaN \n", + "98854 NaN \n", "\n", " UndergradMajor \\\n", - "0 Mathematics or statistics \n", + "0 Mathematics or statistics \n", "1 A natural science (ex. 
biology, chemistry, physics) \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", " CompanySize \\\n", - "0 20 to 99 employees \n", + "0 20 to 99 employees \n", "1 10,000 or more employees \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", " DevType \\\n", - "0 Full-stack developer \n", + "0 Full-stack developer \n", "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", - " ... Exercise Gender \\\n", - "0 ... 3 - 4 times per week Male \n", - "1 ... Daily or almost every day Male \n", - "98853 ... NaN NaN \n", - "98854 ... NaN NaN \n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", + "98853 ... NaN NaN NaN \n", + "98854 ... NaN NaN NaN \n", "\n", - " SexualOrientation EducationParents \\\n", - "0 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "1 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "98853 NaN NaN \n", - "98854 NaN NaN \n", + " EducationParents RaceEthnicity \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) 
White or of European descent \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", "\n", - " RaceEthnicity Age Dependents MilitaryUS \\\n", - "0 Black or of African descent 25 - 34 years old Yes NaN \n", - "1 White or of European descent 35 - 44 years old Yes NaN \n", - "98853 NaN NaN NaN NaN \n", - "98854 NaN NaN NaN NaN \n", + " Age Dependents MilitaryUS \\\n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", + "98853 NaN NaN NaN \n", + "98854 NaN NaN NaN \n", "\n", " SurveyTooLong SurveyEasy \n", - "0 The survey was an appropriate length Very easy \n", + "0 The survey was an appropriate length Very easy \n", "1 The survey was an appropriate length Somewhat easy \n", - "98853 NaN NaN \n", - "98854 NaN NaN \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", "\n", "[4 rows x 129 columns]" ] @@ -658,7 +659,7 @@ "# combine head and tail variant 2\n", "# ranges with iloc\n", "rows = 2\n", - "df.iloc[pd.np.r_[:rows, -rows:0]]" + "df.iloc[np.r_[:rows, -rows:0]]" ] }, { @@ -669,16 +670,16 @@ { "data": { "text/plain": [ - "0 JavaScript;Python;HTML;CSS \n", - "1 JavaScript;Python;Bash/Shell \n", - "2 NaN \n", + "0 JavaScript;Python;HTML;CSS\n", + "1 JavaScript;Python;Bash/Shell\n", + "2 NaN\n", "3 C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell\n", - "4 C;C++;Java;Matlab;R;SQL;Bash/Shell \n", - "98850 NaN \n", - "98851 NaN \n", - "98852 NaN \n", - "98853 NaN \n", - "98854 NaN \n", + "4 C;C++;Java;Matlab;R;SQL;Bash/Shell\n", + "98850 NaN\n", + "98851 NaN\n", + "98852 NaN\n", + "98853 NaN\n", + "98854 NaN\n", "Name: LanguageWorkedWith, dtype: object" ] }, @@ -690,7 +691,7 @@ "source": [ "# get examples from column LanguageWorkedWith\n", "rows = 5\n", - "df.LanguageWorkedWith.iloc[pd.np.r_[:rows, -rows:0]]" + "df.LanguageWorkedWith.iloc[np.r_[:rows, -rows:0]]" ] }, { @@ -701,16 +702,16 @@ { "data": { "text/plain": [ - "C#;JavaScript;SQL;HTML;CSS 1347\n", - "JavaScript;PHP;SQL;HTML;CSS 1235\n", - "Java 1030\n", - "JavaScript;HTML;CSS 881 \n", - 
"C#;JavaScript;SQL;TypeScript;HTML;CSS 828 \n", - "C;Go;Hack;Java;JavaScript;Perl;PHP;Python;SQL;TypeScript;HTML;CSS;Bash/Shell 1 \n", - "C;C++;Java;JavaScript;PHP;SQL;VBA;Visual Basic 6;HTML;CSS 1 \n", - "Assembly;C;C++;Java;JavaScript;Matlab;PHP;Python;R;SQL;TypeScript;Visual Basic 6;HTML;CSS 1 \n", - "C;C++;Java;JavaScript;Matlab;PHP;Python;Ruby;SQL;HTML;CSS 1 \n", - "Java;JavaScript;PHP;Scala;SQL;Kotlin;HTML;CSS;Bash/Shell 1 \n", + "C#;JavaScript;SQL;HTML;CSS 1347\n", + "JavaScript;PHP;SQL;HTML;CSS 1235\n", + "Java 1030\n", + "JavaScript;HTML;CSS 881\n", + "C#;JavaScript;SQL;TypeScript;HTML;CSS 828\n", + "C;C++;C#;Java;Python;SQL;Swift;HTML;CSS;Bash/Shell 1\n", + "C;C#;Java;JavaScript;PHP;Python;SQL;VBA;VB.NET;HTML;CSS;Bash/Shell 1\n", + "C#;Objective-C;PHP;Python;Swift;HTML;CSS;Bash/Shell 1\n", + "C#;Java;JavaScript;Objective-C;Perl;PHP;Python;SQL;Swift;TypeScript;VBA;VB.NET;HTML;CSS;Bash/Shell 1\n", + "C#;CoffeeScript;F#;JavaScript;SQL;TypeScript;HTML;CSS 1\n", "Name: LanguageWorkedWith, dtype: int64" ] }, @@ -721,7 +722,7 @@ ], "source": [ "# value counts for the same column\n", - "df.LanguageWorkedWith.value_counts().iloc[pd.np.r_[:rows, -rows:0]]" + "df.LanguageWorkedWith.value_counts().iloc[np.r_[:rows, -rows:0]]" ] }, { @@ -943,19 +944,19 @@ "" ], "text/plain": [ - " 0 1 2 3 4 5 6 \\\n", - "0 JavaScript Python HTML CSS None None None \n", - "1 JavaScript Python Bash/Shell None None None None \n", - "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", - "4 C C++ Java Matlab R SQL Bash/Shell \n", - "5 Java JavaScript Python TypeScript HTML CSS None \n", + " 0 1 2 3 4 5 6 \\\n", + "0 JavaScript Python HTML CSS None None None \n", + "1 JavaScript Python Bash/Shell None None None None \n", + "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", + "4 C C++ Java Matlab R SQL Bash/Shell \n", + "5 Java JavaScript Python TypeScript HTML CSS None \n", "\n", - " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", - "0 None None None ... 
None None None None None None None None \n", - "1 None None None ... None None None None None None None None \n", - "3 None None None ... None None None None None None None None \n", - "4 None None None ... None None None None None None None None \n", - "5 None None None ... None None None None None None None None \n", + " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", + "0 None None None ... None None None None None None None None \n", + "1 None None None ... None None None None None None None None \n", + "3 None None None ... None None None None None None None None \n", + "4 None None None ... None None None None None None None None \n", + "5 None None None ... None None None None None None None None \n", "\n", " 36 37 \n", "0 None None \n", @@ -1152,26 +1153,26 @@ "" ], "text/plain": [ - " 0 1 2 3 4 5 6 7 \\\n", - "Assembly 5760.0 NaN NaN NaN NaN NaN NaN NaN \n", - "Bash/Shell 29.0 465.0 1221.0 1929.0 2882.0 4442.0 4844.0 4269.0 \n", - "C 13335.0 4707.0 NaN NaN NaN NaN NaN NaN \n", - "C# 16969.0 4321.0 3990.0 1674.0 NaN NaN NaN NaN \n", - "C++ 7042.0 9275.0 3555.0 NaN NaN NaN NaN NaN \n", + " 0 1 2 3 4 5 6 7 \\\n", + "Assembly 5760.0 NaN NaN NaN NaN NaN NaN NaN \n", + "Bash/Shell 29.0 465.0 1221.0 1929.0 2882.0 4442.0 4844.0 4269.0 \n", + "C 13335.0 4707.0 NaN NaN NaN NaN NaN NaN \n", + "C# 16969.0 4321.0 3990.0 1674.0 NaN NaN NaN NaN \n", + "C++ 7042.0 9275.0 3555.0 NaN NaN NaN NaN NaN \n", "\n", - " 8 9 ... 28 29 30 31 32 33 34 35 36 \\\n", - "Assembly NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", - "Bash/Shell 3311.0 2562.0 ... 3.0 1.0 2.0 2.0 NaN 1.0 NaN NaN 2.0 \n", - "C NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", - "C# NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", - "C++ NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + " 8 9 ... 28 29 30 31 32 33 34 35 36 \\\n", + "Assembly NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "Bash/Shell 3311.0 2562.0 ... 3.0 1.0 2.0 2.0 NaN 1.0 NaN NaN 2.0 \n", + "C NaN NaN ... 
NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C# NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C++ NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", " 37 \n", - "Assembly NaN \n", + "Assembly NaN \n", "Bash/Shell 35.0 \n", - "C NaN \n", - "C# NaN \n", - "C++ NaN \n", + "C NaN \n", + "C# NaN \n", + "C++ NaN \n", "\n", "[5 rows x 38 columns]" ] @@ -1364,33 +1365,26 @@ "" ], "text/plain": [ - " 0 1 2 3 4 5 \\\n", - "Assembly 0.073531 NaN NaN NaN NaN NaN \n", + " 0 1 2 3 4 5 \\\n", + "Assembly 0.073531 NaN NaN NaN NaN NaN \n", "Bash/Shell 0.000370 0.005936 0.015587 0.024625 0.036791 0.056706 \n", - "C 0.170233 0.060089 NaN NaN NaN NaN \n", - "C# 0.216624 0.055161 0.050936 0.021370 NaN NaN \n", - "C++ 0.089897 0.118403 0.045383 NaN NaN NaN \n", + "C 0.170233 0.060089 NaN NaN NaN NaN \n", + "C# 0.216624 0.055161 0.050936 0.021370 NaN NaN \n", + "C++ 0.089897 0.118403 0.045383 NaN NaN NaN \n", "\n", - " 6 7 8 9 ... 28 \\\n", - "Assembly NaN NaN NaN NaN ... NaN \n", - "Bash/Shell 0.061838 0.054497 0.042268 0.032706 ... 0.000038 \n", - "C NaN NaN NaN NaN ... NaN \n", - "C# NaN NaN NaN NaN ... NaN \n", - "C++ NaN NaN NaN NaN ... NaN \n", + " 6 7 8 9 ... 28 29 \\\n", + "Assembly NaN NaN NaN NaN ... NaN NaN \n", + "Bash/Shell 0.061838 0.054497 0.042268 0.032706 ... 0.000038 0.000013 \n", + "C NaN NaN NaN NaN ... NaN NaN \n", + "C# NaN NaN NaN NaN ... NaN NaN \n", + "C++ NaN NaN NaN NaN ... 
NaN NaN \n", "\n", - " 29 30 31 32 33 34 35 36 \\\n", - "Assembly NaN NaN NaN NaN NaN NaN NaN NaN \n", - "Bash/Shell 0.000013 0.000026 0.000026 NaN 0.000013 NaN NaN 0.000026 \n", - "C NaN NaN NaN NaN NaN NaN NaN NaN \n", - "C# NaN NaN NaN NaN NaN NaN NaN NaN \n", - "C++ NaN NaN NaN NaN NaN NaN NaN NaN \n", - "\n", - " 37 \n", - "Assembly NaN \n", - "Bash/Shell 0.000447 \n", - "C NaN \n", - "C# NaN \n", - "C++ NaN \n", + " 30 31 32 33 34 35 36 37 \n", + "Assembly NaN NaN NaN NaN NaN NaN NaN NaN \n", + "Bash/Shell 0.000026 0.000026 NaN 0.000013 NaN NaN 0.000026 0.000447 \n", + "C NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C# NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C++ NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", "[5 rows x 38 columns]" ] @@ -1419,7 +1413,7 @@ "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# why for value counts and parameters you need lambda\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf_lang_per\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf_lang\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfillna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSeries\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# why for value counts and parameters you need 
lambda\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf_lang_per\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf_lang\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfillna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSeries\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: value_counts() missing 1 required positional argument: 'self'" ] } @@ -1438,15 +1432,15 @@ "data": { "text/plain": [ "0 31.800036\n", - "JavaScript 0.698113 \n", - "HTML 0.684607 \n", - "CSS 0.650790 \n", - "SQL 0.570250 \n", - "Java 0.453456 \n", - "Bash/Shell 0.397937 \n", - "Python 0.387558 \n", - "C# 0.344091 \n", - "PHP 0.307287 \n", + "JavaScript 0.698113\n", + "HTML 0.684607\n", + "CSS 0.650790\n", + "SQL 0.570250\n", + "Java 0.453456\n", + "Bash/Shell 0.397937\n", + "Python 0.387558\n", + "C# 0.344091\n", + "PHP 0.307287\n", "Name: total, dtype: float64" ] }, @@ -1470,10 +1464,10 @@ "data": { "text/plain": [ "0 2491024.0\n", - "JavaScript 54686.0 \n", - "HTML 53628.0 \n", - "CSS 50979.0 \n", - "SQL 44670.0 \n", + "JavaScript 54686.0\n", + "HTML 53628.0\n", + "CSS 50979.0\n", + "SQL 44670.0\n", "Name: total, dtype: float64" ] }, @@ -1664,19 +1658,19 @@ "" ], "text/plain": [ - " 0 1 2 3 4 5 6 \\\n", - "0 JavaScript Python HTML CSS None None None \n", - "1 JavaScript Python Bash/Shell None None None None \n", - "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", - "4 C C++ Java Matlab R SQL Bash/Shell \n", - "5 Java JavaScript Python TypeScript HTML CSS None \n", + " 0 1 2 3 4 5 6 \\\n", + "0 JavaScript Python HTML CSS None None None 
\n", + "1 JavaScript Python Bash/Shell None None None None \n", + "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", + "4 C C++ Java Matlab R SQL Bash/Shell \n", + "5 Java JavaScript Python TypeScript HTML CSS None \n", "\n", - " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", - "0 None None None ... None None None None None None None None \n", - "1 None None None ... None None None None None None None None \n", - "3 None None None ... None None None None None None None None \n", - "4 None None None ... None None None None None None None None \n", - "5 None None None ... None None None None None None None None \n", + " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", + "0 None None None ... None None None None None None None None \n", + "1 None None None ... None None None None None None None None \n", + "3 None None None ... None None None None None None None None \n", + "4 None None None ... None None None None None None None None \n", + "5 None None None ... None None None None None None None None \n", "\n", " 36 37 \n", "0 None None \n", @@ -1709,7 +1703,7 @@ "C 13335\n", "JavaScript 12150\n", "Java 12087\n", - "C++ 7042 \n", + "C++ 7042\n", "Name: 0, dtype: int64" ] }, @@ -1733,9 +1727,9 @@ "text/plain": [ "JavaScript 19532\n", "Java 10175\n", - "C++ 9275 \n", - "PHP 6450 \n", - "C 4707 \n", + "C++ 9275\n", + "PHP 6450\n", + "C 4707\n", "Name: 1, dtype: int64" ] }, @@ -1754,16 +1748,6 @@ "execution_count": 19, "metadata": {}, "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py:7441: RuntimeWarning: '<' not supported between instances of 'str' and 'float', sort order is undefined for incomparable objects\n", - " return_indexers=True)\n", - "/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py:7441: RuntimeWarning: '<' not supported between instances of 'float' and 'str', sort order is undefined 
for incomparable objects\n", - " return_indexers=True)\n" - ] - }, { "data": { "text/plain": [ @@ -1878,10 +1862,10 @@ "CSS 50979\n", "SQL 44670\n", "Java 35521\n", - "Rust 1857 \n", - "Kotlin 3508 \n", - "Cobol 590 \n", - "Ocaml 470 \n", + "Rust 1857\n", + "Kotlin 3508\n", + "Cobol 590\n", + "Ocaml 470\n", "CSS 50979" ] }, @@ -1984,11 +1968,11 @@ "CSS 50979\n", "SQL 44670\n", "Java 35521\n", - "Erlang 886 \n", - "Cobol 590 \n", - "Ocaml 470 \n", - "Julia 430 \n", - "Hack 254 " + "Erlang 886\n", + "Cobol 590\n", + "Ocaml 470\n", + "Julia 430\n", + "Hack 254" ] }, "execution_count": 22, @@ -2001,12 +1985,65 @@ "df_comb.head(rows).append(df_comb.tail(rows))" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: In some cases the iteration example is not working properly - when the first column doesn't contain all values. It can be replaced with the example below:" + ] + }, { "cell_type": "code", - "execution_count": null, + "execution_count": 24, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "JavaScript 54686.0\n", + "HTML 53628.0\n", + "CSS 50979.0\n", + "SQL 44670.0\n", + "Java 35521.0\n", + "Bash/Shell 31172.0\n", + "Python 30359.0\n", + "C# 26954.0\n", + "PHP 24071.0\n", + "C++ 19872.0\n", + "Delphi/Object Pascal 2025.0\n", + "Haskell 1961.0\n", + "Rust 1857.0\n", + "F# 1115.0\n", + "Clojure 1032.0\n", + "Erlang 886.0\n", + "Cobol 590.0\n", + "Ocaml 470.0\n", + "Julia 430.0\n", + "Hack 254.0\n", + "dtype: float64" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_comb = pd.DataFrame()\n", + "temp = []\n", + "val_count_tmp = pd.Series(dtype=float)\n", + "\n", + "# sum all columns in dataframe with iteration\n", + "for col in df_lang.columns:\n", + " temp.append(df_lang[col].fillna(0).value_counts())\n", + "\n", + "for val_count in temp:\n", + " val_count_tmp = val_count_tmp.add(val_count,fill_value=0)\n", + "\n", + "y = 
val_count_tmp.dropna().drop(0)\n", + "y.sort_values(ascending=False, inplace=True)\n", + "pd.concat([y.head(10), y.tail(10)])" + ] }, { "cell_type": "code", @@ -2032,7 +2069,20 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.9" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false } }, "nbformat": 4, diff --git a/notebooks/pandas/Pandas_extract_url_or_dates_from_column.ipynb b/notebooks/pandas/Pandas_extract_url_or_dates_from_column.ipynb new file mode 100644 index 0000000..fe659f2 --- /dev/null +++ b/notebooks/pandas/Pandas_extract_url_or_dates_from_column.ipynb @@ -0,0 +1,786 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Python Pandas extract URL or date by regex" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "# read the CSV file as-is\n", + "result = pd.read_csv('../csv/url_dates.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_columns', None) # or 1000\n", + "pd.set_option('display.max_rows', None) # or 1000\n", + "pd.set_option('display.max_colwidth', None) # no limit; pandas < 1.0 needs -1 instead" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
log
02019-10-28 19:56:03 DEMO <GET https://www.wikipedia.org/> (The Free Encyclopedia) 2019-10-29 9:06:03
12019-10-29 19:56:03 DEMO <GET https://en.wikipedia.org/wiki/Main_Page> (5,962,233 articles in English) 2019-10-31 11:16:43
22019-10-29 19:56:03 DEMO <GET https://it.wikipedia.org/wiki/Pagina_principale> (1 561 730 voci in italiano) 2019-10-30 21:15:23
32019-10-30 19:56:03 DEMO <GET https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal> (1 014 783 artigos em português) 2019-10-30 20:26:35
\n", + "
" + ], + "text/plain": [ + " log\n", + "0 2019-10-28 19:56:03 DEMO <GET https://www.wikipedia.org/> (The Free Encyclopedia) 2019-10-29 9:06:03 \n", + "1 2019-10-29 19:56:03 DEMO <GET https://en.wikipedia.org/wiki/Main_Page> (5,962,233 articles in English) 2019-10-31 11:16:43 \n", + "2 2019-10-29 19:56:03 DEMO <GET https://it.wikipedia.org/wiki/Pagina_principale> (1 561 730 voci in italiano) 2019-10-30 21:15:23 \n", + "3 2019-10-30 19:56:03 DEMO <GET https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal> (1 014 783 artigos em português) 2019-10-30 20:26:35" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Checking sample data\n", + "result.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# URL extraction from DataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# extract URLs: match from the https protocol up to the closing >\n", + "# the URL is a capturing group; the closing > is a non-capturing group\n", + "result['url'] = result.log.str.extract(r'(https.*)(?:>)')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
logurl
22019-10-29 19:56:03 DEMO <GET https://it.wikipedia.org/wiki/Pagina_principale> (1 561 730 voci in italiano) 2019-10-30 21:15:23https://it.wikipedia.org/wiki/Pagina_principale
\n", + "
" + ], + "text/plain": [ + " log \\\n", + "2 2019-10-29 19:56:03 DEMO <GET https://it.wikipedia.org/wiki/Pagina_principale> (1 561 730 voci in italiano) 2019-10-30 21:15:23 \n", + "\n", + " url \n", + "2 https://it.wikipedia.org/wiki/Pagina_principale " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# keep only the rows whose extracted URL contains a given substring\n", + "result[result['url'].str.contains('it.wikipedia.org', regex=False)]" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# extract URLs: match from the https protocol up to the closing >\n", + "# the URL is a capturing group; the closing > is a non-capturing group\n", + "result['url'] = result.log.str.extract(r'(https.*)(?:>)')" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " 0\n", + "0 https://www.wikipedia.org/> \n", + "1 https://en.wikipedia.org/wiki/Main_Page> \n", + "2 https://it.wikipedia.org/wiki/Pagina_principale> \n", + "3 https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal>" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# alternative: a more general URL pattern (note that it also captures the trailing >)\n", + "\n", + "result.log.str.extract(r'(https?:\\/\\/(?:www\\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\\.[^\\s]{2,}|www\\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\\.[^\\s]{2,}|https?:\\/\\/(?:www\\.|(?!www))[a-zA-Z0-9]+\\.[^\\s]{2,}|www\\.[a-zA-Z0-9]+\\.[^\\s]{2,})').head()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " 0 1 2 \\\n", + "0 https NaN www.wikipedia.org/> \n", + "1 https NaN en.wikipedia.org/wiki/Main_Page> \n", + "2 https NaN it.wikipedia.org/wiki/Pagina_principale> \n", + "3 https NaN pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal> \n", + "\n", + " 3 4 5 \n", + "0 NaN NaN NaN \n", + "1 NaN NaN NaN \n", + "2 NaN NaN NaN \n", + "3 NaN NaN NaN " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# alternative: every capture group in the pattern becomes a separate column (0-5)\n", + "result.log.str.extract(r'(ftp|http|https):\\/\\/(\\w+:{0,1}\\w*@)?(\\S+)(:[0-9]+)?(\\/|\\/([\\w#!:.?+=&%@!\\-\\/]))?').head()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Date extraction from DataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# extract the first date found in each row\n", + "result['date'] = result.log.str.extract(r'(\\d{4}-\\d{2}-\\d{2})')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2019-10-28\n", + "1 2019-10-29\n", + "2 2019-10-29\n", + "3 2019-10-30\n", + "Name: date, dtype: object" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result['date']" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " 0\n", + " match \n", + "0 0 2019-10-28\n", + " 1 2019-10-29\n", + "1 0 2019-10-29\n", + " 1 2019-10-31\n", + "2 0 2019-10-29\n", + " 1 2019-10-30\n", + "3 0 2019-10-30\n", + " 1 2019-10-30" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# extract multiple dates\n", + "result.log.str.extractall(r'(\\d{4}-\\d{2}-\\d{2})')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " 0 \n", + "match 0 1\n", + "0 2019-10-28 2019-10-29\n", + "1 2019-10-29 2019-10-31\n", + "2 2019-10-29 2019-10-30\n", + "3 2019-10-30 2019-10-30" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# unstack the match level of the MultiIndex: one column per match\n", + "result.log.str.extractall(r'(\\d{4}-\\d{2}-\\d{2})').unstack()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "# extract datetime\n", + "result['datetime'] = result.log.str.extract(r'(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2019-10-28 19:56:03\n", + "1 2019-10-29 19:56:03\n", + "2 2019-10-29 19:56:03\n", + "3 2019-10-30 19:56:03\n", + "Name: datetime, dtype: object" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result['datetime']" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# match the full datetime but capture only the date (the time is a non-capturing group)\n", + "result['date'] = result.log.str.extract(r'(\\d{4}-\\d{2}-\\d{2}) (?:\\d{2}:\\d{2}:\\d{2})')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2019-10-28\n", + "1 2019-10-29\n", + "2 2019-10-29\n", + "3 2019-10-30\n", + "Name: date, dtype: object" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result['date']" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "# extract date and time into two separate columns\n", + "result[['date', 'time']] = result.log.str.extract(r'(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2})')" + ] + }, + { + 
"cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " date time\n", + "0 2019-10-28 19:56:03\n", + "1 2019-10-29 19:56:03\n", + "2 2019-10-29 19:56:03\n", + "3 2019-10-30 19:56:03" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result[['date', 'time']]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Split URLs" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "result['url_split'] = 'https' + result.log.str.split('https', expand=True)[1].str.split('>', expand=True)[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 https://www.wikipedia.org/ \n", + "1 https://en.wikipedia.org/wiki/Main_Page \n", + "2 https://it.wikipedia.org/wiki/Pagina_principale \n", + "3 https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal\n", + "Name: url_split, dtype: object" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result['url_split']" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/pandas/pandas-use-list-values-select-rows-column.ipynb b/notebooks/pandas/pandas-use-list-values-select-rows-column.ipynb new file mode 100644 index 0000000..41e2e0e --- /dev/null +++ b/notebooks/pandas/pandas-use-list-values-select-rows-column.ipynb @@ -0,0 +1,1453 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas use a list of values to select rows from a column\n", + "\n", + 
"* filter pandas rows by exact match from a list\n", + "* filter pandas rows by partial match from a list\n", + "\n", + "Bonus:\n", + "\n", + "* execute value counts on multiple columns\n", + "* vectorized operations\n", + "\n", + "> Vectorization is the process of executing operations on entire arrays at once instead of looping over individual elements." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "pd.set_option('display.max_colwidth', None) # no limit; pandas < 1.0 needs -1 instead" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(98855, 129)\n" + ] + } + ], + "source": [ + "# read the survey data and check its shape (rows, columns)\n", + "df = pd.read_csv(\"../csv/stackoverflow/developer_survey_2018/survey_results_public.csv\", low_memory=False)\n", + "print(df.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "


\n", + "
" + ], + "text/plain": [ + " Respondent Hobby OpenSource Country Student Employment \\\n", + "0 1 Yes No Kenya No Employed part-time \n", + "1 3 Yes Yes United Kingdom No Employed full-time \n", + "\n", + " FormalEducation \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "\n", + " UndergradMajor \\\n", + "0 Mathematics or statistics \n", + "1 A natural science (ex. biology, chemistry, physics) \n", + "\n", + " CompanySize \\\n", + "0 20 to 99 employees \n", + "1 10,000 or more employees \n", + "\n", + " DevType \\\n", + "0 Full-stack developer \n", + "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", + "\n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", + "\n", + " EducationParents RaceEthnicity \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) White or of European descent \n", + "\n", + " Age Dependents MilitaryUS \\\n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", + "\n", + " SurveyTooLong SurveyEasy \n", + "0 The survey was an appropriate length Very easy \n", + "1 The survey was an appropriate length Somewhat easy \n", + "\n", + "[2 rows x 129 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Computer science, computer engineering, or software engineering 50336\n", + "Another engineering discipline (ex. civil, electrical, mechanical) 6945 \n", + "Information systems, information technology, or system administration 6507 \n", + "A natural science (ex. 
biology, chemistry, physics) 3050 \n", + "Mathematics or statistics 2818 \n", + "Web development or web design 2418 \n", + "A business discipline (ex. accounting, finance, marketing) 1921 \n", + "A humanities discipline (ex. literature, history, philosophy) 1590 \n", + "A social science (ex. anthropology, psychology, political science) 1377 \n", + "Fine arts or performing arts (ex. graphic design, music, studio art) 1135 \n", + "I never declared a major 693 \n", + "A health science (ex. nursing, pharmacy, radiology) 246 \n", + "Name: UndergradMajor, dtype: int64" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.UndergradMajor.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "


\n", + "
" + ], + "text/plain": [ + " Respondent Hobby OpenSource Country Student \\\n", + "0 1 Yes No Kenya No \n", + "32 51 Yes No United States No \n", + "82 124 Yes Yes United Kingdom No \n", + "84 126 Yes Yes Argentina Yes, part-time \n", + "148 230 Yes Yes United States No \n", + "\n", + " Employment \\\n", + "0 Employed part-time \n", + "32 Employed full-time \n", + "82 Employed full-time \n", + "84 Employed full-time \n", + "148 Employed full-time \n", + "\n", + " FormalEducation \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "32 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "82 Master’s degree (MA, MS, M.Eng., MBA, etc.) \n", + "84 Some college/university study without earning a degree \n", + "148 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "\n", + " UndergradMajor CompanySize \\\n", + "0 Mathematics or statistics 20 to 99 employees \n", + "32 Web development or web design 500 to 999 employees \n", + "82 Mathematics or statistics 10,000 or more employees \n", + "84 Web development or web design Fewer than 10 employees \n", + "148 Mathematics or statistics 1,000 to 4,999 employees \n", + "\n", + " DevType \\\n", + "0 Full-stack developer \n", + "32 Back-end developer;Designer;Front-end developer;Full-stack developer;Marketing or sales professional;Mobile developer \n", + "82 Back-end developer;DevOps specialist;Front-end developer;Full-stack developer;Mobile developer \n", + "84 Mobile developer \n", + "148 Data scientist or machine learning specialist \n", + "\n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "32 ... Daily or almost every day Female Straight or heterosexual \n", + "82 ... 1 - 2 times per week Male Straight or heterosexual \n", + "84 ... 1 - 2 times per week Male Straight or heterosexual \n", + "148 ... NaN NaN NaN \n", + "\n", + " EducationParents \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) 
\n", + "32 Associate degree \n", + "82 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "84 Some college/university study without earning a degree \n", + "148 NaN \n", + "\n", + " RaceEthnicity Age Dependents MilitaryUS \\\n", + "0 Black or of African descent 25 - 34 years old Yes NaN \n", + "32 White or of European descent 18 - 24 years old No No \n", + "82 White or of European descent 25 - 34 years old Yes NaN \n", + "84 NaN 25 - 34 years old No NaN \n", + "148 NaN NaN NaN NaN \n", + "\n", + " SurveyTooLong SurveyEasy \n", + "0 The survey was an appropriate length Very easy \n", + "32 The survey was an appropriate length Very easy \n", + "82 The survey was an appropriate length Very easy \n", + "84 The survey was an appropriate length Very easy \n", + "148 NaN NaN \n", + "\n", + "[5 rows x 129 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['UndergradMajor'].isin(['Mathematics or statistics', \n", + " 'Web development or web design'])].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "area_list = ['biology', 'physics', 'Computer', 'enginnering', 'pharmacy', 'psychology', 'graphic design',\n", + " 'music', 'art', 'studio art', 'accounting', 'finance', 'chemistry',]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " biology physics Computer enginnering pharmacy psychology graphic design \\\n", + "0 False False False False False False False \n", + "1 True True False False False False False \n", + "2 False False True False False False False \n", + "3 False False True False False False False \n", + "4 False False True False False False False \n", + "5 False False True False False False False \n", + "6 False False True False False False False \n", + "7 False False True False False False False \n", + "8 False False False False False False True \n", + "9 False False True False False False False \n", + "10 False False False False False False False \n", + "11 False False False False False False False \n", + "12 False False True False False False False \n", + "13 False False False False False False False \n", + "14 NaN NaN NaN NaN NaN NaN NaN \n", + "15 False False False False False False True \n", + "16 False False True False False False False \n", + "17 False False False False False False False \n", + "18 NaN NaN NaN NaN NaN NaN NaN \n", + "19 False False False False False False False \n", + "20 False False False False False False False \n", + "21 NaN NaN NaN NaN NaN NaN NaN \n", + "22 False False True False False False False \n", + "23 NaN NaN NaN NaN NaN NaN NaN \n", + "24 False False True False False False False \n", + "25 True True False False False False False \n", + "26 False False True False False False False \n", + "27 False False False False False True False \n", + "28 False False True False False False False \n", + "29 False False False False False False False \n", + "\n", + " music art studio art accounting finance chemistry \n", + "0 False False False False False False \n", + "1 False False False False False True \n", + "2 False False False False False False \n", + "3 False False False False False False \n", + "4 False False False False False False \n", + "5 False False False False False False \n", + "6 False False False False False False \n", 
+ "7 False False False False False False \n", + "8 True True True False False False \n", + "9 False False False False False False \n", + "10 False False False False False False \n", + "11 False False False False False False \n", + "12 False False False False False False \n", + "13 False False False False False False \n", + "14 NaN NaN NaN NaN NaN NaN \n", + "15 True True True False False False \n", + "16 False False False False False False \n", + "17 False False False True True False \n", + "18 NaN NaN NaN NaN NaN NaN \n", + "19 False False False True True False \n", + "20 False False False False False False \n", + "21 NaN NaN NaN NaN NaN NaN \n", + "22 False False False False False False \n", + "23 NaN NaN NaN NaN NaN NaN \n", + "24 False False False False False False \n", + "25 False False False False False True \n", + "26 False False False False False False \n", + "27 False False False False False False \n", + "28 False False False False False False \n", + "29 False False False False False False " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import re\n", + "area_df = pd.DataFrame(dict((area, df.UndergradMajor.str.contains(area))\n", + " for area in area_list))\n", + "area_df.head(30)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Back-end developer 6417\n", + "Full-stack developer 6104\n", + "Back-end developer;Front-end developer;Full-stack developer 4460\n", + "Mobile developer 3518\n", + "Student 3222\n", + "Name: DevType, dtype: int64" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.DevType.value_counts().head()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "dev_list = ['Mobile', 'Data', 'QA']" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + 
"outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 7 8 9 \\\n", + "Mobile False False False False False False False False False False \n", + "Data False True False False True True False False True False \n", + "QA False False False False True False False True False False \n", + "\n", + " 10 11 12 13 14 15 16 17 18 19 \n", + "Mobile True False False False False False False False False False \n", + "Data True False False False False False False False True False \n", + "QA False False False False False False False False False True " + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import re\n", + "dev_df = pd.DataFrame(dict((dev, df.DevType.str.contains(dev, flags=re.IGNORECASE))\n", + " for dev in dev_list))\n", + "dev_df.head(20).T" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False 85904\n", + "True 6194 \n", + "Name: QA, dtype: int64" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dev_df.QA.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Mobile Data QA\n", + "False 73294 70209 85904\n", + "True 18804 21889 6194 " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dev_df.apply(pd.Series.value_counts)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Mobile QA\n", + "False 73294 85904\n", + "True 18804 6194 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dev_df[['Mobile','QA']].apply(pd.Series.value_counts)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb b/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb new file mode 100644 index 0000000..7651657 --- /dev/null +++ b/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb @@ -0,0 +1,664 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# How to merge multiple CSV files with Python\n", + "This notebook covers:\n", + "\n", + "* Steps to merge multiple CSV(identical) files with Python\n", + "* Steps to merge multiple CSV(identical) files with Python with trace\n", + "* Combine multiple CSV files when the columns are different\n", + "* Bonus: Merge multiple files with Windows/Linux" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['../../csv/data_202001.csv',\n", + " '../../csv/data_202002.csv',\n", + " '../../csv/data_201902.csv',\n", + " '../../csv/data_201901.csv']" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3 col4\n", + "0 E F 5 e5\n", + "1 EE FF 6 ee6" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3 col5\n", + "0 H J 7 77\n", + "1 HH JJ 8 88" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 C D 3\n", + "1 CC DD 4" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 A B 1\n", + "1 AA BB 2" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "all_files = glob.glob(os.path.join(path, \"data_*.csv\"))\n", + "display(all_files)\n", + "for f in all_files:\n", + " display(pd.read_csv(f, sep=','))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Steps to merge multiple CSV(identical) files with Python" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import os, glob\n", + "import pandas as pd\n", + "\n", + "path = \"../../csv/\"\n", + "#path = \"/home/user/data\"\n", + "\n", + "all_files = glob.glob(os.path.join(path, \"data_2019*.csv\"))\n", + "\n", + "all_csv = (pd.read_csv(f, sep=',') for f in all_files)\n", + "df_merged = pd.concat(all_csv, ignore_index=True)\n", + "df_merged.to_csv( \"merged.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 col1 col2 col3\n", + "0 0 C D 3\n", + "1 1 CC DD 4\n", + "2 2 A B 1\n", + "3 3 AA BB 2" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.read_csv('merged.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Steps to merge multiple CSV(identical) files with Python with trace" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3 file\n", + "0 C D 3 data_201902.csv\n", + "1 CC DD 4 data_201902.csv\n", + "2 A B 1 data_201901.csv\n", + "3 AA BB 2 data_201901.csv" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import os, glob\n", + "import pandas as pd\n", + "\n", + "path = \"../../csv/\"\n", + "\n", + "all_files = glob.glob(os.path.join(path, \"data_2019*.csv\"))\n", + "\n", + "all_df = []\n", + "for f in all_files:\n", + " df = pd.read_csv(f, sep=',')\n", + " df['file'] = f.split('/')[-1]\n", + " all_df.append(df)\n", + " \n", + "merged_df = pd.concat(all_df, ignore_index=True)\n", + "merged_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Combine multiple CSV files when the columns are different" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3 col4 col5 file\n", + "0 E F 5 e5 NaN data_202001.csv\n", + "1 EE FF 6 ee6 NaN data_202001.csv\n", + "2 H J 7 NaN 77.0 data_202002.csv\n", + "3 HH JJ 8 NaN 88.0 data_202002.csv\n", + "4 C D 3 NaN NaN data_201902.csv\n", + "5 CC DD 4 NaN NaN data_201902.csv\n", + "6 A B 1 NaN NaN data_201901.csv\n", + "7 AA BB 2 NaN NaN data_201901.csv" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import os, glob\n", + "import pandas as pd\n", + "\n", + "path = \"../../csv/\"\n", + "\n", + "all_files = glob.glob(os.path.join(path, \"data_*.csv\"))\n", + "\n", + "\n", + "all_df = []\n", + "for f in all_files:\n", + " df = pd.read_csv(f, sep=',')\n", + " df['file'] = f.split('/')[-1]\n", + " all_df.append(df)\n", + " \n", + "merged_df = pd.concat(all_df, ignore_index=True, sort=True)\n", + "merged_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Bonus: Merge multiple files with Windows/Linux\n", + "\n", + "Linux (keeps only the header of the first file; assumes all files share the same columns)\n", + "\n", + "`awk 'FNR==1 && NR!=1 {next} {print}' data_*.csv > merged.csv`\n", + "\n", + "Windows (concatenates the files as they are, so the header line of every file is repeated)\n", + "\n", + "`C:\\> copy data_*.csv merged.csv `" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/python/JSON/41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb b/notebooks/python/JSON/41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb new file mode 100644 index 0000000..3bee813 --- /dev/null +++ 
b/notebooks/python/JSON/41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb @@ -0,0 +1,421 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 41. Create a table in SQL(MySQL Database) from python dictionary\n", + "\n", + "\n", + "[Python convert normal JSON to JSON separated lines 3 examples](https://blog.softhints.com/python-convert-json-to-json-lines/)\n", + "\n", + "* Pandas DataFrame to MySQL\n", + "* Create table from Python Dict\n", + "* connect MySQL database and Python\n", + " * SQLAlchemy\n", + " * PyMySQL" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Python dict which is converted to a Database Table\n", + "\n", + "```json\n", + "{\"id\":1,\"label\":\"A\",\"size\":\"S\"}\n", + "{\"id\":2,\"label\":\"B\",\"size\":\"XL\"}\n", + "{\"id\":3,\"label\":\"C\",\"size\":\"XXl\"}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Read/Create a Python dict" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " id label size\n", + "0 1 A S\n", + "1 2 B XL\n", + "2 3 C XXl" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "# read normal JSON with pandas\n", + "df = pd.read_json('/home/vanx/Downloads/old/normal_json.json')\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'id': {0: 1, 1: 2, 2: 3},\n", + " 'label': {0: 'A', 1: 'B', 2: 'C'},\n", + " 'size': {0: 'S', 1: 'XL', 2: 'XXl'}}" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_dict = df.to_dict()\n", + "data_dict" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " id label size\n", + "0 1 A S\n", + "1 2 B XL\n", + "2 3 C XXl" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df2 = pd.DataFrame.from_dict(data_dict)\n", + "df2.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Pandas DataFrame to MySQL table with SQLAlchemy" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# connect\n", + "from sqlalchemy import create_engine\n", + "cnx = create_engine('mysql+pymysql://test:pass@localhost/test') " + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# create table from DataFrame\n", + "df.to_sql('test', cnx, if_exists='replace', index = False)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " id label size\n", + "0 1 A S\n", + "1 2 B XL\n", + "2 3 C XXl" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# query table\n", + "df = pd.read_sql('SELECT * FROM test', cnx)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Python Dict Insert Records Into a MySQL Database" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "# connect\n", + "import pymysql\n", + "\n", + "connection = pymysql.connect(host='localhost',\n", + " user='test',\n", + " password='pass',\n", + " db='test')\n", + "cursor = connection.cursor()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Create table\n", + "cols = df.columns\n", + "table_name = 'test'\n", + "ddl = \"\"\n", + "for col in cols:\n", + " ddl += \"`{}` text,\".format(col)\n", + "\n", + "sql_create = \"CREATE TABLE IF NOT EXISTS `{}` ({}) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;\".format(table_name, ddl[:-1])\n", + "cursor.execute(sql_create)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# insert data\n", + "cols = \"`,`\".join([str(i) for i in df.columns.tolist()])\n", + "\n", + "# insert dict records .\n", + "for i,row in df.iterrows():\n", + " sql = \"INSERT INTO `test` (`\" +cols + \"`) VALUES (\" + \"%s,\"*(len(row)-1) + \"%s)\"\n", + " cursor.execute(sql, tuple(row))\n", + " connection.commit()" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('1', 'A', 'S')\n", + "('2', 'B', 'XL')\n", + "('3', 'C', 'XXl')\n" + ] + } + ], + "source": 
[ + "# read\n", + "sql = \"SELECT * FROM test\"\n", + "cursor.execute(sql)\n", + "result = cursor.fetchall()\n", + "for i in result:\n", + " print(i)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/python/JSON/42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb b/notebooks/python/JSON/42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb new file mode 100644 index 0000000..b762e57 --- /dev/null +++ b/notebooks/python/JSON/42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb @@ -0,0 +1,222 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 42. Convert MySQL table to Pandas DataFrame(Python dictionary)\n", + "\n", + "\n", + "[How to Convert MySQL Table to Pandas DataFrame / Python Dictionary](https://blog.softhints.com/convert-mysql-table-pandas-dataframe-python-dictionary/)\n", + "\n", + "* [PyMySQL](https://pypi.org/project/PyMySQL/) + [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) - the shortest and easiest way to convert MySQL table to Python dict\n", + "* [mysql.connector](https://pypi.org/project/mysql-connector-python/)\n", + "* [pyodbc](https://pypi.org/project/pyodbc/) in order to connect to MySQL database, read table and convert it to DataFrame or Python dict." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](https://blog.softhints.com/content/images/2020/11/MySQL_table_to_Pandas_DataFrame_to_Python_dict.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "password = ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1: Convert MySQL Table to DataFrame with PyMySQL + SQLAlchemy " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},\n", + " 'name': {0: 'Emma', 1: 'Ann', 2: 'Kim', 3: 'Olivia', 4: 'Victoria'}}" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sqlalchemy import create_engine\n", + "import pymysql\n", + "import pandas as pd\n", + "\n", + "db_connection_str = 'mysql+pymysql://root:' + password + '@localhost:3306/test'\n", + "db_connection = create_engine(db_connection_str)\n", + "\n", + "df = pd.read_sql('SELECT * FROM girls', con=db_connection)\n", + "df.to_dict()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'id': 1, 'name': 'Emma'},\n", + " {'id': 2, 'name': 'Ann'},\n", + " {'id': 3, 'name': 'Kim'},\n", + " {'id': 4, 'name': 'Olivia'},\n", + " {'id': 5, 'name': 'Victoria'}]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.to_dict('records')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'id': [1, 2, 3, 4, 5], 'name': ['Emma', 'Ann', 'Kim', 'Olivia', 'Victoria']}" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.to_dict('list')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + 
"outputs": [ + { + "data": { + "text/plain": [ + "{0: {'id': 1, 'name': 'Emma'},\n", + " 1: {'id': 2, 'name': 'Ann'},\n", + " 2: {'id': 3, 'name': 'Kim'},\n", + " 3: {'id': 4, 'name': 'Olivia'},\n", + " 4: {'id': 5, 'name': 'Victoria'}}" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.to_dict('index')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2: Convert MySQL Table to DataFrame with mysql.connector" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{0: {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},\n", + " 1: {0: bytearray(b'Emma'),\n", + " 1: bytearray(b'Ann'),\n", + " 2: bytearray(b'Kim'),\n", + " 3: bytearray(b'Olivia'),\n", + " 4: bytearray(b'Victoria')}}" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "import mysql.connector\n", + "\n", + "# Setup MySQL connection\n", + "db = mysql.connector.connect(\n", + " host=\"localhost\", # your host, usually localhost\n", + " user=\"root\", # your username\n", + " password=password, # your password\n", + " database=\"test\" # name of the data base\n", + ") \n", + "\n", + "# You must create a Cursor object. 
It will let you execute all the queries you need\n", + "cur = db.cursor()\n", + "\n", + "# Use all the SQL you like\n", + "cur.execute(\"SELECT * FROM girls\")\n", + "\n", + "# Put it all to a data frame\n", + "df_sql_data = pd.DataFrame(cur.fetchall())\n", + "\n", + "# Close the session\n", + "db.close()\n", + "\n", + "# Show the data\n", + "df_sql_data.to_dict()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/youtube/Youtube-PewDiePie.ipynb b/notebooks/youtube/Youtube-PewDiePie.ipynb index 1b7bec0..8e457ed 100644 --- a/notebooks/youtube/Youtube-PewDiePie.ipynb +++ b/notebooks/youtube/Youtube-PewDiePie.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ @@ -13,26 +13,26 @@ }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "(359, 8)\n" + "(143, 8)\n" ] } ], "source": [ "df = pd.read_csv(\n", - " \"~/Projects/MYP/Datasets/Youtube/PewDiePie20190210.csv\", sep=\"@\")\n", + " \"~/Projects/MYP/Datasets/Youtube/me20190528.csv\", sep=\"@\")\n", "print(df.shape)" ] }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 3, "metadata": {}, "outputs": [ { @@ -69,94 +69,94 @@ " \n", " \n", " 0\n", - " YOU HAD ONE JOB! 
- with editor Brad1\n", - " 5,293,108.0\n", - " 385,429.0\n", - " 4,083.0\n", + " PyCharm/IntelliJ fast and auto change of the color theme\n", + " 41.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,855.0\n", - " https://www.youtube.com/watch?v=B67OBHNCopk\n", - " SATIRE, reddit, you had one job, onejob\n", + " 2.0\n", + " https://www.youtube.com/embed/SsX9Fl958W0\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", " \n", " \n", " 1\n", - " Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE\n", - " 5,358,466.0\n", - " 378,535.0\n", - " 3,951.0\n", + " How to add weather desklet to Linux Mint 19\n", + " 291.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 38,075.0\n", - " https://www.youtube.com/watch?v=kLM_9gBZIqY\n", - " SATIRE\n", + " 0.0\n", + " https://www.youtube.com/embed/-FPY_e0BdJs\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", " \n", " \n", " 2\n", - " We broke another WORLD RECORD!\n", - " 8,558,673.0\n", - " 595,622.0\n", - " 7,901.0\n", + " How to easy integrate Google Calendar to Desktop for Linux Mint\n", + " 226.0\n", + " 1.0\n", + " 0.0\n", " 0.0\n", - " 53,664.0\n", - " https://www.youtube.com/watch?v=d1tAfXKc7-c\n", - " SATIRE\n", + " 0.0\n", + " https://www.youtube.com/embed/2evIujisdD0\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", " \n", " \n", " 3\n", - " FLOSSING in VR with Green Man. 
~ UNSEEN FOOTAGE ~\n", - " 3,609,152.0\n", - " 218,530.0\n", - " 3,126.0\n", + " Pandas use a list of values to select rows from a column\n", + " 45.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 17,595.0\n", - " https://www.youtube.com/watch?v=bMLdNrB5hAo\n", - " SATIRE\n", + " 10.0\n", + " https://www.youtube.com/embed/jlSbo5wmTPQ\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", " \n", " \n", " 4\n", - " Don't Laugh Challenge, NEW SEASON!!!!!\n", - " 5,888,465.0\n", - " 569,900.0\n", - " 7,824.0\n", + " Pandas count and percentage by value for a column\n", + " 63.0\n", + " 3.0\n", " 0.0\n", - " 29,373.0\n", - " https://www.youtube.com/watch?v=Zgm_iM3f_ME\n", - " SATIRE\n", + " 0.0\n", + " 0.0\n", + " https://www.youtube.com/embed/P5pxJkv71BU\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", " \n", " \n", "\n", "" ], "text/plain": [ - " title Views \\\n", - "0 YOU HAD ONE JOB! - with editor Brad1 5,293,108.0 \n", - "1 Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE 5,358,466.0 \n", - "2 We broke another WORLD RECORD! 8,558,673.0 \n", - "3 FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ 3,609,152.0 \n", - "4 Don't Laugh Challenge, NEW SEASON!!!!! 
5,888,465.0 \n", + " title Views \\\n", + "0 PyCharm/IntelliJ fast and auto change of the color theme 41.0 \n", + "1 How to add weather desklet to Linux Mint 19 291.0 \n", + "2 How to easy integrate Google Calendar to Desktop for Linux Mint 226.0 \n", + "3 Pandas use a list of values to select rows from a column 45.0 \n", + "4 Pandas count and percentage by value for a column 63.0 \n", "\n", - " Like Dislike Favorite Comment \\\n", - "0 385,429.0 4,083.0 0.0 29,855.0 \n", - "1 378,535.0 3,951.0 0.0 38,075.0 \n", - "2 595,622.0 7,901.0 0.0 53,664.0 \n", - "3 218,530.0 3,126.0 0.0 17,595.0 \n", - "4 569,900.0 7,824.0 0.0 29,373.0 \n", + " Like Dislike Favorite Comment \\\n", + "0 0.0 0.0 0.0 2.0 \n", + "1 0.0 0.0 0.0 0.0 \n", + "2 1.0 0.0 0.0 0.0 \n", + "3 3.0 0.0 0.0 10.0 \n", + "4 3.0 0.0 0.0 0.0 \n", "\n", - " videoID \\\n", - "0 https://www.youtube.com/watch?v=B67OBHNCopk \n", - "1 https://www.youtube.com/watch?v=kLM_9gBZIqY \n", - "2 https://www.youtube.com/watch?v=d1tAfXKc7-c \n", - "3 https://www.youtube.com/watch?v=bMLdNrB5hAo \n", - "4 https://www.youtube.com/watch?v=Zgm_iM3f_ME \n", + " videoID \\\n", + "0 https://www.youtube.com/embed/SsX9Fl958W0 \n", + "1 https://www.youtube.com/embed/-FPY_e0BdJs \n", + "2 https://www.youtube.com/embed/2evIujisdD0 \n", + "3 https://www.youtube.com/embed/jlSbo5wmTPQ \n", + "4 https://www.youtube.com/embed/P5pxJkv71BU \n", "\n", - " tags \n", - "0 SATIRE, reddit, you had one job, onejob \n", - "1 SATIRE \n", - "2 SATIRE \n", - "3 SATIRE \n", - "4 SATIRE " + " tags \n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg \n", + "1 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg \n", + "2 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg \n", + "3 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg \n", + "4 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg " ] }, - "execution_count": 45, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } @@ -168,16 +168,16 @@ }, { "cell_type": "code", - 
"execution_count": 46, + "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "(359, 8)" + "(143, 8)" ] }, - "execution_count": 46, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -188,7 +188,7 @@ }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 5, "metadata": { "scrolled": true }, @@ -199,7 +199,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -224,173 +224,43 @@ " \n", " \n", " 0\n", - " 1\n", - " 2\n", - " 3\n", - " 4\n", - " 5\n", - " 6\n", - " 7\n", - " 8\n", - " 9\n", - " ...\n", - " 38\n", - " 39\n", - " 40\n", - " 41\n", - " 42\n", - " 43\n", - " 44\n", - " 45\n", - " 46\n", - " 47\n", " \n", " \n", " \n", " \n", " 0\n", " True\n", - " True\n", - " True\n", - " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 1\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 2\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 3\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " 
False\n", " \n", " \n", " 4\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", "\n", - "

5 rows × 48 columns

\n", "" ], "text/plain": [ - " 0 1 2 3 4 5 6 7 8 9 ... \\\n", - "0 True True True True False False False False False False ... \n", - "1 True False False False False False False False False False ... \n", - "2 True False False False False False False False False False ... \n", - "3 True False False False False False False False False False ... \n", - "4 True False False False False False False False False False ... \n", - "\n", - " 38 39 40 41 42 43 44 45 46 47 \n", - "0 False False False False False False False False False False \n", - "1 False False False False False False False False False False \n", - "2 False False False False False False False False False False \n", - "3 False False False False False False False False False False \n", - "4 False False False False False False False False False False \n", - "\n", - "[5 rows x 48 columns]" + " 0\n", + "0 True\n", + "1 True\n", + "2 True\n", + "3 True\n", + "4 True" ] }, - "execution_count": 48, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -402,16 +272,16 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "RangeIndex(start=0, stop=48, step=1)" + "RangeIndex(start=0, stop=1, step=1)" ] }, - "execution_count": 53, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -423,7 +293,7 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -448,173 +318,43 @@ " \n", " \n", " 0\n", - " 1\n", - " 2\n", - " 3\n", - " 4\n", - " 5\n", - " 6\n", - " 7\n", - " 8\n", - " 9\n", - " ...\n", - " 38\n", - " 39\n", - " 40\n", - " 41\n", - " 42\n", - " 43\n", - " 44\n", - " 45\n", - " 46\n", - " 47\n", " \n", " \n", " \n", " \n", " 0\n", " True\n", - " True\n", - " True\n", - " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " 
False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 1\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 2\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 3\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 4\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", "\n", - "

5 rows × 48 columns

\n", "" ], "text/plain": [ - " 0 1 2 3 4 5 6 7 8 9 ... \\\n", - "0 True True True True False False False False False False ... \n", - "1 True False False False False False False False False False ... \n", - "2 True False False False False False False False False False ... \n", - "3 True False False False False False False False False False ... \n", - "4 True False False False False False False False False False ... \n", - "\n", - " 38 39 40 41 42 43 44 45 46 47 \n", - "0 False False False False False False False False False False \n", - "1 False False False False False False False False False False \n", - "2 False False False False False False False False False False \n", - "3 False False False False False False False False False False \n", - "4 False False False False False False False False False False \n", - "\n", - "[5 rows x 48 columns]" + " 0\n", + "0 True\n", + "1 True\n", + "2 True\n", + "3 True\n", + "4 True" ] }, - "execution_count": 65, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -625,7 +365,7 @@ }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 9, "metadata": {}, "outputs": [ { @@ -650,173 +390,43 @@ " \n", " \n", " 0\n", - " 1\n", - " 2\n", - " 3\n", - " 4\n", - " 5\n", - " 6\n", - " 7\n", - " 8\n", - " 9\n", - " ...\n", - " 38\n", - " 39\n", - " 40\n", - " 41\n", - " 42\n", - " 43\n", - " 44\n", - " 45\n", - " 46\n", - " 47\n", " \n", " \n", " \n", " \n", " 0\n", - " SATIRE\n", - " reddit\n", - " you had one job\n", - " onejob\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", " \n", " \n", " 1\n", - " SATIRE\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " 
None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", " \n", " \n", " 2\n", - " SATIRE\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", " \n", " \n", " 3\n", - " SATIRE\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", " \n", " \n", " 4\n", - " SATIRE\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", " \n", " \n", "\n", - "

5 rows × 48 columns

\n", "" ], "text/plain": [ - " 0 1 2 3 4 5 6 7 8 \\\n", - "0 SATIRE reddit you had one job onejob None None None None None \n", - "1 SATIRE None None None None None None None None \n", - "2 SATIRE None None None None None None None None \n", - "3 SATIRE None None None None None None None None \n", - "4 SATIRE None None None None None None None None \n", - "\n", - " 9 ... 38 39 40 41 42 43 44 45 46 47 \n", - "0 None ... None None None None None None None None None None \n", - "1 None ... None None None None None None None None None None \n", - "2 None ... None None None None None None None None None None \n", - "3 None ... None None None None None None None None None None \n", - "4 None ... None None None None None None None None None None \n", - "\n", - "[5 rows x 48 columns]" + " 0\n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", + "1 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", + "2 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", + "3 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", + "4 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg" ] }, - "execution_count": 66, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -827,7 +437,7 @@ }, { "cell_type": "code", - "execution_count": 73, + "execution_count": 10, "metadata": {}, "outputs": [ { @@ -835,3956 +445,422 @@ "output_type": "stream", "text": [ "ssssssssssssssssssssssssssssssssss0ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 reddit \n", - "2 you had one job\n", - "3 onejob \n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", "Name: 0, dtype: object\n", "ssssssssssssssssssssssssssssssssss1ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", "Name: 1, dtype: object\n", "ssssssssssssssssssssssssssssssssss2ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", "Name: 2, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss3ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", "Name: 3, dtype: object\n", "ssssssssssssssssssssssssssssssssss4ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", "Name: 4, dtype: object\n", "ssssssssssssssssssssssssssssssssss5ssssssssssssssssssssssssssssssssss\n", - "0 player \n", - "1 unknown \n", - "2 PUBG \n", - "3 player unknowns \n", - "4 player unknown's\n", - "5 battleground \n", - "6 battle \n", - "7 ground \n", + "0 https://i.ytimg.com/vi/Ni2SjEuz__g/hqdefault.jpg\n", "Name: 5, dtype: object\n", "ssssssssssssssssssssssssssssssssss6ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/EXxJ-We2ygw/hqdefault.jpg\n", "Name: 6, dtype: object\n", "ssssssssssssssssssssssssssssssssss7ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme review\n", - "2 elon musk \n", + "0 https://i.ytimg.com/vi/tfU8pDNYlDA/hqdefault.jpg\n", "Name: 7, dtype: object\n", "ssssssssssssssssssssssssssssssssss8ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 The Battle Wizard . ENDING EXPLAINED\n", - "2 the battle wizard \n", - "3 battle wizard \n", - "4 battle wizard 1977 \n", - "5 battle wizard movie \n", - "6 movie review \n", - "7 movie \n", - "8 film review \n", - "9 pewdiepie \n", - "10 pewds \n", - "11 pewdie \n", - "12 pdp \n", - "13 wizard \n", + "0 https://i.ytimg.com/vi/nW5ltiwV-6Y/hqdefault.jpg\n", "Name: 8, dtype: object\n", "ssssssssssssssssssssssssssssssssss9ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 dr phil\n", - "2 react \n", + "0 https://i.ytimg.com/vi/Z1vISDOhC0k/hqdefault.jpg\n", "Name: 9, dtype: object\n", "ssssssssssssssssssssssssssssssssss10ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 Thats right... 
I'm a GAMER\n", - "2 gamer \n", - "3 gaming \n", - "4 youtube gaming \n", - "5 memes \n", - "6 pewdiepie \n", - "7 pewds \n", - "8 pewdie \n", - "9 pdp \n", + "0 https://i.ytimg.com/vi/lx7KFd6BPcg/hqdefault.jpg\n", "Name: 10, dtype: object\n", "ssssssssssssssssssssssssssssssssss11ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/3g6KG_8zq0E/hqdefault.jpg\n", "Name: 11, dtype: object\n", "ssssssssssssssssssssssssssssssssss12ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tiktok \n", - "2 tik tok \n", - "3 tik tok funny \n", - "4 tik tok compilation\n", - "5 funny tik toks \n", - "6 funny tiktok \n", - "7 funny tiktok memes \n", - "8 tiktok songs \n", - "9 tiktok cringe \n", - "10 cringe \n", - "11 cringe compilation \n", - "12 tiktok memes \n", - "13 tik tok memes \n", - "14 pewdiepie tiktok \n", - "15 pewdiepie \n", - "16 pewds \n", - "17 pewdie \n", - "18 pdp \n", - "19 #ad \n", - "20 4K video \n", + "0 https://i.ytimg.com/vi/-NVFQ_q3eRM/hqdefault.jpg\n", "Name: 12, dtype: object\n", "ssssssssssssssssssssssssssssssssss13ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/CA6lyOmfRbM/hqdefault.jpg\n", "Name: 13, dtype: object\n", "ssssssssssssssssssssssssssssssssss14ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 MY NEW SHOW / You Laugh You Lose\n", - "2 you laugh you lose \n", - "3 ylyl \n", - "4 you laugh you lose challenge \n", - "5 try not to laugh \n", - "6 try not to laugh challenge \n", - "7 pewdiepie \n", - "8 pewdiepie ylyl \n", - "9 ylyl pewds \n", - "10 pewdie \n", - "11 pdp \n", - "12 pewds \n", + "0 https://i.ytimg.com/vi/PIAzK1rvqIY/hqdefault.jpg\n", "Name: 14, dtype: object\n", "ssssssssssssssssssssssssssssssssss15ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 pew \n", - "2 news \n", + "0 https://i.ytimg.com/vi/nrF_Rgh88no/hqdefault.jpg\n", "Name: 15, dtype: object\n", "ssssssssssssssssssssssssssssssssss16ssssssssssssssssssssssssssssssssss\n", - "0 
SATIRE \n", - "1 Sasuke Memes are NOT OK\n", - "2 sasuke \n", - "3 sasuke naruto \n", - "4 naruto \n", - "5 pewdiepie \n", - "6 meme review \n", - "7 memes \n", - "8 meme \n", - "9 pewds \n", - "10 pewdie \n", - "11 pdp \n", - "12 wave check \n", - "13 waves \n", - "14 wave hair \n", - "15 waves hair \n", + "0 https://i.ytimg.com/vi/4ixLp8aFomw/hqdefault.jpg\n", "Name: 16, dtype: object\n", "ssssssssssssssssssssssssssssssssss17ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/UvCO5gKQqtE/hqdefault.jpg\n", "Name: 17, dtype: object\n", "ssssssssssssssssssssssssssssssssss18ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 dr phil \n", - "2 dr phil spoiled teen \n", - "3 dr phil pewdiepie \n", - "4 Dr Phil VS Spoiled teen *destroyed by facts and logic*\n", - "5 dr phil spoiled \n", - "6 dr phil full episodes \n", - "7 pewds \n", - "8 pewdie \n", - "9 pewdiepie \n", - "10 pdp \n", - "11 dr phil 2019 \n", + "0 https://i.ytimg.com/vi/j80mqdfy8Fw/hqdefault.jpg\n", "Name: 18, dtype: object\n", "ssssssssssssssssssssssssssssssssss19ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/bKBpDywKje8/hqdefault.jpg\n", "Name: 19, dtype: object\n", "ssssssssssssssssssssssssssssssssss20ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 20, dtype: object\n", + "Series([], Name: 20, dtype: object)\n", "ssssssssssssssssssssssssssssssssss21ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/t_DI7NbjcFs/hqdefault.jpg\n", "Name: 21, dtype: object\n", "ssssssssssssssssssssssssssssssssss22ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 Im most handsome 2018 \n", - "2 gamer girls \n", - "3 reaction \n", - "4 react \n", - "5 gamer girls react \n", - "6 most handsome man \n", - "7 pewdiepie \n", - "8 pewds \n", - "9 pewdie \n", - "10 pdp \n", - "11 lwiay \n", - "12 pewdiepie lwiay \n", - "13 pokimane \n", - "14 lords mobile \n", - "15 ads \n", - "16 ad \n", - "17 
lords mobile ad \n", - "18 mobile ads \n", - "19 handsome man \n", - "20 most handsome man winner\n", - "21 handsome \n", - "22 gamer girls reaction \n", - "23 gamer \n", - "24 girls \n", - "25 gaming \n", - "26 entertainment \n", + "0 https://i.ytimg.com/vi/Ol3Dwucax9U/hqdefault.jpg\n", "Name: 22, dtype: object\n", "ssssssssssssssssssssssssssssssssss23ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 comedy \n", - "3 you laugh you lose\n", - "4 compilation \n", - "5 try not to laugh \n", - "6 challenge \n", + "0 https://i.ytimg.com/vi/NbvHU_KoD74/hqdefault.jpg\n", "Name: 23, dtype: object\n", "ssssssssssssssssssssssssssssssssss24ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/zVQJQxpedm8/hqdefault.jpg\n", "Name: 24, dtype: object\n", "ssssssssssssssssssssssssssssssssss25ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 jesus \n", - "2 socalchrist \n", - "3 fake gamers \n", - "4 fake gamer girl \n", - "5 gamer girls \n", - "6 twitch girls \n", - "7 ricegum jakepaul \n", - "8 jake paul \n", - "9 jake paul amazon \n", - "10 amazon \n", - "11 amazon gift card \n", - "12 fake amazon giftcard\n", - "13 fake amazon \n", - "14 pewdiepie \n", - "15 pewds \n", - "16 pewdie \n", - "17 pdp \n", - "18 pew news \n", - "19 #ad \n", - "20 news \n", - "21 current affairs \n", - "22 ricegum jake paul \n", + "0 https://i.ytimg.com/vi/lCcE-0bykRU/hqdefault.jpg\n", "Name: 25, dtype: object\n", "ssssssssssssssssssssssssssssssssss26ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 vr chat\n", - "2 game \n", - "3 gaming \n", + "0 https://i.ytimg.com/vi/seLcRCulwl4/hqdefault.jpg\n", "Name: 26, dtype: object\n", "ssssssssssssssssssssssssssssssssss27ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 genius \n", - "2 review \n", - "3 lele pons \n", - "4 gabbi hannsomething \n", - "5 jacob whatever his name is\n", - "6 other people \n", + "0 https://i.ytimg.com/vi/ZfemCpfJNfU/hqdefault.jpg\n", "Name: 
27, dtype: object\n", "ssssssssssssssssssssssssssssssssss28ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 reddit \n", - "2 reddit review \n", - "3 pewdiepie \n", - "4 pewds \n", - "5 pewdie \n", - "6 pdp \n", - "7 tseries \n", - "8 t series \n", - "9 pewdiepie vs tseries \n", - "10 pewdiepie vs t series \n", - "11 oopsie \n", - "12 /r/ \n", - "13 /r \n", - "14 reddit try not to laugh \n", - "15 reddit cringe \n", - "16 reddit stories \n", - "17 reddit cringe compilation\n", - "18 vox \n", - "19 vox media \n", - "20 pewdiepie vox media \n", - "21 pewdiepie vox \n", - "22 Unintentional Opsies \n", - "23 opsies \n", + "0 https://i.ytimg.com/vi/TgO-AkopLo4/hqdefault.jpg\n", "Name: 28, dtype: object\n", "ssssssssssssssssssssssssssssssssss29ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pewdiepie vs thanos\n", - "2 Pewdiepie vs Thanos\n", - "3 WHO would WIN? \n", - "4 pewdiepie \n", - "5 pewds \n", - "6 pewdie \n", - "7 pdp \n", - "8 thanos \n", - "9 thanos meme \n", - "10 thanos memes \n", - "11 tseries \n", + "0 https://i.ytimg.com/vi/HMB4zrP_-HY/hqdefault.jpg\n", "Name: 29, dtype: object\n", "ssssssssssssssssssssssssssssssssss30ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", - "3 awards\n", + "0 https://i.ytimg.com/vi/JBm8iptLnuA/hqdefault.jpg\n", "Name: 30, dtype: object\n", "ssssssssssssssssssssssssssssssssss31ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 We broke a world record!\n", - "2 world \n", - "3 record \n", - "4 world record \n", - "5 pewdiepie \n", - "6 pewds \n", - "7 pewdie \n", - "8 pdp \n", - "9 world record pewdipie \n", - "10 tseries \n", - "11 t series \n", - "12 youtube rewind \n", - "13 youtube rewind 2018 \n", + "0 https://i.ytimg.com/vi/Ynp0xyBgwt0/hqdefault.jpg\n", "Name: 31, dtype: object\n", "ssssssssssssssssssssssssssssssssss32ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/ftGiBv3LL_A/hqdefault.jpg\n", "Name: 32, dtype: 
object\n", "ssssssssssssssssssssssssssssssssss33ssssssssssssssssssssssssssssssssss\n", - "0 rewind 2018 \n", - "1 youtube rewind 2018\n", + "0 https://i.ytimg.com/vi/5pbRivDYzko/hqdefault.jpg\n", "Name: 33, dtype: object\n", "ssssssssssssssssssssssssssssssssss34ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 dr phil \n", - "2 Dr Phil ANNIHILATES spoiled Teen!!\n", - "3 dr phil spoiled daughter \n", - "4 dr phil full episodes \n", - "5 dr phil im white \n", - "6 dr phil annihilates \n", - "7 spoiled teen \n", - "8 dr phil spoiled \n", - "9 dr phil pewdiepie \n", - "10 dr phil 2018 \n", - "11 dr phil funny \n", - "12 dr phil meme review \n", - "13 dr phil treasure \n", - "14 dr phil video \n", - "15 dr phil tv show \n", - "16 pewdiepie \n", - "17 pewds \n", - "18 pewdie \n", - "19 pdp \n", + "0 https://i.ytimg.com/vi/3jlXIX5Ctyo/hqdefault.jpg\n", "Name: 34, dtype: object\n", "ssssssssssssssssssssssssssssssssss35ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 25 Dec 2018\n", + "0 https://i.ytimg.com/vi/mG9OnH9R5yM/hqdefault.jpg\n", "Name: 35, dtype: object\n", "ssssssssssssssssssssssssssssssssss36ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 lwiay\n", + "0 https://i.ytimg.com/vi/SnMXqyLqZwM/hqdefault.jpg\n", "Name: 36, dtype: object\n", "ssssssssssssssssssssssssssssssssss37ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 wsj \n", - "2 hack \n", + "0 https://i.ytimg.com/vi/30ndwJm1I5c/hqdefault.jpg\n", "Name: 37, dtype: object\n", "ssssssssssssssssssssssssssssssssss38ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 You SLAV You Lose \n", - "3 you laugh \n", - "4 you lose \n", - "5 try not to laugh \n", - "6 you laugh you lose \n", - "7 you laugh you lose pewdiepie\n", - "8 try not to laugh challenge \n", - "9 pewdiepie \n", - "10 pewds \n", - "11 pewdie \n", - "12 pdp \n", + "0 https://i.ytimg.com/vi/IoeYrz-fP2o/hqdefault.jpg\n", "Name: 38, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss39ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 youtube rewind\n", - "2 rewind \n", - "3 2018 \n", - "4 roast \n", - "5 lwiay \n", - "6 ylyl \n", - "7 meme \n", - "8 review \n", - "Name: 39, dtype: object\n", + "Series([], Name: 39, dtype: object)\n", "ssssssssssssssssssssssssssssssssss40ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pewdiepie \n", - "2 pewds \n", - "3 pewdie \n", - "4 pdp \n", - "5 PewDiePie's biggest OOPSIE.\n", - "6 pew news \n", - "7 game awards 2018 \n", - "8 game awards 2018 cringe \n", + "0 https://i.ytimg.com/vi/hJMH_1o8eU0/hqdefault.jpg\n", "Name: 40, dtype: object\n", "ssssssssssssssssssssssssssssssssss41ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/OXA_ZD1gR6A/hqdefault.jpg\n", "Name: 41, dtype: object\n", "ssssssssssssssssssssssssssssssssss42ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tiktok \n", - "2 tiktok memes \n", - "3 tiktok songs \n", - "4 tiktok cringe \n", - "5 tiktok tutorial \n", - "6 tiktok hit or miss \n", - "7 tiktok music \n", - "8 tiktok fortnite \n", - "9 tiktok cringe compilation \n", - "10 tiktok epic \n", - "11 best tiktok \n", - "12 best tiktok videos \n", - "13 tiktok funny \n", - "14 tiktok funny videos \n", - "15 tiktok haha \n", - "16 tiktok epic memes \n", - "17 tiktok compilation \n", - "18 tiktok compilation 2018 \n", - "19 tiktok 2018 \n", - "20 Tik Tok Very Funny Haha Epic Compilation Montage BEST TIK TOK 2018 LOL\n", - "21 tiktok montage \n", - "22 pewdiepie tiktok \n", - "23 pewdiepie vs t series \n", - "24 pewdiepie \n", - "25 pewdie \n", - "26 pdp \n", + "0 https://i.ytimg.com/vi/duOHHDqI40c/hqdefault.jpg\n", "Name: 42, dtype: object\n", "ssssssssssssssssssssssssssssssssss43ssssssssssssssssssssssssssssssssss\n", - "0 player \n", - "1 unknown \n", - "2 PUBG \n", - "3 player unknowns \n", - "4 player unknown's\n", - "5 battleground \n", - "6 battle \n", - "7 ground \n", + "0 
https://i.ytimg.com/vi/vbHFIALhSWE/hqdefault.jpg\n", "Name: 43, dtype: object\n", "ssssssssssssssssssssssssssssssssss44ssssssssssssssssssssssssssssssssss\n", - "0 TRY TO LAUGH NOT CHALLENGE \n", - "1 TRY NOT TO LAUGH \n", - "2 try not to laugh challenge \n", - "3 try not to laugh challenge impossible\n", - "4 try not to laugh challenge clean \n", - "5 try not to laugh \n", - "6 try not to laugh tiktok \n", - "7 tltl \n", - "8 pewdiepie \n", - "9 pewds \n", - "10 pewdie \n", - "11 ylyl \n", - "12 you laugh you lose \n", - "13 episode 1 season 1 \n", - "14 ep 1 \n", - "15 pdp \n", - "16 pewdiepie ylyl \n", - "17 video \n", - "18 youtube video \n", - "19 youtube channel \n", - "20 t series \n", - "21 tseries vs pewdiepie \n", - "22 tiktok \n", - "23 fortnite \n", - "24 fortnite funny moments \n", + "0 https://i.ytimg.com/vi/ZWytZoEVpGU/hqdefault.jpg\n", "Name: 44, dtype: object\n", "ssssssssssssssssssssssssssssssssss45ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 youtube\n", - "2 rewind \n", - "3 meme \n", - "4 yea \n", - "5 review \n", + "0 https://i.ytimg.com/vi/uoAV7651Op0/hqdefault.jpg\n", "Name: 45, dtype: object\n", "ssssssssssssssssssssssssssssssssss46ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme review\n", + "0 https://i.ytimg.com/vi/702lkQbZx50/hqdefault.jpg\n", "Name: 46, dtype: object\n", "ssssssssssssssssssssssssssssssssss47ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pewdiepie\n", - "2 fortnite \n", - "3 lwiay \n", - "4 ylyl \n", - "5 meme \n", - "6 review \n", - "7 season 7 \n", - "8 new \n", - "9 skins \n", + "0 https://i.ytimg.com/vi/7sgDvC4k6Xg/hqdefault.jpg\n", "Name: 47, dtype: object\n", "ssssssssssssssssssssssssssssssssss48ssssssssssssssssssssssssssssssssss\n", - "0 player \n", - "1 unknown \n", - "2 PUBG \n", - "3 player unknowns \n", - "4 player unknown's\n", - "5 battleground \n", - "6 battle \n", - "7 ground \n", + "0 https://i.ytimg.com/vi/cCoGsFVPVh0/hqdefault.jpg\n", "Name: 48, dtype: 
object\n", "ssssssssssssssssssssssssssssssssss49ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 2x \n", - "2 slow mo\n", - "3 50% \n", - "4 speed \n", - "Name: 49, dtype: object\n", + "Series([], Name: 49, dtype: object)\n", "ssssssssssssssssssssssssssssssssss50ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", + "0 https://i.ytimg.com/vi/Odog86JslbA/hqdefault.jpg\n", "Name: 50, dtype: object\n", "ssssssssssssssssssssssssssssssssss51ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tekashi69 \n", - "2 tekashi 6ix9ine \n", - "3 tekashi69 songs \n", - "4 6ix9ine \n", - "5 6ix9ine 2018 \n", - "6 Ninja \n", - "7 ninja fortnite \n", - "8 ninja fortnite gameplay \n", - "9 fortnite \n", - "10 fortnite funny moments \n", - "11 icy five ninja \n", - "12 alinity \n", - "13 alinity pewdiepie \n", - "14 alinity pewdiepie copystrike \n", - "15 pew news \n", - "16 pewdiepie \n", - "17 pewds \n", - "18 pdp \n", - "19 pewdie \n", - "20 youtube video \n", - "21 youtube channel \n", - "22 youtube \n", - "23 Tekashi69 BAN \n", - "24 Ninja caught selling underwear\n", - "25 Alinity facing 32 year prison.\n", - "26 smosh \n", - "27 news \n", - "28 news live \n", - "29 world news \n", + "0 https://i.ytimg.com/vi/SZO8jF9Z6vw/hqdefault.jpg\n", "Name: 51, dtype: object\n", "ssssssssssssssssssssssssssssssssss52ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 beat \n", - "2 saber \n", - "3 vr \n", - "4 gameplay\n", + "0 https://i.ytimg.com/vi/dAKyi8aFq3Y/hqdefault.jpg\n", "Name: 52, dtype: object\n", "ssssssssssssssssssssssssssssssssss53ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 The last hope for my channel...\n", - "2 pewdiepie \n", - "3 pewds \n", - "4 pdp \n", - "5 pewdie \n", - "6 last hope \n", - "7 youtube \n", - "8 youtube channel \n", + "0 https://i.ytimg.com/vi/GskbfPKP35E/hqdefault.jpg\n", "Name: 53, dtype: object\n", "ssssssssssssssssssssssssssssssssss54ssssssssssssssssssssssssssssssssss\n", 
- "0 SATIRE \n", - "1 meme \n", - "2 review\n", + "0 https://i.ytimg.com/vi/sVxLiftJGbU/hqdefault.jpg\n", "Name: 54, dtype: object\n", "ssssssssssssssssssssssssssssssssss55ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 oblivion \n", - "2 skyrimn \n", - "3 skyrim \n", - "4 gameplay \n", - "5 funny \n", - "6 moments \n", - "7 compilation\n", - "8 meme \n", - "9 memes \n", + "0 https://i.ytimg.com/vi/0k0fvqikaoE/hqdefault.jpg\n", "Name: 55, dtype: object\n", "ssssssssssssssssssssssssssssssssss56ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 This video is blocked in your country.\n", - "2 video \n", - "3 youtube video \n", - "4 pewdiepie \n", - "5 youtube pewdiepie \n", - "6 this video is blocked \n", - "7 blocked \n", - "8 pewds \n", - "9 pewdie \n", - "10 pdp \n", - "11 article 13 \n", - "12 article 11 \n", - "13 youtube support \n", - "14 india \n", - "15 iisuperwomanii \n", - "16 taking a break \n", + "0 https://i.ytimg.com/vi/x8OCVDCDrDA/hqdefault.jpg\n", "Name: 56, dtype: object\n", "ssssssssssssssssssssssssssssssssss57ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 We made history!!! *again*\n", - "2 We made history! 
\n", - "3 we \n", - "4 made \n", - "5 history \n", - "6 pewdiepie \n", - "7 pewds \n", - "8 pdp \n", - "9 pewdie \n", - "10 lwaiy \n", - "11 tseries \n", - "12 t-series \n", - "13 lwiay pewdiepie \n", - "14 marzia \n", - "15 markiplier \n", - "16 try not to laugh \n", - "17 we made history again \n", + "0 https://i.ytimg.com/vi/yl3kavXxvHo/hqdefault.jpg\n", "Name: 57, dtype: object\n", "ssssssssssssssssssssssssssssssssss58ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose\n", - "2 challenge \n", + "0 https://i.ytimg.com/vi/Ihbu0aZwkE8/hqdefault.jpg\n", "Name: 58, dtype: object\n", "ssssssssssssssssssssssssssssssssss59ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 A message to Obama\n", - "2 OBAMA \n", - "3 memes \n", - "4 meme \n", - "5 dank memes \n", - "6 memes 2018 \n", - "7 pewdiepie \n", - "8 pewds \n", - "9 pdp \n", - "10 pewdie \n", + "0 https://i.ytimg.com/vi/13viBxojGvA/hqdefault.jpg\n", "Name: 59, dtype: object\n", "ssssssssssssssssssssssssssssssssss60ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 TIKTOK \n", - "2 tik tok \n", - "3 tik tok cringe \n", - "4 tiktok pewdiepie \n", - "5 pewdiepie \n", - "6 pewds \n", - "7 pewdie \n", - "8 pdp \n", - "9 tiktok has gone too far \n", - "10 OK \n", - "11 TIK TOK HAS GONE TOO FAR NOW...\n", - "12 tiktok compilation \n", - "13 tiktok memes \n", - "14 meme \n", - "15 memes \n", - "16 pewdiepie memes \n", - "17 pewdiepie meme \n", - "18 pewdiepie tik tok \n", - "19 tiktok ad \n", - "20 tiktok funny \n", - "21 cringe challenge \n", - "22 cringe \n", - "23 cringe tiktok \n", - "24 funny tiktok videos \n", - "25 musically \n", - "26 musical.ly \n", - "27 tiktok trolls \n", + "0 https://i.ytimg.com/vi/DmSephyJNtQ/hqdefault.jpg\n", "Name: 60, dtype: object\n", "ssssssssssssssssssssssssssssssssss61ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 We made history!\n", - "2 we \n", - "3 made \n", - "4 history \n", - "5 pewdiepie \n", - "6 pewds \n", - "7 pdp 
\n", - "8 pewdie \n", - "9 lwaiy \n", - "10 tseries \n", - "11 t-series \n", - "12 lwiay pewdiepie \n", - "13 marzia \n", - "14 markiplier \n", - "15 try not to laugh\n", + "0 https://i.ytimg.com/vi/30pPGx0J6FU/hqdefault.jpg\n", "Name: 61, dtype: object\n", "ssssssssssssssssssssssssssssssssss62ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 you laugh you lose\n", - "3 challenge \n", - "Name: 62, dtype: object\n", + "Series([], Name: 62, dtype: object)\n", "ssssssssssssssssssssssssssssssssss63ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 deltarune \n", - "2 delta \n", - "3 rune \n", - "4 undertale \n", - "5 undertale 2 \n", - "6 squel \n", - "7 sequel \n", - "8 prequel \n", - "9 commentary \n", - "10 gameplay \n", - "11 walkthrough \n", - "12 pacifist \n", - "13 delta rune part 1 \n", - "14 chapter 1 \n", - "15 deltarune part 1 \n", - "16 soundtrack \n", - "17 undertale delta \n", - "18 undertale delta rune\n", - "19 delta rune undertale\n", - "20 part 1 \n", - "21 chapter 1 part 1 \n", + "0 https://i.ytimg.com/vi/eIRhXharV7k/hqdefault.jpg\n", "Name: 63, dtype: object\n", "ssssssssssssssssssssssssssssssssss64ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review \n", - "3 ben \n", - "4 shapiro \n", - "5 bonus meme\n", - "6 gnome \n", - "7 obama \n", - "8 elon musk \n", - "9 pikachu \n", - "10 tik tok \n", - "11 tracer \n", + "0 https://i.ytimg.com/vi/2waSmpD1zQg/hqdefault.jpg\n", "Name: 64, dtype: object\n", "ssssssssssssssssssssssssssssssssss65ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 I'm white \n", - "2 im white \n", - "3 im white dr phil \n", - "4 dr phil \n", - "5 im \n", - "6 white \n", - "7 dr phil black white girl \n", - "8 dr phil black girl acts white \n", - "9 dr phil black girl \n", - "10 dr phil full episodes \n", - "11 dr \n", - "12 phil \n", - "13 mom says her daughter \n", - "14 dr phil pewdiepie \n", - "15 dr phil #3 \n", - "16 dr phil 3 \n", - "17 react \n", - 
"18 pewds \n", - "19 pewdie \n", - "20 pewdiepie \n", - "21 pdp \n", - "22 dr phil destroys \n", - "23 dr phil memes \n", - "24 dr phil meme \n", - "25 dr phil october 2018 \n", - "26 meme \n", - "27 memes \n", - "28 im black \n", - "29 i'm black \n", - "30 im white dr phil full episode \n", - "31 im white dr phil full episodes\n", - "Name: 65, dtype: object\n", + "Series([], Name: 65, dtype: object)\n", "ssssssssssssssssssssssssssssssssss66ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/P4LonC3puS4/hqdefault.jpg\n", "Name: 66, dtype: object\n", "ssssssssssssssssssssssssssssssssss67ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 i need your help...\n", - "2 lwiay \n", - "3 help \n", - "4 pewdiepie \n", - "5 pewds \n", - "6 pdp \n", - "7 pewdie \n", + "0 https://i.ytimg.com/vi/oJdubyyJNIQ/hqdefault.jpg\n", "Name: 67, dtype: object\n", "ssssssssssssssssssssssssssssssssss68ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/UcvCdFfI3bs/hqdefault.jpg\n", "Name: 68, dtype: object\n", "ssssssssssssssssssssssssssssssssss69ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 apology video\n", - "2 my response \n", - "3 pewdiepie \n", - "4 logan paul \n", - "5 laura lee \n", - "6 tmartin \n", + "0 https://i.ytimg.com/vi/_fNZLrz97kg/hqdefault.jpg\n", "Name: 69, dtype: object\n", "ssssssssssssssssssssssssssssssssss70ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 fashion\n", - "2 meme \n", - "3 review \n", + "0 https://i.ytimg.com/vi/1tCbvYv_ibw/hqdefault.jpg\n", "Name: 70, dtype: object\n", "ssssssssssssssssssssssssssssssssss71ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 birds \n", - "2 birds aren't real \n", - "3 birds aren't real youtube \n", - "4 npc meme \n", - "5 npc memes \n", - "6 pewdiepie \n", - "7 pewds \n", - "8 pewdie \n", - "9 memes \n", - "10 meme \n", - "11 meme review \n", - "12 BIRDS. AREN'T. REAL. 
\n", - "13 review \n", - "14 meme compilation \n", - "15 meme compilation 2018 \n", - "16 everyone we have an announcement to make\n", + "0 https://i.ytimg.com/vi/EZ-im7m8630/hqdefault.jpg\n", "Name: 71, dtype: object\n", "ssssssssssssssssssssssssssssssssss72ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 npc meme \n", - "2 meme \n", - "3 funny \n", - "4 compilation\n", - "5 shane \n", - "6 logan \n", - "7 logan paul \n", - "8 show \n", - "9 youtube \n", - "10 red \n", - "11 youtube red\n", + "0 https://i.ytimg.com/vi/03ahRfkfwME/hqdefault.jpg\n", "Name: 72, dtype: object\n", "ssssssssssssssssssssssssssssssssss73ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 LWIAY\n", + "0 https://i.ytimg.com/vi/h27uLjDOK-M/hqdefault.jpg\n", "Name: 73, dtype: object\n", "ssssssssssssssssssssssssssssssssss74ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme review\n", - "2 spooktober \n", - "3 halloween \n", - "4 bone \n", - "5 skeleton \n", - "6 doot doot \n", - "7 sans \n", + "0 https://i.ytimg.com/vi/8OoLg39nNlo/hqdefault.jpg\n", "Name: 74, dtype: object\n", "ssssssssssssssssssssssssssssssssss75ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 you laugh you lose\n", - "3 challenge \n", - "4 moth \n", - "5 edition \n", - "6 meme \n", + "0 https://i.ytimg.com/vi/DJd0JYaVkqA/hqdefault.jpg\n", "Name: 75, dtype: object\n", "ssssssssssssssssssssssssssssssssss76ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 lwiay \n", - "2 reddit\n", + "0 https://i.ytimg.com/vi/hUXGQwTSfMs/hqdefault.jpg\n", "Name: 76, dtype: object\n", "ssssssssssssssssssssssssssssssssss77ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tseries \n", - "2 t series \n", - "3 diss \n", - "4 track \n", - "5 pewdiepie \n", - "6 song \n", - "7 rap \n", - "8 mixtape \n", - "9 disstrack \n", - "10 diss track \n", - "11 bitch lasagna\n", + "0 https://i.ytimg.com/vi/-zcJ4uB7XUo/hqdefault.jpg\n", "Name: 77, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss78ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 𝓜𝓸𝓽𝓱 𝓜𝓮𝓶𝓮𝓼 \n", - "2 moth memes \n", - "3 moth meme \n", - "4 moth meme compilation \n", - "5 moth lamp \n", - "6 moth lamp meme compilation\n", - "7 pewdiepie meme review \n", - "8 pewdiepie \n", - "9 pewds \n", - "10 pdp \n", - "11 pewdie \n", - "12 meme review \n", - "13 memes \n", - "14 meme \n", - "15 moth \n", - "16 lamp \n", + "0 https://i.ytimg.com/vi/tQ_9a6UhUQs/hqdefault.jpg\n", "Name: 78, dtype: object\n", "ssssssssssssssssssssssssssssssssss79ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 lwiay \n", - "2 pewdiepie \n", - "3 pewds \n", - "4 pewdie \n", - "5 pewdiepie vs t series \n", - "6 ANNOUNCING ME NEW WEBSITE\n", - "7 website \n", - "8 new website \n", - "9 t series \n", + "0 https://i.ytimg.com/vi/ztwsGeT5lR0/hqdefault.jpg\n", "Name: 79, dtype: object\n", "ssssssssssssssssssssssssssssssssss80ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose\n", - "2 ylyl \n", - "3 try not to \n", - "4 laugh \n", - "5 challenge \n", + "0 https://i.ytimg.com/vi/nOlH-P8-5PI/hqdefault.jpg\n", "Name: 80, dtype: object\n", "ssssssssssssssssssssssssssssssssss81ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 bowsette \n", - "2 meme review\n", + "0 https://i.ytimg.com/vi/BdppFIT_lIs/hqdefault.jpg\n", "Name: 81, dtype: object\n", "ssssssssssssssssssssssssssssssssss82ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pew news \n", - "2 serena williams \n", - "3 t series \n", - "4 youtube \n", - "5 alternative influence\n", + "0 https://i.ytimg.com/vi/7nYkJctgSSA/hqdefault.jpg\n", "Name: 82, dtype: object\n", "ssssssssssssssssssssssssssssssssss83ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 lego \n", - "2 star \n", - "3 wars \n", + "0 https://i.ytimg.com/vi/hZHfdOKFlAw/hqdefault.jpg\n", "Name: 83, dtype: object\n", "ssssssssssssssssssssssssssssssssss84ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE 
\n", - "1 ylyl \n", - "2 you laugh you lose \n", - "3 YOU LAUGH YOU LOSE \n", - "4 TRY NOT TO LAUGH SUPER HARD EDITION\n", - "5 try not to laugh \n", - "6 try not to laugh challenge \n", - "7 pewdiepie \n", - "8 pewds \n", - "9 pewdie \n", - "10 pdp \n", + "0 https://i.ytimg.com/vi/gYTJrTXaGwA/hqdefault.jpg\n", "Name: 84, dtype: object\n", "ssssssssssssssssssssssssssssssssss85ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", + "0 https://i.ytimg.com/vi/cFTB5EJUxzw/hqdefault.jpg\n", "Name: 85, dtype: object\n", "ssssssssssssssssssssssssssssssssss86ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 gucci \n", - "2 fashion\n", - "3 meme \n", + "0 https://i.ytimg.com/vi/T8EfomTlcfA/hqdefault.jpg\n", "Name: 86, dtype: object\n", "ssssssssssssssssssssssssssssssssss87ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", - "3 THANOS\n", - "4 CAR \n", + "0 https://i.ytimg.com/vi/ww8dRu4_1EY/hqdefault.jpg\n", "Name: 87, dtype: object\n", "ssssssssssssssssssssssssssssssssss88ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 Try Not To Laugh At Other Youtubers Try Not To Laugh Challenge\n", - "2 try not to laugh \n", - "3 try not to laugh challenge \n", - "4 try not to laugh challenge clean \n", - "5 try not to laugh challenge impossible \n", - "6 try not to laugh markiplier \n", - "7 try not to laugh jacksepticeye \n", - "8 try not to laugh pewdiepie edition \n", - "9 try not to laugh memes \n", - "10 memes \n", - "11 meme \n", - "12 funny memes \n", - "13 funny memes try not to laugh \n", - "14 ylyl \n", - "15 you laugh you lose \n", - "16 pewdiepie ylyl \n", - "17 pewdiepie \n", - "18 pewds \n", - "19 pdp \n", - "20 pewdie \n", - "21 tntl \n", - "22 laugh \n", - "23 try not to \n", - "24 markiplier \n", - "25 jacksepticeye \n", + "0 https://i.ytimg.com/vi/Bb896qn7S54/hqdefault.jpg\n", "Name: 88, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss89ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tumblr \n", - "2 tumblr in action\n", - "3 reddit \n", - "4 reddit review \n", + "0 https://i.ytimg.com/vi/WgnmQk_2yF4/hqdefault.jpg\n", "Name: 89, dtype: object\n", "ssssssssssssssssssssssssssssssssss90ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/mtp0Mu-yj_o/hqdefault.jpg\n", "Name: 90, dtype: object\n", "ssssssssssssssssssssssssssssssssss91ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 YES PAPA \n", - "2 YES PAPA MEME \n", - "3 johny johny yes papa \n", - "4 johnny johnny \n", - "5 johny meme \n", - "6 baby johnny eating sugar\n", - "7 no papa no papa \n", - "8 no papa sugar \n", - "9 meme review \n", - "10 pewdiepie meme review \n", - "11 pewdiepie \n", - "12 pewds \n", - "13 pdp \n", - "14 pewdie \n", - "15 YES PAPA MEME EXPOSED \n", + "0 https://i.ytimg.com/vi/mkKDI6y2kyE/hqdefault.jpg\n", "Name: 91, dtype: object\n", "ssssssssssssssssssssssssssssssssss92ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 lwiay\n", - "Name: 92, dtype: object\n", + "Series([], Name: 92, dtype: object)\n", "ssssssssssssssssssssssssssssssssss93ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 try not to laugh\n", + "0 https://i.ytimg.com/vi/JToPoYip-C4/hqdefault.jpg\n", "Name: 93, dtype: object\n", "ssssssssssssssssssssssssssssssssss94ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 episode 1 \n", - "2 gameplay \n", - "3 wlaking \n", - "4 walking dead\n", - "5 final \n", - "6 season \n", - "7 last \n", + "0 https://i.ytimg.com/vi/AgRHEGB8Urs/hqdefault.jpg\n", "Name: 94, dtype: object\n", "ssssssssssssssssssssssssssssssssss95ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pew news\n", - "2 ksi \n", - "3 ninja \n", - "4 female \n", - "5 streamer\n", + "0 https://i.ytimg.com/vi/SRCToEkq7to/hqdefault.jpg\n", "Name: 95, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss96ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 ylyl \n", - "2 laugh\n", - "3 lose \n", + "0 https://i.ytimg.com/vi/A6EIl677ntQ/hqdefault.jpg\n", "Name: 96, dtype: object\n", "ssssssssssssssssssssssssssssssssss97ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pubg \n", - "2 player unknown\n", - "3 squads \n", + "0 https://i.ytimg.com/vi/4HD5rCNYxng/hqdefault.jpg\n", "Name: 97, dtype: object\n", "ssssssssssssssssssssssssssssssssss98ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 98, dtype: object\n", + "Series([], Name: 98, dtype: object)\n", "ssssssssssssssssssssssssssssssssss99ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 What drinking her juice ACTUALLY gives you \n", - "2 jilly juice \n", - "3 dr phil \n", - "4 dr phil 2018 \n", - "5 dr phil jilly juice \n", - "6 dr phil jilly juice reaction \n", - "7 dr phil pewdiepie \n", - "8 pewdiepie dr phil \n", - "9 pewdiepie dr phil eminem \n", - "10 15 YEAR OLD CRIES OVER NOT GETTING $231 \n", - "11 dr phil 1 \n", - "12 dr phil 15 year old \n", - "13 LOGAN PAULS SISTER WANTS TO DO YOUTUBE - Dr Phil #2\n", - "14 YOUTUBER GOES ON DR PHIL. 
\n", - "15 dr phil playlist \n", - "16 pewdiepie \n", - "17 pewds \n", - "18 pdp \n", - "19 pewdie \n", - "20 juice \n", - "21 comedy \n", - "22 reaction \n", - "23 entertainment \n", - "24 jilly \n", + "0 https://i.ytimg.com/vi/hnc3bGtYQsQ/hqdefault.jpg\n", "Name: 99, dtype: object\n", "ssssssssssssssssssssssssssssssssss100ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh \n", - "2 you lose \n", - "3 try not to laugh challenge\n", - "4 challenge \n", - "5 try not to \n", + "0 https://i.ytimg.com/vi/cva2sxX5PgM/hqdefault.jpg\n", "Name: 100, dtype: object\n", "ssssssssssssssssssssssssssssssssss101ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 THAT TOTALLY HAPPENED.\n", - "2 that \n", - "3 totally \n", - "4 happened \n", - "5 /r thathappened \n", - "6 thathappened \n", - "7 redit \n", - "8 reddit \n", - "9 thathappened redit \n", - "10 pewdiepie \n", - "11 reddit review \n", - "12 reddit reaction \n", - "13 reddit cringe \n", - "14 cringe \n", - "15 reddit pewdiepie \n", - "16 pewds \n", - "17 pewdie \n", - "18 pdp \n", - "19 /r \n", - "20 meme \n", - "21 memes \n", - "22 meme review \n", + "0 https://i.ytimg.com/vi/cDOlBRzHRI0/hqdefault.jpg\n", "Name: 101, dtype: object\n", "ssssssssssssssssssssssssssssssssss102ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/Mxdze0Wo91U/hqdefault.jpg\n", "Name: 102, dtype: object\n", "ssssssssssssssssssssssssssssssssss103ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/YH_rnTjnWfg/hqdefault.jpg\n", "Name: 103, dtype: object\n", "ssssssssssssssssssssssssssssssssss104ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 TRY NOT TO LAUGH / EPISODE 1 / NEW SERIES\n", - "2 ylyl \n", - "3 you laugh \n", - "4 you lose \n", - "5 you laugh you lose \n", - "6 you laugh you lose challenge \n", - "7 pewdiepie ylyl \n", - "8 pewdiepie ylyl 1 \n", - "9 try not to laugh \n", - "10 try not to laugh challenge \n", - "11 try not to laugh 
challenge episode 1 \n", - "12 new series \n", - "13 pewdiepie series \n", - "14 pewds \n", - "15 pewdie \n", - "16 pdp \n", - "17 try not to laugh clean \n", - "18 skrattar du \n", - "19 skrattar du förlorar du \n", - "20 TNTL \n", - "21 tntl clean \n", - "Name: 104, dtype: object\n", + "Series([], Name: 104, dtype: object)\n", "ssssssssssssssssssssssssssssssssss105ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 detroit \n", - "2 detroit become human \n", - "3 detroit become human gameplay\n", - "4 gameplay detroit become human\n", - "5 gameplay \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 full \n", - "9 commentary \n", + "0 https://i.ytimg.com/vi/WFRBxz6AeZI/hqdefault.jpg\n", "Name: 105, dtype: object\n", "ssssssssssssssssssssssssssssssssss106ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 detroit \n", - "2 detroit become human \n", - "3 detroit become human gameplay\n", - "4 gameplay detroit become human\n", - "5 gameplay \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 full \n", - "9 commentary \n", + "0 https://i.ytimg.com/vi/7yuPVq9DtV0/hqdefault.jpg\n", "Name: 106, dtype: object\n", "ssssssssssssssssssssssssssssssssss107ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 detroit become human \n", - "2 detroit \n", - "3 detroit become human gameplay\n", - "4 gameplay detroit become human\n", - "5 gameplay \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 full \n", - "9 commentary \n", + "0 https://i.ytimg.com/vi/vYP6GdsEmg0/hqdefault.jpg\n", "Name: 107, dtype: object\n", "ssssssssssssssssssssssssssssssssss108ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 detroit \n", - "2 detroit become human \n", - "3 detroit become human gameplay\n", - "4 gameplay detroit become human\n", - "5 gameplay \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 full \n", - "9 commentary \n", + "0 https://i.ytimg.com/vi/7k4GbHQNmQo/hqdefault.jpg\n", "Name: 108, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss109ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 My fans have turned against me...\n", - "2 pewdiepie \n", - "3 pewds \n", - "4 pdp \n", - "5 pewdie \n", - "6 pewdiepie fans \n", - "7 lwiay \n", - "8 pewdiepie lwaiy \n", + "0 https://i.ytimg.com/vi/o_CSmob64uU/hqdefault.jpg\n", "Name: 109, dtype: object\n", "ssssssssssssssssssssssssssssssssss110ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/o8Je7hPgsdU/hqdefault.jpg\n", "Name: 110, dtype: object\n", "ssssssssssssssssssssssssssssssssss111ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose\n", + "0 https://i.ytimg.com/vi/iDFjTrl7J8w/hqdefault.jpg\n", "Name: 111, dtype: object\n", "ssssssssssssssssssssssssssssssssss112ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 fouseytube \n", - "2 drake \n", - "3 drake july 2018 \n", - "4 pewdiepie \n", - "5 pewds \n", - "6 pewdie \n", - "7 pdp \n", - "8 fouseytube drake \n", - "9 dj khaled \n", - "10 djkhaled drake \n", - "11 dj khaled fouseytube\n", - "12 drake concert live \n", - "13 drake concert \n", - "14 concert \n", - "15 new drake \n", - "16 lil \n", - "17 lil rapper \n", - "18 rapper \n", - "19 lil rappers \n", - "20 6ix9ine \n", - "21 tekashi69 \n", - "22 6ix9ine pewdiepie \n", - "23 tekashi \n", - "24 drake pewdiepie \n", - "Name: 112, dtype: object\n", + "Series([], Name: 112, dtype: object)\n", "ssssssssssssssssssssssssssssssssss113ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 the dobre \n", - "2 dobre \n", - "3 dobre brothers \n", - "4 dobre twins \n", - "5 dobre brothers song \n", - "6 dobre brothers pranks\n", - "7 prank \n", - "8 pranks \n", - "9 slime \n", + "0 https://i.ytimg.com/vi/q2CBNLsQbCM/hqdefault.jpg\n", "Name: 113, dtype: object\n", "ssssssssssssssssssssssssssssssssss114ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 Tekashi 6ix9ine \n", - "2 Tekashi \n", - "3 6ix9ine \n", - "4 Tekashi 6ix9ine saved 
by polite cat\n", - "5 six nine \n", - "6 six \n", - "7 nine \n", - "8 69 \n", - "9 tekashi69 \n", - "10 pewdiepie \n", - "11 pewds \n", - "12 pdp \n", - "13 pewdie \n", - "14 meme review \n", - "15 6ix9ine pewdiepie \n", - "16 six nine pewdiepie \n", - "17 tekashi69 pewdiepie \n", - "18 polite cat \n", - "19 cat meme \n", - "20 cat memes \n", - "21 cats \n", - "22 cat \n", - "23 memes \n", - "24 meme compilation \n", + "0 https://i.ytimg.com/vi/jEYQqLtK_Xw/hqdefault.jpg\n", "Name: 114, dtype: object\n", "ssssssssssssssssssssssssssssssssss115ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 funny memes\n", - "2 meme \n", - "3 memes \n", - "4 curb \n", - "5 compilation\n", + "0 https://i.ytimg.com/vi/k66FoY5ndfI/hqdefault.jpg\n", "Name: 115, dtype: object\n", "ssssssssssssssssssssssssssssssssss116ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pew news \n", - "2 youtube \n", - "3 vox media \n", - "4 elon musk \n", - "5 thai \n", - "6 hank green \n", - "7 jessica price\n", - "8 guild wars 2 \n", - "9 media \n", + "0 https://i.ytimg.com/vi/WbW0rHCX2UU/hqdefault.jpg\n", "Name: 116, dtype: object\n", "ssssssssssssssssssssssssssssssssss117ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 one hand clapping\n", + "0 https://i.ytimg.com/vi/2YoUqR9fuA4/hqdefault.jpg\n", "Name: 117, dtype: object\n", "ssssssssssssssssssssssssssssssssss118ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 IS THIS LIVE...? 
\n", - "2 YOU \n", - "3 CRINGE \n", - "4 LOSE \n", - "5 you cringe \n", - "6 cringe \n", - "7 you cringe you lose \n", - "8 you cringe you lose pewdiepie\n", - "9 cringe comp \n", - "10 cringe compilation \n", - "11 cringe compilation 2018 \n", - "12 cringe compilations \n", - "13 media cringe \n", - "14 news \n", - "15 news cringe \n", - "16 news cringe reaction \n", - "17 news cringe moments \n", - "18 pewdiepie \n", - "19 pewds \n", - "20 pewdie \n", - "21 pdp \n", - "22 cringe moments \n", - "23 cringe moments on tv \n", - "24 pewdiepie cringe \n", + "0 https://i.ytimg.com/vi/Sr0fZ298eM8/hqdefault.jpg\n", "Name: 118, dtype: object\n", "ssssssssssssssssssssssssssssssssss119ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 all woman\n", - "2 are \n", - "3 queen \n", + "0 https://i.ytimg.com/vi/_umr17a_AdQ/hqdefault.jpg\n", "Name: 119, dtype: object\n", "ssssssssssssssssssssssssssssssssss120ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review \n", - "3 slaps hand on car\n", - "4 car salesman meme\n", + "0 https://i.ytimg.com/vi/XQjyjn3MdxM/hqdefault.jpg\n", "Name: 120, dtype: object\n", "ssssssssssssssssssssssssssssssssss121ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 WHAT THE MEDIA DOESNT TELL YOU ABOUT PEWDIEPIE\n", - "2 pewdiepie \n", - "3 pewds \n", - "4 pewdie \n", - "5 pdp \n", - "6 media \n", - "7 pewdiepie media \n", - "8 pewdiepie wsj \n", - "9 pewdiepie scandal \n", - "Name: 121, dtype: object\n", + "Series([], Name: 121, dtype: object)\n", "ssssssssssssssssssssssssssssssssss122ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 twitch \n", - "2 twitch victims \n", - "3 twitch fails \n", - "4 twitch fails 2018 \n", - "5 twitch girls comp \n", - "6 twitch girls 2018 \n", - "7 twitch gone wrong \n", - "8 twitch compilation\n", - "9 pewdie \n", - "10 pewdiepie \n", - "11 pewds \n", - "12 pdp \n", + "0 https://i.ytimg.com/vi/m3Xf1ra2Ekg/hqdefault.jpg\n", "Name: 122, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss123ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose \n", - "2 ylyl \n", - "3 skrattar du förlorar du\n", + "0 https://i.ytimg.com/vi/DYsCJEfQh1U/hqdefault.jpg\n", "Name: 123, dtype: object\n", "ssssssssssssssssssssssssssssssssss124ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 news \n", - "2 tana' \n", - "3 tana \n", - "4 mongeau'\n", - "5 tanacon \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ + "0 https://i.ytimg.com/vi/PK-GvWWQ03g/hqdefault.jpg\n", "Name: 124, dtype: object\n", "ssssssssssssssssssssssssssssssssss125ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 vlog \n", - "2 summer\n", - "3 idk \n", + "0 https://i.ytimg.com/vi/vHab6BNrHU8/hqdefault.jpg\n", "Name: 125, dtype: object\n", "ssssssssssssssssssssssssssssssssss126ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 eu \n", - "2 ban \n", - "3 memes\n", - "4 not \n", - "5 cool \n", - "6 guys \n", + "0 https://i.ytimg.com/vi/JKfFCVPjo_g/hqdefault.jpg\n", "Name: 126, dtype: object\n", "ssssssssssssssssssssssssssssssssss127ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 joke \n", - "2 over \n", - "3 head \n", + "0 https://i.ytimg.com/vi/__d5Q6IF1Sg/hqdefault.jpg\n", "Name: 127, dtype: object\n", "ssssssssssssssssssssssssssssssssss128ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tanacon \n", - "2 tana mongeau w \n", - "3 tana mongeau \n", - "4 gaming disorder \n", - "5 gaming \n", - "6 disorder \n", - "7 gaming disorder 2018 \n", - "8 gaming disorder video\n", - "9 pewdiepie \n", - "10 pewds \n", - "11 pdp \n", - "12 pewdie \n", - "13 pew news \n", + "0 https://i.ytimg.com/vi/oLBqixxgd6Y/hqdefault.jpg\n", "Name: 128, dtype: object\n", "ssssssssssssssssssssssssssssssssss129ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 skratta \n", - "3 skrattar \n", - "4 you laugh \n", - "5 you lose \n", - "6 you laugh you lose\n", - "7 try 
not to laugh \n", - "8 challenge \n", - "9 YOU LAUGH YOU SAD \n", + "0 https://i.ytimg.com/vi/X2bUUkWC7dE/hqdefault.jpg\n", "Name: 129, dtype: object\n", "ssssssssssssssssssssssssssssssssss130ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 LWIAY\n", - "Name: 130, dtype: object\n", + "Series([], Name: 130, dtype: object)\n", "ssssssssssssssssssssssssssssssssss131ssssssssssssssssssssssssssssssssss\n", - "0 meme \n", - "1 review \n", - "2 youtubes\n", - "3 favorite\n", - "4 show \n", + "0 https://i.ytimg.com/vi/szPjXJeIGP8/hqdefault.jpg\n", "Name: 131, dtype: object\n", "ssssssssssssssssssssssssssssssssss132ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/eEHBjP06WSI/hqdefault.jpg\n", "Name: 132, dtype: object\n", "ssssssssssssssssssssssssssssssssss133ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 MEMES COULD GET BANNED (NEEDS HELP ASAP SEND TO ALL YOUR FRIENDS AND FAMILY)\n", - "2 memes \n", - "3 meme \n", - "4 memes banned \n", - "5 memes ban \n", - "6 ban \n", - "7 pewds \n", - "8 pewdie \n", - "9 pewdiepie \n", - "10 pew news \n", - "11 news \n", - "12 pew \n", + "0 https://i.ytimg.com/vi/epgHrLszj-Q/hqdefault.jpg\n", "Name: 133, dtype: object\n", "ssssssssssssssssssssssssssssssssss134ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 lil \n", - "2 tay \n", - "3 lil tay \n", - "4 pewdiepie\n", - "5 pewdie \n", - "6 pdp \n", - "7 pewds \n", + "0 https://i.ytimg.com/vi/t3ppxtEU6No/hqdefault.jpg\n", "Name: 134, dtype: object\n", "ssssssssssssssssssssssssssssssssss135ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 flex seal \n", - "2 flex spray \n", - "3 flex commercial\n", - "4 tape commercial\n", - "5 commercial \n", + "0 https://i.ytimg.com/vi/yd62ObxkV44/hqdefault.jpg\n", "Name: 135, dtype: object\n", "ssssssssssssssssssssssssssssssssss136ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 dr phil \n", - "2 logan paul\n", - "3 psycho \n", - "4 youtuber \n", + "0 
https://i.ytimg.com/vi/AkiC0_09Zss/hqdefault.jpg\n", "Name: 136, dtype: object\n", "ssssssssssssssssssssssssssssssssss137ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 fridays \n", - "2 with \n", - "3 pewdiepie \n", - "4 fridays with pewdiepie\n", - "5 lwiay \n", + "0 https://i.ytimg.com/vi/Xz5XIHrT4LQ/hqdefault.jpg\n", "Name: 137, dtype: object\n", "ssssssssssssssssssssssssssssssssss138ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 YLYL \n", - "2 SKRATTA DU \n", - "3 TRY NOT TO LAUGH\n", - "4 CHALLENGE \n", + "0 https://i.ytimg.com/vi/_lsDECLUt3k/hqdefault.jpg\n", "Name: 138, dtype: object\n", "ssssssssssssssssssssssssssssssssss139ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pewdiepie \n", - "2 react \n", - "3 world \n", - "4 dr phil \n", - "5 spoiled \n", - "6 brat \n", - "7 beverly hills\n", - "8 girl \n", - "9 15 \n", + "0 https://i.ytimg.com/vi/iBsg75W2Vig/hqdefault.jpg\n", "Name: 139, dtype: object\n", "ssssssssssssssssssssssssssssssssss140ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pew news \n", - "2 news \n", - "3 pewdiepie\n", + "0 https://i.ytimg.com/vi/sUtkJUJuq2U/hqdefault.jpg\n", "Name: 140, dtype: object\n", "ssssssssssssssssssssssssssssssssss141ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 lwiay\n", + "0 https://i.ytimg.com/vi/YzhLEjUD8hk/hqdefault.jpg\n", "Name: 141, dtype: object\n", "ssssssssssssssssssssssssssssssssss142ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 baldis basics \n", - "2 baldi's basics in education and learning \n", - "3 baldi's basics in education and learning secrets \n", - "4 baldis basics gameplay \n", - "5 baldis basics game \n", - "6 baldi's basics \n", - "7 baldis classroom \n", - "8 baldis education \n", - "9 baldis education and learning \n", - "10 baldis \n", - "11 basics \n", - "12 BALDIS BASICS IS THE SPOOKIEST GAME IN THE HISTORY OF THE WORLD AND UNIVERSE\n", - "13 baldis basics scary \n", - "14 baldis basics speedrun \n", - "15 
pewds \n", - "16 pewdiepie \n", - "17 pewdie \n", - "18 pdp \n", - "19 baldi pewdiepie \n", - "Name: 142, dtype: object\n", - "ssssssssssssssssssssssssssssssssss143ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 media \n", - "2 vice \n", - "3 news \n", - "4 article \n", - "5 pewdiepie\n", - "Name: 143, dtype: object\n", - "ssssssssssssssssssssssssssssssssss144ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", - "Name: 144, dtype: object\n", - "ssssssssssssssssssssssssssssssssss145ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 humble \n", - "2 humble brag \n", - "3 bragging \n", - "4 youtuber \n", - "5 humble youtubers\n", - "6 youtubers humble\n", - "7 rich youtubers \n", - "8 rich youtube \n", - "Name: 145, dtype: object\n", - "ssssssssssssssssssssssssssssssssss146ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 lwiay \n", - "2 pewds \n", - "3 pewdie \n", - "4 pewdiepie\n", - "5 pdp \n", - "Name: 146, dtype: object\n", - "ssssssssssssssssssssssssssssssssss147ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 book \n", - "2 review \n", - "3 literature\n", - "4 club \n", - "Name: 147, dtype: object\n", - "ssssssssssssssssssssssssssssssssss148ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 fortnite\n", - "2 cringe \n", - "3 ali a \n", - "4 ninja \n", - "5 summit \n", - "Name: 148, dtype: object\n", - "ssssssssssssssssssssssssssssssssss149ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 sleep \n", - "2 challenge\n", - "3 horror \n", - "4 video \n", - "5 game \n", - "6 play \n", - "Name: 149, dtype: object\n", - "ssssssssssssssssssssssssssssssssss150ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 TESTING OUT EYETRACKING \n", - "2 eyetracker \n", - "3 eye tracking \n", - "4 eye tracker \n", - "5 tobii \n", - "6 tobii eye tracker \n", - "7 tobii eye tracking \n", - "8 tobii review \n", - "9 tobii eye tracker review\n", - "10 pewdiepie \n", - "11 
pewds \n", - "12 pewdie \n", - "13 pdp \n", - "14 tracker \n", - "15 eye \n", - "16 tracking \n", - "Name: 150, dtype: object\n", - "ssssssssssssssssssssssssssssssssss151ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose\n", - "2 ylyl \n", - "3 india \n", - "4 indian \n", - "5 meme \n", - "6 comedy \n", - "Name: 151, dtype: object\n", - "ssssssssssssssssssssssssssssssssss152ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 152, dtype: object\n", - "ssssssssssssssssssssssssssssssssss153ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 meme review\n", - "2 savage \n", - "3 patrick \n", - "4 fortnite \n", - "5 pubg \n", - "6 meme \n", - "7 memes \n", - "8 spongebob \n", - "Name: 153, dtype: object\n", - "ssssssssssssssssssssssssssssssssss154ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 trapland\n", - "Name: 154, dtype: object\n", - "ssssssssssssssssssssssssssssssssss155ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 155, dtype: object\n", - "ssssssssssssssssssssssssssssssssss156ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 156, dtype: object\n", - "ssssssssssssssssssssssssssssssssss157ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 157, dtype: object\n", - "ssssssssssssssssssssssssssssssssss158ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pew \n", - "2 news \n", - "Name: 158, dtype: object\n", - "ssssssssssssssssssssssssssssssssss159ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 trap adventure 2\n", - "Name: 159, dtype: object\n", - "ssssssssssssssssssssssssssssssssss160ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 lwiay \n", - "Name: 160, dtype: object\n", - "ssssssssssssssssssssssssssssssssss161ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 hmmm \n", - "Name: 161, dtype: object\n", - "ssssssssssssssssssssssssssssssssss162ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - 
"Name: 162, dtype: object\n", - "ssssssssssssssssssssssssssssssssss163ssssssssssssssssssssssssssssssssss\n", - "0 party in backyard\n", - "1 hej monika \n", - "2 monika \n", - "3 monica \n", - "4 song \n", - "5 pewdiepie \n", - "6 sing \n", - "7 singing \n", - "Name: 163, dtype: object\n", - "ssssssssssssssssssssssssssssssssss164ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 trap adventure 2 \n", - "2 rage \n", - "3 quit \n", - "4 game \n", - "5 videogame \n", - "6 trap \n", - "7 adventure \n", - "8 free download \n", - "9 link \n", - "10 trap adventure download \n", - "11 trap adventure 2 download \n", - "12 trap adventure 2 free download\n", - "Name: 164, dtype: object\n", - "ssssssssssssssssssssssssssssssssss165ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 165, dtype: object\n", - "ssssssssssssssssssssssssssssssssss166ssssssssssssssssssssssssssssssssss\n", - "0 vr chat\n", - "Name: 166, dtype: object\n", - "ssssssssssssssssssssssssssssssssss167ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 jacksfilms\n", - "2 lwiay \n", - "3 yiay \n", - "Name: 167, dtype: object\n", - "ssssssssssssssssssssssssssssssssss168ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepi\n", - "1 indian \n", - "2 meme \n", - "Name: 168, dtype: object\n", - "ssssssssssssssssssssssssssssssssss169ssssssssssssssssssssssssssssssssss\n", - "0 ylyl\n", - "Name: 169, dtype: object\n", - "ssssssssssssssssssssssssssssssssss170ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 mad lad \n", - "Name: 170, dtype: object\n", - "ssssssssssssssssssssssssssssssssss171ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "Name: 171, dtype: object\n", - "ssssssssssssssssssssssssssssssssss172ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 you laugh you lose\n", - "2 ylyl \n", - "3 laugh \n", - "4 lose \n", - "Name: 172, dtype: object\n", - "ssssssssssssssssssssssssssssssssss173ssssssssssssssssssssssssssssssssss\n", - "0 
pewdiepie \n", - "1 nice guy \n", - "2 nice guys\n", - "3 reddit \n", - "Name: 173, dtype: object\n", - "ssssssssssssssssssssssssssssssssss174ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 vine \n", - "2 instagram\n", - "Name: 174, dtype: object\n", - "ssssssssssssssssssssssssssssssssss175ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 vr \n", - "2 vr chat \n", - "3 la noire\n", - "4 vr cases\n", - "Name: 175, dtype: object\n", - "ssssssssssssssssssssssssssssssssss176ssssssssssssssssssssssssssssssssss\n", - "0 im14thisisdeep \n", - "1 im 14 this is deep\n", - "2 this is deep \n", - "3 this is so deep \n", - "Name: 176, dtype: object\n", - "ssssssssssssssssssssssssssssssssss177ssssssssssssssssssssssssssssssssss\n", - "0 rick \n", - "1 and morty \n", - "2 rick and morty\n", - "Name: 177, dtype: object\n", - "ssssssssssssssssssssssssssssssssss178ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 the impossible quiz\n", - "Name: 178, dtype: object\n", - "ssssssssssssssssssssssssssssssssss179ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 179, dtype: object\n", - "ssssssssssssssssssssssssssssssssss180ssssssssssssssssssssssssssssssssss\n", - "0 YLYL\n", - "Name: 180, dtype: object\n", - "ssssssssssssssssssssssssssssssssss181ssssssssssssssssssssssssssssssssss\n", - "0 To the moon \n", - "1 sequel \n", - "2 finding paradise\n", - "3 paradice \n", - "4 walkthrough \n", - "5 playthrough \n", - "6 lets play \n", - "7 pewdiepie \n", - "8 part 1 \n", - "Name: 181, dtype: object\n", - "ssssssssssssssssssssssssssssssssss182ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 zootopia \n", - "2 doki doki literature club\n", - "3 doki doki \n", - "4 meme review \n", - "5 meme \n", - "6 death stranding \n", - "Name: 182, dtype: object\n", - "ssssssssssssssssssssssssssssssssss183ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 183, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss184ssssssssssssssssssssssssssssssssss\n", - "0 ylyl\n", - "Name: 184, dtype: object\n", - "ssssssssssssssssssssssssssssssssss185ssssssssssssssssssssssssssssssssss\n", - "0 doki doki \n", - "1 literature \n", - "2 club \n", - "3 litterature\n", - "4 part 1 \n", - "Name: 185, dtype: object\n", - "ssssssssssssssssssssssssssssssssss186ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 186, dtype: object\n", - "ssssssssssssssssssssssssssssssssss187ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 187, dtype: object\n", - "ssssssssssssssssssssssssssssssssss188ssssssssssssssssssssssssssssssssss\n", - "0 getting over it \n", - "1 walkthrough \n", - "2 playthrough \n", - "3 get over it \n", - "4 hiking \n", - "5 hammer \n", - "6 climb \n", - "7 climb game \n", - "8 clop \n", - "9 qwop \n", - "10 funny game \n", - "11 getting over it part 1\n", - "12 tutorial \n", - "13 full \n", - "Name: 188, dtype: object\n", - "ssssssssssssssssssssssssssssssssss189ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 189, dtype: object\n", - "ssssssssssssssssssssssssssssssssss190ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 190, dtype: object\n", - "ssssssssssssssssssssssssssssssssss191ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 191, dtype: object\n", - "ssssssssssssssssssssssssssssssssss192ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 192, dtype: object\n", - "ssssssssssssssssssssssssssssssssss193ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 jacksepticeye\n", - "2 whiskey \n", - "3 irish \n", - "4 review \n", - "Name: 193, dtype: object\n", - "ssssssssssssssssssssssssssssssssss194ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 194, dtype: object\n", - "ssssssssssssssssssssssssssssssssss195ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 south park \n", - "2 the fractured \n", - "3 but whole \n", - "4 south park 
game\n", - "5 sequel \n", - "6 new \n", - "7 gameplay \n", - "8 walkthrough \n", - "9 part 1 \n", - "10 full game \n", - "Name: 195, dtype: object\n", - "ssssssssssssssssssssssssssssssssss196ssssssssssssssssssssssssssssssssss\n", - "0 lwiay \n", - "1 reddit\n", - "Name: 196, dtype: object\n", - "ssssssssssssssssssssssssssssssssss197ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "Name: 197, dtype: object\n", - "ssssssssssssssssssssssssssssssssss198ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 198, dtype: object\n", - "ssssssssssssssssssssssssssssssssss199ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 pewd \n", - "4 pewdiepie cooking \n", - "5 cooking \n", - "6 how to \n", - "7 how to cook \n", - "8 how to cook meatballs \n", - "9 meatballs \n", - "10 meat balls \n", - "11 how to cook meatballs in a pan\n", - "12 how to cook meatballs in sauce\n", - "13 meatballs recipe \n", - "14 meatballs recipe tasty \n", - "15 tasty \n", - "16 recipe \n", - "17 best recipe \n", - "18 how to make \n", - "19 how to make meatballs \n", - "20 cook \n", - "21 homemade \n", - "Name: 199, dtype: object\n", - "ssssssssssssssssssssssssssssssssss200ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 200, dtype: object\n", - "ssssssssssssssssssssssssssssssssss201ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 201, dtype: object\n", - "ssssssssssssssssssssssssssssssssss202ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 202, dtype: object\n", - "ssssssssssssssssssssssssssssssssss203ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 anime \n", - "2 myanimelist \n", - "3 favourite anime\n", - "Name: 203, dtype: object\n", - "ssssssssssssssssssssssssssssssssss204ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 you \n", - "2 laugh \n", - "3 lose \n", - "4 challenge\n", - "Name: 204, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss205ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 205, dtype: object\n", - "ssssssssssssssssssssssssssssssssss206ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 whiskey \n", - "2 japanese\n", - "3 review \n", - "Name: 206, dtype: object\n", - "ssssssssssssssssssssssssssssssssss207ssssssssssssssssssssssssssssssssss\n", - "0 ylyl \n", - "1 you laugh you lose\n", - "2 try not to laugh \n", - "3 challenge \n", - "Name: 207, dtype: object\n", - "ssssssssssssssssssssssssssssssssss208ssssssssssssssssssssssssssssssssss\n", - "0 hardest\n", - "1 game \n", - "2 ever \n", - "Name: 208, dtype: object\n", - "ssssssssssssssssssssssssssssssssss209ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 how to \n", - "2 get started\n", - "3 youtube \n", - "4 youtuber \n", - "Name: 209, dtype: object\n", - "ssssssssssssssssssssssssssssssssss210ssssssssssssssssssssssssssssssssss\n", - "0 drawing \n", - "1 youtuber \n", - "2 youtubers\n", - "Name: 210, dtype: object\n", - "ssssssssssssssssssssssssssssssssss211ssssssssssssssssssssssssssssssssss\n", - "0 Pewdiepie\n", - "1 would \n", - "2 you \n", - "3 rather \n", - "Name: 211, dtype: object\n", - "ssssssssssssssssssssssssssssssssss212ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 stream \n", - "2 twitch \n", - "3 fail \n", - "4 fails \n", - "Name: 212, dtype: object\n", - "ssssssssssssssssssssssssssssssssss213ssssssssssssssssssssssssssssssssss\n", - "0 you \n", - "1 laugh \n", - "2 you lose\n", - "Name: 213, dtype: object\n", - "ssssssssssssssssssssssssssssssssss214ssssssssssssssssssssssssssssssssss\n", - "0 Pewdiepie\n", - "1 Jake \n", - "2 Logan \n", - "3 Paul \n", - "4 Team 10 \n", - "5 Dab \n", - "Name: 214, dtype: object\n", - "ssssssssssssssssssssssssssssssssss215ssssssssssssssssssssssssssssssssss\n", - "0 wormax.io\n", - "1 wormax \n", - "2 snake \n", - "3 game \n", - "4 online \n", - "5 free \n", - "Name: 215, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss216ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 respect \n", - "2 women \n", - "3 piers morgan \n", - "4 good morning britain\n", - "Name: 216, dtype: object\n", - "ssssssssssssssssssssssssssssssssss217ssssssssssssssssssssssssssssssssss\n", - "0 women \n", - "1 bbc \n", - "2 bbc 3\n", - "Name: 217, dtype: object\n", - "ssssssssssssssssssssssssssssssssss218ssssssssssssssssssssssssssssssssss\n", - "0 fridays \n", - "1 with \n", - "2 pewdiepie\n", - "Name: 218, dtype: object\n", - "ssssssssssssssssssssssssssssssssss219ssssssssssssssssssssssssssssssssss\n", - "0 5 weird \n", - "1 stuff \n", - "2 5 weird stuff online\n", - "Name: 219, dtype: object\n", - "ssssssssssssssssssssssssssssssssss220ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 katy perry\n", - "Name: 220, dtype: object\n", - "ssssssssssssssssssssssssssssssssss221ssssssssssssssssssssssssssssssssss\n", - "0 reacting \n", - "1 fridays \n", - "2 with pewdiepie \n", - "3 fridays with pewdiepie\n", - "4 react \n", - "5 fan submission \n", - "6 fan \n", - "7 fans \n", - "Name: 221, dtype: object\n", - "ssssssssssssssssssssssssssssssssss222ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 YOU LAUGH YOU'RE OUT \n", - "2 you laugh \n", - "3 you laugh you \n", - "4 you laugh you're \n", - "5 you laugh lose \n", - "6 you laugh you lose pewdiepie\n", - "7 laugh \n", - "8 lose \n", - "9 laugh lose \n", - "Name: 222, dtype: object\n", - "ssssssssssssssssssssssssssssssssss223ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 oblivion \n", - "2 elder scrolls\n", - "Name: 223, dtype: object\n", - "ssssssssssssssssssssssssssssssssss224ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 REACTING TO CRINGEY SPEED RUNS\n", - "2 cringe compilation \n", - "3 cringe compilation 2017 \n", - "4 speed runs \n", - "5 speed run \n", - "6 cringe \n", - "7 cringe reaction \n", - "8 reaction \n", - "9 cringe react \n", - "10 reacting 
to cringe \n", - "11 cringy reaction \n", - "12 cringey \n", - "13 reacting to cringey videos \n", - "14 cringey speed runs \n", - "15 speed \n", - "16 run \n", - "17 pewdiepie reaction \n", - "18 react \n", - "Name: 224, dtype: object\n", - "ssssssssssssssssssssssssssssssssss225ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 the rich life of pewdiepie \n", - "2 before he was famous \n", - "3 before he was famous pewdiepie \n", - "4 pewdiepie rich \n", - "5 pewdiepie net worth \n", - "6 how much money does pewdiepie make\n", - "7 how much money \n", - "8 youtube money \n", - "9 money \n", - "10 net worth \n", - "11 networth \n", - "12 rich \n", - "13 rich life \n", - "14 the rich life \n", - "Name: 225, dtype: object\n", - "ssssssssssssssssssssssssssssssssss226ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 react \n", - "2 react world\n", - "3 greenscreen\n", - "4 competition\n", - "Name: 226, dtype: object\n", - "ssssssssssssssssssssssssssssssssss227ssssssssssssssssssssssssssssssssss\n", - "0 moral \n", - "1 moral machine\n", - "Name: 227, dtype: object\n", - "ssssssssssssssssssssssssssssssssss228ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 e3 \n", - "2 react \n", - "3 react world\n", - "Name: 228, dtype: object\n", - "ssssssssssssssssssssssssssssssssss229ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 jake \n", - "2 paul \n", - "3 it's \n", - "4 everyday \n", - "5 bro \n", - "6 react \n", - "7 react world\n", - "8 fine \n", - "Name: 229, dtype: object\n", - "ssssssssssssssssssssssssssssssssss230ssssssssssssssssssssssssssssssssss\n", - "0 respect\n", - "1 women \n", - "2 react \n", - "3 meme \n", - "Name: 230, dtype: object\n", - "ssssssssssssssssssssssssssssssssss231ssssssssssssssssssssssssssssssssss\n", - "0 try not to \n", - "1 try not \n", - "2 dont laugh \n", - "3 try not to laugh\n", - "4 challenge \n", - "Name: 231, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss232ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 test \n", - "2 harvard \n", - "3 skin \n", - "4 race \n", - "Name: 232, dtype: object\n", - "ssssssssssssssssssssssssssssssssss233ssssssssssssssssssssssssssssssssss\n", - "0 fidget spinner \n", - "1 fidget spinner tricks\n", - "2 trick \n", - "3 fidget spinner unbox \n", - "Name: 233, dtype: object\n", - "ssssssssssssssssssssssssssssssssss234ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 angry \n", - "2 challenge\n", - "Name: 234, dtype: object\n", - "ssssssssssssssssssssssssssssssssss235ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 buzzfeed\n", - "2 drunk \n", - "3 goggle \n", - "4 goggles \n", - "Name: 235, dtype: object\n", - "ssssssssssssssssssssssssssssssssss236ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 Little Nightmares Gameplay \n", - "5 Little Nightmares Walkthrough Part 1\n", - "6 Little Nightmares Gameplay Part 1 \n", - "7 Little Nightmares Pewdiepie \n", - "8 Little Nightmares Trailer \n", - "9 Little Nightmares Full Gameplay \n", - "10 Little Nightmares PS4 \n", - "11 Little Nightmares Review \n", - "12 Little Nightmares Part 1 \n", - "13 Little Nightmares Reaction \n", - "14 Little Nightmares Scary \n", - "15 Little Nightmares Game \n", - "16 Scary Games \n", - "17 New PS4 Games \n", - "18 New Games 2017 \n", - "19 PS4 Games 2017 \n", - "20 Best Games 2017 \n", - "Name: 236, dtype: object\n", - "ssssssssssssssssssssssssssssssssss237ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 buzzfeed\n", - "Name: 237, dtype: object\n", - "ssssssssssssssssssssssssssssssssss238ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 238, dtype: object\n", - "ssssssssssssssssssssssssssssssssss239ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 barbie \n", - "5 youtube channel\n", - "6 
vlogger \n", - "7 vlog \n", - "Name: 239, dtype: object\n", - "ssssssssssssssssssssssssssssssssss240ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 240, dtype: object\n", - "ssssssssssssssssssssssssssssssssss241ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 family friendly\n", - "5 frozen \n", - "6 frozen games \n", - "Name: 241, dtype: object\n", - "ssssssssssssssssssssssssssssssssss242ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 can this video\n", - "5 get \n", - "Name: 242, dtype: object\n", - "ssssssssssssssssssssssssssssssssss243ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 everything \n", - "5 game \n", - "6 everything game \n", - "7 play as anything \n", - "8 play as everything\n", - "9 play as \n", - "10 play everything \n", - "Name: 243, dtype: object\n", - "ssssssssssssssssssssssssssssssssss244ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 mass \n", - "5 effect \n", - "6 andromeda\n", - "7 video \n", - "8 game \n", - "9 ME \n", - "Name: 244, dtype: object\n", - "ssssssssssssssssssssssssssssssssss245ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 245, dtype: object\n", - "ssssssssssssssssssssssssssssssssss246ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 mind \n", - "5 blown \n", - "Name: 246, dtype: object\n", - "ssssssssssssssssssssssssssssssssss247ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 how dirty \n", - "5 is your mind\n", - "6 dirty mind \n", - "7 photos \n", - "8 funny \n", - "Name: 247, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss248ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 try not \n", - "5 to \n", - "6 laugh \n", - "7 try not to laugh\n", - "8 dont laugh \n", - "9 challenge \n", - "Name: 248, dtype: object\n", - "ssssssssssssssssssssssssssssssssss249ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 249, dtype: object\n", - "ssssssssssssssssssssssssssssssssss250ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 smash \n", - "5 or \n", - "6 pass \n", - "Name: 250, dtype: object\n", - "ssssssssssssssssssssssssssssssssss251ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 before he was famous\n", - "5 famous \n", - "6 young \n", - "7 young pewdiepie \n", - "Name: 251, dtype: object\n", - "ssssssssssssssssssssssssssssssssss252ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 try not \n", - "5 to get \n", - "6 try not to get \n", - "7 scared \n", - "8 challenge \n", - "9 scared challenge \n", - "10 try not to get scared challenge\n", - "Name: 252, dtype: object\n", - "ssssssssssssssssssssssssssssssssss253ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 253, dtype: object\n", - "ssssssssssssssssssssssssssssssssss254ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 re7 \n", - "5 RESIDENT eVIL 7 \n", - "6 GAMEPLAY \n", - "7 Resident Evil 7: Biohazard\n", - "8 BIOHAZARD \n", - "9 rewind \n", - "10 biohazard \n", - "11 survival horror \n", - "12 ps4 \n", - "13 playstation 4 \n", - "14 vr \n", - "15 demo \n", - "Name: 254, dtype: object\n", - "ssssssssssssssssssssssssssssssssss255ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 double gal\n", - "5 gun \n", - "6 double 
\n", - "7 gal \n", - "8 girl \n", - "9 anime \n", - "10 animes \n", - "Name: 255, dtype: object\n", - "ssssssssssssssssssssssssssssssssss256ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 cringe \n", - "5 try not \n", - "6 challenge \n", - "7 try not to\n", - "8 handshake \n", - "9 handshakes\n", - "Name: 256, dtype: object\n", - "ssssssssssssssssssssssssssssssssss257ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 257, dtype: object\n", - "ssssssssssssssssssssssssssssssssss258ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 beat \n", - "5 subscribers\n", - "6 most \n", - "Name: 258, dtype: object\n", - "ssssssssssssssssssssssssssssssssss259ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 259, dtype: object\n", - "ssssssssssssssssssssssssssssssssss260ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 to be continued\n", - "5 meme \n", - "6 compilation \n", - "7 continue \n", - "8 jojo \n", - "9 jojos \n", - "10 bizarre \n", - "11 adventure \n", - "Name: 260, dtype: object\n", - "ssssssssssssssssssssssssssssssssss261ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 how long can you watch\n", - "5 how long \n", - "6 watch \n", - "7 challenge \n", - "8 watching \n", - "9 time \n", - "Name: 261, dtype: object\n", - "ssssssssssssssssssssssssssssssssss262ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 the walking dead \n", - "5 walking dead \n", - "6 part 1 \n", - "7 season 3 \n", - "8 telltale \n", - "9 game \n", - "10 the walking dead seasons 3\n", - "11 walking dead full game \n", - "Name: 262, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss263ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 263, dtype: object\n", - "ssssssssssssssssssssssssssssssssss264ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 who is \n", - "5 more likely \n", - "6 markiplier \n", - "7 jacksepticeye \n", - "8 who is more likely\n", - "9 most likely \n", - "Name: 264, dtype: object\n", - "ssssssssssssssssssssssssssssssssss265ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 Bottleflip \n", - "5 Challenge \n", - "6 Bottle \n", - "7 Dab \n", - "8 Meme \n", - "9 Jacksepticeye\n", - "Name: 265, dtype: object\n", - "ssssssssssssssssssssssssssssssssss266ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 hot \n", - "5 sauce \n", - "6 lootcrate\n", - "Name: 266, dtype: object\n", - "ssssssssssssssssssssssssssssssssss267ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 deleting\n", - "5 channel \n", - "Name: 267, dtype: object\n", - "ssssssssssssssssssssssssssssssssss268ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 vlog \n", - "5 birdabo \n", - "Name: 268, dtype: object\n", - "ssssssssssssssssssssssssssssssssss269ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 Tuber Simulator \n", - "5 Tuber \n", - "6 Simulator \n", - "7 Pewdiepie Simulator \n", - "8 Pewdiepie Game \n", - "9 Youtube Game \n", - "10 IOS \n", - "11 Android \n", - "12 Youtuber Simulator \n", - "13 Competition \n", - "14 Fridays \n", - "15 Fridays with Pewdiepie\n", - "Name: 269, dtype: object\n", - "ssssssssssssssssssssssssssssssssss270ssssssssssssssssssssssssssssssssss\n" - ] - }, - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 Vlog \n", - "5 Jacksepticeye\n", - "6 Slippy \n", - "7 Holiday \n", - "8 video \n", - "9 log \n", - "10 kickthepj \n", - "Name: 270, dtype: object\n", - "ssssssssssssssssssssssssssssssssss271ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 my \n", - "5 favourite\n", - "6 videos \n", - "7 ever \n", - "Name: 271, dtype: object\n", - "ssssssssssssssssssssssssssssssssss272ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 meme \n", - "4 react \n", - "5 spicy \n", - "6 dank \n", - "Name: 272, dtype: object\n", - "ssssssssssssssssssssssssssssssssss273ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 cringy \n", - "4 cringe \n", - "5 cringe kid \n", - "6 cringe compilation\n", - "7 cringe react \n", - "8 react \n", - "Name: 273, dtype: object\n", - "ssssssssssssssssssssssssssssssssss274ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 Happy Wheels \n", - "4 Happy Wheels 3D\n", - "5 Guts and Glory \n", - "6 Let's Play \n", - "7 Download \n", - "8 Alpha \n", - "9 Gameplay \n", - "10 Montage \n", - "Name: 274, dtype: object\n", - "ssssssssssssssssssssssssssssssssss275ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 kick the pj\n", - "4 pj \n", - "5 google \n", - "6 google feud\n", - "Name: 275, dtype: object\n", - "ssssssssssssssssssssssssssssssssss276ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 lootcrate \n", - "4 5 weird stuff online\n", - "5 vlog \n", - "6 unboxing \n", - "Name: 276, dtype: object\n", - "ssssssssssssssssssssssssssssssssss277ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 Marzia \n", - "4 Gangbeasts \n", - "5 Multiplayer 
\n", - "6 gang beasts \n", - "7 gan \n", - "8 gang \n", - "9 beasts \n", - "10 funny multiplayer\n", - "11 funny \n", - "12 multiplayer \n", - "13 2 player \n", - "14 coop \n", - "Name: 277, dtype: object\n", - "ssssssssssssssssssssssssssssssssss278ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 Welcome to the game \n", - "4 Steam \n", - "5 deep web \n", - "6 illegal \n", - "7 hackers \n", - "8 hacking \n", - "9 hack \n", - "10 Welcome to the game red room \n", - "11 welcome to the game all codes\n", - "12 Hacking Game \n", - "Name: 278, dtype: object\n", - "ssssssssssssssssssssssssssssssssss279ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "Name: 279, dtype: object\n", - "ssssssssssssssssssssssssssssssssss280ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 react \n", - "4 subscriber \n", - "5 special \n", - "6 montage \n", - "7 old pewdiepie\n", - "8 new pewdiepie\n", - "9 vlog \n", - "10 fridays \n", - "Name: 280, dtype: object\n", - "ssssssssssssssssssssssssssssssssss281ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 vlog \n", - "4 kicked out\n", - "5 moving \n", - "6 house \n", - "7 landlord \n", - "Name: 281, dtype: object\n", - "ssssssssssssssssssssssssssssssssss282ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 diamond \n", - "4 play button\n", - "5 playbutton \n", - "6 youtube \n", - "7 unboxing \n", - "Name: 282, dtype: object\n", - "ssssssssssssssssssssssssssssssssss283ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 283, dtype: object\n", - "ssssssssssssssssssssssssssssssssss284ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 uncharted 4 \n", - "4 uncharted \n", - "5 gameplay \n", - "6 uncharted 4 gameplay \n", - "7 uncharted 4 walkthrough part 1\n", - "8 through \n", - "9 
play \n", - "10 walk \n", - "11 let's play \n", - "12 uncharted 4 trailer \n", - "13 gameplay walkthrough \n", - "14 a theif's end \n", - "15 review \n", - "16 multiplayer \n", - "Name: 284, dtype: object\n", - "ssssssssssssssssssssssssssssssssss285ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 sophie's curse\n", - "4 sohpie \n", - "5 curse \n", - "6 steam \n", - "7 horror \n", - "8 jumpscare \n", - "9 let's play \n", - "Name: 285, dtype: object\n", - "ssssssssssssssssssssssssssssssssss286ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 286, dtype: object\n", - "ssssssssssssssssssssssssssssssssss287ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 video games \n", - "4 dark souls 3 \n", - "5 dark souls 3 gameplay\n", - "6 gameplay \n", - "7 lets play \n", - "8 lets \n", - "9 play \n", - "10 commentary \n", - "11 dark souls \n", - "12 part 2 \n", - "13 game \n", - "14 walk \n", - "15 through \n", - "16 walkthrough \n", - "17 playthrough \n", - "Name: 287, dtype: object\n", - "ssssssssssssssssssssssssssssssssss288ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 288, dtype: object\n", - "ssssssssssssssssssssssssssssssssss289ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 video games\n", - "4 60 seconds \n", - "5 60 \n", - "6 seconds \n", - "7 steam \n", - "8 lets play \n", - "Name: 289, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss290ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 290, dtype: object\n", - "ssssssssssssssssssssssssssssssssss291ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 291, dtype: object\n", - "ssssssssssssssssssssssssssssssssss292ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 video games \n", - "4 pewdiepie iq\n", - "5 iq \n", - "6 iq test \n", - "7 smart \n", - "8 how smart \n", - "Name: 292, dtype: object\n", - "ssssssssssssssssssssssssssssssssss293ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 293, dtype: object\n", - "ssssssssssssssssssssssssssssssssss294ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 294, dtype: object\n", - "ssssssssssssssssssssssssssssssssss295ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 295, dtype: object\n", - "ssssssssssssssssssssssssssssssssss296ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk 
through\n", - "7 video games \n", - "8 lets play \n", - "Name: 296, dtype: object\n", - "ssssssssssssssssssssssssssssssssss297ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk through\n", - "7 video games \n", - "8 lets play \n", - "9 world chef \n", - "Name: 297, dtype: object\n", - "ssssssssssssssssssssssssssssssssss298ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 298, dtype: object\n", - "ssssssssssssssssssssssssssssssssss299ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk through \n", - "7 video games \n", - "8 lets play \n", - "9 mgsv \n", - "10 metal gear solid \n", - "11 the phantom pain \n", - "12 metal gear solid 5\n", - "13 intense \n", - "14 youtube gaming \n", - "15 gaming \n", - "16 gameplay \n", - "Name: 299, dtype: object\n", - "ssssssssssssssssssssssssssssssssss300ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 300, dtype: object\n", - "ssssssssssssssssssssssssssssssssss301ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk through\n", - "7 video games \n", - "8 lets play \n", - "Name: 301, dtype: object\n", - "ssssssssssssssssssssssssssssssssss302ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk through \n", - "7 video games \n", - "8 lets play \n", - "9 spookys \n", - "10 spooky's \n", - "11 house \n", - "12 of jumpscares\n", - "13 jumpscare \n", - "14 jumpscares \n", - "15 jumpscared \n", - "16 horror \n", - "17 scary \n", - "18 funny \n", - "19 reaction \n", - "Name: 302, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss303ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 303, dtype: object\n", - "ssssssssssssssssssssssssssssssssss304ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 304, dtype: object\n", - "ssssssssssssssssssssssssssssssssss305ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 305, dtype: object\n", - "ssssssssssssssssssssssssssssssssss306ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through\n", - "10 video games \n", - "11 vlog vlog \n", - "12 vlog \n", - "13 vlogs \n", - "Name: 306, dtype: object\n", - "ssssssssssssssssssssssssssssssssss307ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 307, dtype: object\n", - "ssssssssssssssssssssssssssssssssss308ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through \n", - "10 video games \n", - "11 The Walking Dead - Season 2 (TV Season)\n", - "12 telltale game \n", - "13 telltale games \n", - "14 walking dead \n", - "15 story \n", - "16 zombie \n", - "Name: 308, dtype: object\n", - "ssssssssssssssssssssssssssssssssss309ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through\n", - "10 video games \n", - "Name: 309, dtype: object\n", - "ssssssssssssssssssssssssssssssssss310ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 310, dtype: object\n", - "ssssssssssssssssssssssssssssssssss311ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 
walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through \n", - "10 video games \n", - "11 the imossible \n", - "12 quiz \n", - "13 question \n", - "14 questions \n", - "15 funny \n", - "16 reaction \n", - "17 the impossible quiz\n", - "18 all answers \n", - "19 answers \n", - "20 cheat \n", - "Name: 311, dtype: object\n", - "ssssssssssssssssssssssssssssssssss312ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through \n", - "10 video games \n", - "11 the wolf among us trailer\n", - "12 telltale \n", - "13 wolf among us \n", - "14 Gameplay \n", - "15 Ps3 \n", - "16 review \n", - "17 telltale games \n", - "18 part 1 \n", - "19 Xbox \n", - "20 the wolf among us \n", - "21 among \n", - "22 snowwhite \n", - "23 snow white \n", - "24 fairytale \n", - "Name: 312, dtype: object\n", - "ssssssssssssssssssssssssssssssssss313ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through \n", - "10 video games \n", - "11 linger \n", - "12 oculus \n", - "13 rift \n", - "14 reaction \n", - "15 oculus rift \n", - "16 vr \n", - "17 virtual reality\n", - "Name: 313, dtype: object\n", - "ssssssssssssssssssssssssssssssssss314ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through\n", - "10 video games \n", - "Name: 314, dtype: object\n", - "ssssssssssssssssssssssssssssssssss315ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through 
\n", - "9 walk through\n", - "10 video games \n", - "Name: 315, dtype: object\n", - "ssssssssssssssssssssssssssssssssss316ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 316, dtype: object\n", - "ssssssssssssssssssssssssssssssssss317ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 317, dtype: object\n", - "ssssssssssssssssssssssssssssssssss318ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 lets \n", - "3 play \n", - "4 let´s play \n", - "5 horror \n", - "6 game \n", - "7 walkthrough\n", - "8 playthrough\n", - "9 letsplay \n", - "10 mod \n", - "11 gameplay \n", - "12 trailer \n", - "13 commentary \n", - "14 funny \n", - "Name: 318, dtype: object\n", - "ssssssssssssssssssssssssssssssssss319ssssssssssssssssssssssssssssssssss\n", - "0 Sequence\n", - "1 01 \n", - "2 19 \n", - "Name: 319, dtype: object\n", - "ssssssssssssssssssssssssssssssssss320ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 lets \n", - "3 play \n", - "4 let´s play \n", - "5 horror \n", - "6 game \n", - "7 walkthrough\n", - "8 playthrough\n", - "9 letsplay \n", - "10 mod \n", - "11 gameplay \n", - "12 trailer \n", - "13 commentary \n", - "14 funny \n", - "Name: 320, dtype: object\n", - "ssssssssssssssssssssssssssssssssss321ssssssssssssssssssssssssssssssssss\n", - "0 condemned \n", - "1 part \n", - "2 condmned \n", - "3 parrt \n", - "4 condomned \n", - "5 pewdiepie \n", - "6 lets \n", - "7 play \n", - "8 let's play\n", - "9 video \n", - "10 games \n", - "11 horror \n", - "12 xbox \n", - "13 ps3 \n", - "14 hd \n", - "15 pewdie \n", - "16 scary \n", - "17 game \n", - "18 scary game\n", - "19 gameplay \n", - "20 ending \n", - "21 secret \n", - "22 jumpscare \n", - "23 pop \n", - "24 pewds \n", - "Name: 321, dtype: object\n", - "ssssssssssssssssssssssssssssssssss322ssssssssssssssssssssssssssssssssss\n", - "0 Amnesiaaa \n", - "1 followed \n", - "2 by \n", - "3 death \n", - "4 ch2 \n", - "5 part \n", - "6 amnesia 
\n", - "7 the \n", - "8 dark \n", - "9 descent \n", - "10 pewdiepie\n", - "11 pewdie \n", - "12 custom \n", - "13 Ghosts \n", - "14 Tape \n", - "15 Pewdiepie\n", - "16 screaming\n", - "17 scream \n", - "18 girly \n", - "19 girl \n", - "20 horror \n", - "21 Scared \n", - "22 Creepy \n", - "23 Funny \n", - "24 chapter \n", - "Name: 322, dtype: object\n", - "ssssssssssssssssssssssssssssssssss323ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 323, dtype: object\n", - "ssssssssssssssssssssssssssssssssss324ssssssssssssssssssssssssssssssssss\n", - "0 callllllllll \n", - "1 28 \n", - "2 calling \n", - "3 27 \n", - "4 part \n", - "5 26 \n", - "6 clallinin \n", - "7 penis \n", - "8 the \n", - "9 lets \n", - "10 play \n", - "11 walkthrough \n", - "12 playthrough \n", - "13 through \n", - "14 wii \n", - "15 gameplay \n", - "16 Suzutani \n", - "17 suzutani \n", - "18 The \n", - "19 Possession \n", - "20 possession \n", - "21 ghosts \n", - "22 yt:quality=high\n", - "23 pewdiepie \n", - "24 funny \n", - "25 scary \n", - "26 wierd \n", - "Name: 324, dtype: object\n", - "ssssssssssssssssssssssssssssssssss325ssssssssssssssssssssssssssssssssss\n", - "0 calling \n", - "1 27 \n", - "2 part \n", - "3 26 \n", - "4 clallinin \n", - "5 penis \n", - "6 the \n", - "7 lets \n", - "8 play \n", - "9 walkthrough \n", - "10 playthrough \n", - "11 through \n", - "12 wii \n", - "13 gameplay \n", - "14 Suzutani \n", - "15 suzutani \n", - "16 The \n", - "17 Possession \n", - "18 possession \n", - "19 ghosts \n", - "20 yt:quality=high\n", - "21 pewdiepie \n", - "22 funny \n", - "23 scary \n", - "24 wierd \n", - "Name: 325, dtype: object\n", - "ssssssssssssssssssssssssssssssssss326ssssssssssssssssssssssssssssssssss\n", - "0 part \n", - "1 26 \n", - "2 clallinin \n", - "3 penis \n", - "4 calling \n", - "5 the \n", - "6 lets \n", - "7 play \n", - "8 walkthrough \n", - "9 playthrough \n", - "10 through \n", - "11 wii \n", - "12 gameplay \n", - "13 Suzutani \n", - "14 suzutani \n", - "15 
The \n", - "16 Possession \n", - "17 possession \n", - "18 ghosts \n", - "19 yt:quality=high\n", - "20 pewdiepie \n", - "21 funny \n", - "22 scary \n", - "23 wierd \n", - "Name: 326, dtype: object\n", - "ssssssssssssssssssssssssssssssssss327ssssssssssssssssssssssssssssssssss\n", - "0 calling \n", - "1 the \n", - "2 lets \n", - "3 play \n", - "4 walkthrough \n", - "5 playthrough \n", - "6 through \n", - "7 wii \n", - "8 gameplay \n", - "9 Suzutani \n", - "10 suzutani \n", - "11 The \n", - "12 Possession \n", - "13 possession \n", - "14 ghosts \n", - "15 yt:quality=high\n", - "16 pewdiepie \n", - "17 funny \n", - "18 scary \n", - "19 wierd \n", - "Name: 327, dtype: object\n", - "ssssssssssssssssssssssssssssssssss328ssssssssssssssssssssssssssssssssss\n", - "0 the \n", - "1 attic \n", - "2 part \n", - "3 The \n", - "4 lets \n", - "5 play \n", - "6 playthrough \n", - "7 pewdiepie \n", - "8 chapter \n", - "9 scary \n", - "10 pewdie \n", - "11 walkthrough \n", - "12 horror \n", - "13 scared \n", - "14 screaming \n", - "15 scream \n", - "16 Funny \n", - "17 Horror Fiction\n", - "18 Maze \n", - "19 Game \n", - "20 Weird \n", - "21 Creepy \n", - "22 Open \n", - "23 Scare \n", - "24 Next \n", - "25 Strange \n", - "26 Prank \n", - "27 Story \n", - "28 Outside \n", - "29 Scary Maze \n", - "30 Rat \n", - "31 Scaring \n", - "Name: 328, dtype: object\n", - "ssssssssssssssssssssssssssssssssss329ssssssssssssssssssssssssssssssssss\n", - "0 Sequence \n", - "1 01 \n", - "2 aom \n", - "3 Afraid \n", - "4 Of \n", - "5 Monsters \n", - "6 director's \n", - "7 cut \n", - "8 ending \n", - "9 all endings\n", - "10 soundtrack \n", - "11 creepy \n", - "12 half \n", - "13 life \n", - "14 mod \n", - "15 sweden \n", - "16 pewdiepie \n", - "17 pewdie \n", - "18 scary \n", - "19 Scream \n", - "20 Game \n", - "21 Scared \n", - "22 Maze \n", - "23 Weird \n", - "24 Screaming \n", - "25 Strange \n", - "26 Funny \n", - "27 Prank \n", - "28 Scary Maze \n", - "29 Scaring \n", - "Name: 329, dtype: 
object\n", - "ssssssssssssssssssssssssssssssssss330ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 330, dtype: object\n", - "ssssssssssssssssssssssssssssssssss331ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 331, dtype: object\n", - "ssssssssssssssssssssssssssssssssss332ssssssssssssssssssssssssssssssssss\n", - "0 octodad \n", - "1 Octodad \n", - "2 Official \n", - "3 Trailer \n", - "4 octodad ending \n", - "5 octodad trailer \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 lets \n", - "9 play \n", - "10 let's \n", - "11 pewdiepie \n", - "12 funny \n", - "13 wierd \n", - "14 indie \n", - "15 Trailer (promotion)\n", - "16 Game \n", - "17 Weird \n", - "18 Gameplay \n", - "19 Playthrough Part \n", - "20 Humour \n", - "21 Play (theatre) \n", - "22 Crazy \n", - "23 Random \n", - "24 Silly \n", - "25 Mission \n", - "Name: 332, dtype: object\n", - "ssssssssssssssssssssssssssssssssss333ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 333, dtype: object\n", - "ssssssssssssssssssssssssssssssssss334ssssssssssssssssssssssssssssssssss\n", - "0 octodad \n", - "1 Octodad \n", - "2 Official \n", - "3 Trailer \n", - "4 octodad ending \n", - "5 octodad trailer \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 lets \n", - "9 play \n", - "10 let's \n", - "11 pewdiepie \n", - "12 funny \n", - "13 wierd \n", - "14 indie \n", - "15 Trailer (promotion)\n", - "16 Game \n", - "17 Weird \n", - "18 Gameplay \n", - "19 Playthrough Part \n", - "Name: 334, dtype: object\n", - "ssssssssssssssssssssssssssssssssss335ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 naked \n", - "6 scared \n", - "7 playthrough \n", - "8 amnesia \n", - "9 the \n", - "10 dark \n", - "11 descent \n", - "12 custom \n", - "13 story \n", - "14 mod \n", - "15 100% \n", - "16 scary \n", - "17 Scary and Funny Moments\n", - "18 scariest \n", - "19 moment \n", - "20 funny \n", - "21 Black 
\n", - "22 Plauge \n", - "23 Requiem \n", - "24 Frictional \n", - "25 how \n", - "26 to \n", - "27 Top \n", - "28 Scary \n", - "29 Moments \n", - "30 /W \n", - "31 PewDiePie \n", - "32 countdown \n", - "33 library of alexandria \n", - "34 part \n", - "35 Stephanos House \n", - "36 Stephano \n", - "37 piggeh \n", - "38 bro \n", - "39 Funny \n", - "40 Best \n", - "41 Let's \n", - "42 Game \n", - "43 Weird \n", - "44 Part 2 \n", - "Name: 335, dtype: object\n", - "ssssssssssssssssssssssssssssssssss336ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 336, dtype: object\n", - "ssssssssssssssssssssssssssssssssss337ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 337, dtype: object\n", - "ssssssssssssssssssssssssssssssssss338ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough\n", - "5 naked \n", - "6 scared \n", - "7 playthrough\n", - "8 amnesia \n", - "9 the \n", - "10 dark \n", - "11 descent \n", - "12 custom \n", - "13 story \n", - "14 mod \n", - "15 100% \n", - "16 scary \n", - "17 cannibalism\n", - "18 funny \n", - "19 moments \n", - "20 moment \n", - "21 top \n", - "22 pewdie \n", - "23 monster \n", - "24 trailer \n", - "25 100 \n", - "26 part 2 \n", - "27 episode 2 \n", - "28 Amnesia \n", - "29 nightmare \n", - "Name: 338, dtype: object\n", - "ssssssssssssssssssssssssssssssssss339ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 magicka \n", - "6 playthrough \n", - "7 through \n", - "8 xebaz \n", - "9 tsubasahara \n", - "10 part 2 \n", - "11 magic wizards and shit\n", - "12 game \n", - "13 gameplay \n", - "14 playthrough part \n", - "15 mission \n", - "16 kevin \n", - "17 video game \n", - "18 Orlando Magic \n", - "19 Magic Johnson \n", - "20 playstation \n", - "21 trick \n", - "22 ps2 \n", - "23 xbox \n", - "24 card \n", - "25 tricks \n", - "26 ARMA 2 \n", - "27 john \n", - "28 revealed \n", 
- "29 david \n", - "30 criss \n", - "31 PlayStation 3 \n", - "32 Xbox \n", - "Name: 339, dtype: object\n", - "ssssssssssssssssssssssssssssssssss340ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough\n", - "5 naked \n", - "6 scared \n", - "7 playthrough\n", - "8 amnesia \n", - "9 the \n", - "10 dark \n", - "11 descent \n", - "12 custom \n", - "13 story \n", - "14 mod \n", - "15 100% \n", - "16 scary \n", - "17 cannibalism\n", - "18 funny \n", - "19 moments \n", - "20 moment \n", - "21 top \n", - "22 pewdie \n", - "23 monster \n", - "24 trailer \n", - "25 100 \n", - "26 part 5 \n", - "27 episode 5 \n", - "28 Through \n", - "29 portal \n", - "30 secret room\n", - "31 trollface \n", - "32 problem \n", - "Name: 340, dtype: object\n", - "ssssssssssssssssssssssssssssssssss341ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough\n", - "5 naked \n", - "6 scared \n", - "7 playthrough\n", - "8 amnesia \n", - "9 the \n", - "10 dark \n", - "11 descent \n", - "12 custom \n", - "13 story \n", - "14 mod \n", - "15 100% \n", - "16 scary \n", - "17 cannibalism\n", - "18 funny \n", - "19 moments \n", - "20 moment \n", - "21 top \n", - "22 pewdie \n", - "23 monster \n", - "24 trailer \n", - "25 100 \n", - "26 part 3 \n", - "27 episode 3 \n", - "28 Through \n", - "29 portal \n", - "30 game \n", - "31 level \n", - "32 let's \n", - "33 let's play \n", - "34 gameplay \n", - "35 techno \n", - "36 kevin \n", - "37 games \n", - "Name: 341, dtype: object\n", - "ssssssssssssssssssssssssssssssssss342ssssssssssssssssssssssssssssssssss\n", - "0 dead \n", - "1 island \n", - "2 Dead island gameplay \n", - "3 co-op \n", - "4 coop \n", - "5 lets \n", - "6 play \n", - "7 let \n", - "8 playthrough \n", - "9 walkthrough \n", - "10 dead island lets play \n", - "11 dead island playthrough\n", - "12 ending \n", - "13 zombie \n", - "14 zombies \n", - "15 survival \n", - "16 
horror \n", - "17 pegi \n", - "18 uk \n", - "19 violence \n", - "20 violent \n", - "21 open \n", - "22 world \n", - "23 sandbox \n", - "24 Zombie \n", - "25 Horror \n", - "26 Banoi \n", - "27 Undead \n", - "28 PC \n", - "29 Xbox \n", - "30 360 \n", - "31 Playstation \n", - "32 PS3 \n", - "33 Deep \n", - "34 Silver \n", - "35 Techland \n", - "36 2011 \n", - "37 yt:quality=high \n", - "38 HD \n", - "39 720 \n", - "40 1080 \n", - "41 pewdiepie \n", - "42 morfar \n", - "43 cam \n", - "44 camera \n", - "45 pre \n", - "46 order \n", - "47 weapon \n", - "Name: 342, dtype: object\n", - "ssssssssssssssssssssssssssssssssss343ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 naked \n", - "6 scared \n", - "7 playthrough \n", - "8 Fatal \n", - "9 Frame \n", - "10 Lets \n", - "11 blind \n", - "12 fatal \n", - "13 frame \n", - "14 II \n", - "15 pewdie \n", - "16 ending \n", - "17 part 1 \n", - "18 Fatal Frame Playthrough part 1\n", - "19 episode \n", - "20 let's \n", - "21 let's play \n", - "22 crimson \n", - "23 butterfly \n", - "24 scary \n", - "25 game \n", - "26 vampire \n", - "27 funny \n", - "28 gameplay \n", - "29 zero \n", - "30 playthrough part \n", - "31 mission \n", - "32 scream \n", - "33 anime \n", - "34 video \n", - "Name: 343, dtype: object\n", - "ssssssssssssssssssssssssssssssssss344ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 naked \n", - "6 scared \n", - "7 playthrough \n", - "8 Fatal \n", - "9 Frame \n", - "10 Lets \n", - "11 blind \n", - "12 fatal \n", - "13 frame \n", - "14 II \n", - "15 pewdie \n", - "16 ending \n", - "17 part 1 \n", - "18 Fatal Frame Playthrough part 1\n", - "19 episode \n", - "20 let's \n", - "21 let's play \n", - "22 crimson \n", - "23 butterfly \n", - "24 scary \n", - "25 game \n", - "26 vampire \n", - "27 funny \n", - "28 gameplay \n", - "29 zero \n", - "30 playthrough part 
\n", - "31 mission \n", - "32 scream \n", - "33 anime \n", - "34 video game \n", - "35 can \n", - "36 basket \n", - "37 kevin \n", - "38 playstation \n", - "39 ps2 \n", - "Name: 344, dtype: object\n", - "ssssssssssssssssssssssssssssssssss345ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 naked \n", - "6 scared \n", - "7 playthrough \n", - "8 Fatal \n", - "9 Frame \n", - "10 Lets \n", - "11 blind \n", - "12 fatal \n", - "13 frame \n", - "14 II \n", - "15 pewdie \n", - "16 ending \n", - "17 part 1 \n", - "18 Fatal Frame Playthrough part 1\n", - "19 episode \n", - "20 let's \n", - "21 let's play \n", - "22 crimson \n", - "23 butterfly \n", - "24 scary \n", - "25 game \n", - "26 vampire \n", - "27 funny \n", - "28 gameplay \n", - "29 zero \n", - "30 playthrough part \n", - "31 mission \n", - "32 scream \n", - "33 anime \n", - "34 video game \n", - "35 playstation \n", - "36 ps2 \n", - "37 basket \n", - "38 xbox \n", - "39 ps3 \n", - "40 maze \n", - "41 games \n", - "42 weird \n", - "43 creepy \n", - "44 screaming \n", - "Name: 345, dtype: object\n", - "ssssssssssssssssssssssssssssssssss346ssssssssssssssssssssssssssssssssss\n", - "0 Tags:\n", - "Name: 346, dtype: object\n", - "ssssssssssssssssssssssssssssssssss347ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 347, dtype: object\n", - "ssssssssssssssssssssssssssssssssss348ssssssssssssssssssssssssssssssssss\n", - "0 pewdie \n", - "1 xebaz \n", - "2 playing\n", - "3 fear \n", - "Name: 348, dtype: object\n", - "ssssssssssssssssssssssssssssssssss349ssssssssssssssssssssssssssssssssss\n", - "0 pewdie \n", - "1 Xebaz \n", - "2 are \n", - "3 playing\n", - "4 fear \n", - "5 again \n", - "6 and \n", - "7 still \n", - "8 failing\n", - "9 know \n", - "10 you \n", - "11 like \n", - "12 it \n", - "Name: 349, dtype: object\n", - "ssssssssssssssssssssssssssssssssss350ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 
play \n", - "3 Amnesia \n", - "4 custom \n", - "5 story \n", - "6 la \n", - "7 caza \n", - "8 playthrough\n", - "9 walkthrough\n", - "10 walk \n", - "11 through \n", - "12 scary \n", - "13 fun \n", - "14 james \n", - "15 scream \n", - "16 moment \n", - "17 game \n", - "18 scared \n", - "19 horror \n", - "20 movie \n", - "21 gameplay \n", - "22 part 5 \n", - "23 episode 5 \n", - "Name: 350, dtype: object\n", - "ssssssssssssssssssssssssssssssssss351ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 Amnesia \n", - "4 custom \n", - "5 story \n", - "6 la \n", - "7 caza \n", - "8 playthrough\n", - "9 walkthrough\n", - "10 walk \n", - "11 through \n", - "12 scary \n", - "13 fun \n", - "14 james \n", - "15 scream \n", - "16 moment \n", - "17 game \n", - "18 scared \n", - "19 horror \n", - "20 movie \n", - "21 gameplay \n", - "22 part 4 \n", - "23 episode 4 \n", - "Name: 351, dtype: object\n", - "ssssssssssssssssssssssssssssssssss352ssssssssssssssssssssssssssssssssss\n", - "0 Sequence\n", - "1 01 \n", - "2 1 \n", - "Name: 352, dtype: object\n", - "ssssssssssssssssssssssssssssssssss353ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 Amnesia \n", - "4 custom \n", - "5 story \n", - "6 la \n", - "7 caza \n", - "8 playthrough\n", - "9 walkthrough\n", - "10 walk \n", - "11 through \n", - "12 scary \n", - "13 fun \n", - "14 james \n", - "15 scream \n", - "16 moment \n", - "17 game \n", - "18 scared \n", - "19 horror \n", - "20 movie \n", - "21 gameplay \n", - "22 part 2 \n", - "23 episode 2 \n", - "Name: 353, dtype: object\n", - "ssssssssssssssssssssssssssssssssss354ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 amnesia \n", - "4 DLC \n", - "5 Justine \n", - "6 Amnesia \n", - "7 justine \n", - "8 walkthrough\n", - "9 walk \n", - "10 through \n", - "11 pewdiepie \n", - "12 naked \n", - "13 scared \n", - "14 playthrough\n", - "15 the \n", - "16 dark \n", - "17 descent 
\n", - "18 dlc \n", - "19 100% \n", - "20 scary \n", - "21 funny \n", - "22 moments \n", - "23 moment \n", - "24 top \n", - "25 pewdie \n", - "26 ending \n", - "27 explained \n", - "28 monster \n", - "29 trailer \n", - "30 100 \n", - "31 part 5 \n", - "32 episode 5 \n", - "33 final \n", - "34 last \n", - "35 episode \n", - "36 part \n", - "Name: 354, dtype: object\n", - "ssssssssssssssssssssssssssssssssss355ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 amnesia \n", - "4 DLC \n", - "5 Justine \n", - "6 Amnesia \n", - "7 justine \n", - "8 walkthrough\n", - "9 walk \n", - "10 through \n", - "11 pewdiepie \n", - "12 naked \n", - "13 scared \n", - "14 playthrough\n", - "15 the \n", - "16 dark \n", - "17 descent \n", - "18 dlc \n", - "19 100% \n", - "20 scary \n", - "21 funny \n", - "22 moments \n", - "23 moment \n", - "24 top \n", - "25 pewdie \n", - "26 ending \n", - "27 explained \n", - "28 monster \n", - "29 trailer \n", - "30 100 \n", - "31 part 3 \n", - "32 episode 3 \n", - "Name: 355, dtype: object\n", - "ssssssssssssssssssssssssssssssssss356ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 356, dtype: object\n", - "ssssssssssssssssssssssssssssssssss357ssssssssssssssssssssssssssssssssss\n", - "0 amnesia \n", - "1 fuck \n", - "2 scariest \n", - "3 moment \n", - "4 scary \n", - "5 horrible \n", - "6 scream \n", - "7 like \n", - "8 girl \n", - "9 reaction \n", - "10 funny \n", - "11 screaming \n", - "12 dark \n", - "13 descent \n", - "14 the \n", - "15 commentary \n", - "16 gothic \n", - "17 horror \n", - "18 moments \n", - "19 playthrough\n", - "20 first \n", - "21 lets \n", - "22 play \n", - "23 guide \n", - "24 prank \n", - "25 walkthrough\n", - "26 part \n", - "27 within \n", - "28 screams \n", - "29 subbed \n", - "30 scared \n", - "31 xdddd \n", - "32 turner \n", - "33 screamed \n", - "34 till \n", - "35 straight \n", - "36 tears \n", - "37 spoiler \n", - "38 yep \n", - "39 suiting \n", - "40 laughed \n", - 
"41 shriek \n", - "42 wheres \n", - "43 lmfao \n", - "44 yelping \n", - "45 upload \n", - "46 toby \n", - "Name: 357, dtype: object\n", - "ssssssssssssssssssssssssssssssssss358ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 358, dtype: object\n" + "Series([], Name: 142, dtype: object)\n" ] } ], @@ -4796,7 +872,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 11, "metadata": {}, "outputs": [ { @@ -4834,106 +910,106 @@ " \n", " \n", " 0\n", - " YOU HAD ONE JOB! - with editor Brad1\n", - " 5,292,299.0\n", - " 385,260.0\n", - " 4,080.0\n", + " PyCharm/IntelliJ fast and auto change of the color theme\n", + " 41.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,859.0\n", - " https://www.youtube.com/watch?v=B67OBHNCopk\n", - " [SATIRE, reddit, you had one job, onejob]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 2.0\n", + " https://www.youtube.com/embed/SsX9Fl958W0\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", " 1\n", - " Demi Lovato DID a WHAT?! 
- YouTube Admits MASSIVE OPSIE\n", - " 5,358,149.0\n", - " 378,460.0\n", - " 3,950.0\n", + " How to add weather desklet to Linux Mint 19\n", + " 291.0\n", " 0.0\n", - " 38,075.0\n", - " https://www.youtube.com/watch?v=kLM_9gBZIqY\n", - " [SATIRE]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " https://www.youtube.com/embed/-FPY_e0BdJs\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", " 2\n", - " We broke another WORLD RECORD!\n", - " 8,557,324.0\n", - " 595,577.0\n", - " 7,899.0\n", + " How to easy integrate Google Calendar to Desktop for Linux Mint\n", + " 226.0\n", + " 1.0\n", + " 0.0\n", " 0.0\n", - " 53,664.0\n", - " https://www.youtube.com/watch?v=d1tAfXKc7-c\n", - " [SATIRE]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 0.0\n", + " https://www.youtube.com/embed/2evIujisdD0\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", " 3\n", - " FLOSSING in VR with Green Man. 
~ UNSEEN FOOTAGE ~\n", - " 3,609,152.0\n", - " 218,517.0\n", - " 3,125.0\n", + " Pandas use a list of values to select rows from a column\n", + " 45.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 17,595.0\n", - " https://www.youtube.com/watch?v=bMLdNrB5hAo\n", - " [SATIRE]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 10.0\n", + " https://www.youtube.com/embed/jlSbo5wmTPQ\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", " 4\n", - " Don't Laugh Challenge, NEW SEASON!!!!!\n", - " 5,888,349.0\n", - " 569,878.0\n", - " 7,822.0\n", + " Pandas count and percentage by value for a column\n", + " 63.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 29,373.0\n", - " https://www.youtube.com/watch?v=Zgm_iM3f_ME\n", - " [SATIRE]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 0.0\n", + " https://www.youtube.com/embed/P5pxJkv71BU\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", "\n", "" ], "text/plain": [ - " title Views \\\n", - "0 YOU HAD ONE JOB! - with editor Brad1 5,292,299.0 \n", - "1 Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE 5,358,149.0 \n", - "2 We broke another WORLD RECORD! 8,557,324.0 \n", - "3 FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ 3,609,152.0 \n", - "4 Don't Laugh Challenge, NEW SEASON!!!!! 
5,888,349.0 \n", + " title Views \\\n", + "0 PyCharm/IntelliJ fast and auto change of the color theme 41.0 \n", + "1 How to add weather desklet to Linux Mint 19 291.0 \n", + "2 How to easy integrate Google Calendar to Desktop for Linux Mint 226.0 \n", + "3 Pandas use a list of values to select rows from a column 45.0 \n", + "4 Pandas count and percentage by value for a column 63.0 \n", "\n", - " Like Dislike Favorite Comment \\\n", - "0 385,260.0 4,080.0 0.0 29,859.0 \n", - "1 378,460.0 3,950.0 0.0 38,075.0 \n", - "2 595,577.0 7,899.0 0.0 53,664.0 \n", - "3 218,517.0 3,125.0 0.0 17,595.0 \n", - "4 569,878.0 7,822.0 0.0 29,373.0 \n", + " Like Dislike Favorite Comment \\\n", + "0 0.0 0.0 0.0 2.0 \n", + "1 0.0 0.0 0.0 0.0 \n", + "2 1.0 0.0 0.0 0.0 \n", + "3 3.0 0.0 0.0 10.0 \n", + "4 3.0 0.0 0.0 0.0 \n", "\n", - " videoID \\\n", - "0 https://www.youtube.com/watch?v=B67OBHNCopk \n", - "1 https://www.youtube.com/watch?v=kLM_9gBZIqY \n", - "2 https://www.youtube.com/watch?v=d1tAfXKc7-c \n", - "3 https://www.youtube.com/watch?v=bMLdNrB5hAo \n", - "4 https://www.youtube.com/watch?v=Zgm_iM3f_ME \n", + " videoID \\\n", + "0 https://www.youtube.com/embed/SsX9Fl958W0 \n", + "1 https://www.youtube.com/embed/-FPY_e0BdJs \n", + "2 https://www.youtube.com/embed/2evIujisdD0 \n", + "3 https://www.youtube.com/embed/jlSbo5wmTPQ \n", + "4 https://www.youtube.com/embed/P5pxJkv71BU \n", "\n", - " tags \\\n", - "0 [SATIRE, reddit, you had one job, onejob] \n", - "1 [SATIRE] \n", - "2 [SATIRE] \n", - "3 [SATIRE] \n", - "4 [SATIRE] \n", + " tags \\\n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg \n", + "1 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg \n", + "2 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg \n", + "3 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg \n", + "4 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg \n", "\n", " nameurl \n", - "0 \n", - "1 \n", - "2 \n", - "3 \n", - "4 " + "0 \n", + "1 \n", + "2 \n", + "3 \n", + "4 " ] }, - "execution_count": 5, + 
"execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -4956,7 +1032,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 12, "metadata": { "scrolled": false }, @@ -4982,63 +1058,63 @@ " \n", " \n", " 0\n", - " YOU HAD ONE JOB! - with editor Brad1\n", - " 5,292,299.0\n", - " 385,260.0\n", - " 4,080.0\n", + " PyCharm/IntelliJ fast and auto change of the color theme\n", + " 41.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,859.0\n", - " https://www.youtube.com/watch?v=B67OBHNCopk\n", - " [SATIRE, reddit, you had one job, onejob]\n", - " XXXXX\n", + " 2.0\n", + " https://www.youtube.com/embed/SsX9Fl958W0\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", " 1\n", - " Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE\n", - " 5,358,149.0\n", - " 378,460.0\n", - " 3,950.0\n", + " How to add weather desklet to Linux Mint 19\n", + " 291.0\n", " 0.0\n", - " 38,075.0\n", - " https://www.youtube.com/watch?v=kLM_9gBZIqY\n", - " [SATIRE]\n", - " XXXXX\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " https://www.youtube.com/embed/-FPY_e0BdJs\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", " 2\n", - " We broke another WORLD RECORD!\n", - " 8,557,324.0\n", - " 595,577.0\n", - " 7,899.0\n", + " How to easy integrate Google Calendar to Desktop for Linux Mint\n", + " 226.0\n", + " 1.0\n", + " 0.0\n", " 0.0\n", - " 53,664.0\n", - " https://www.youtube.com/watch?v=d1tAfXKc7-c\n", - " [SATIRE]\n", - " XXXXX\n", + " 0.0\n", + " https://www.youtube.com/embed/2evIujisdD0\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", " 3\n", - " FLOSSING in VR with Green Man. 
~ UNSEEN FOOTAGE ~\n", - " 3,609,152.0\n", - " 218,517.0\n", - " 3,125.0\n", + " Pandas use a list of values to select rows from a column\n", + " 45.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 17,595.0\n", - " https://www.youtube.com/watch?v=bMLdNrB5hAo\n", - " [SATIRE]\n", - " XXXXX\n", + " 10.0\n", + " https://www.youtube.com/embed/jlSbo5wmTPQ\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", " 4\n", - " Don't Laugh Challenge, NEW SEASON!!!!!\n", - " 5,888,349.0\n", - " 569,878.0\n", - " 7,822.0\n", + " Pandas count and percentage by value for a column\n", + " 63.0\n", + " 3.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,373.0\n", - " https://www.youtube.com/watch?v=Zgm_iM3f_ME\n", - " [SATIRE]\n", - " XXXXX\n", + " https://www.youtube.com/embed/P5pxJkv71BU\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", "" @@ -5047,7 +1123,7 @@ "" ] }, - "execution_count": 6, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -5061,7 +1137,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 13, "metadata": {}, "outputs": [ { @@ -5084,244 +1160,244 @@ " \n", " \n", " \n", - " 77\n", - " bitch lasagna\n", - " 124,994,006.0\n", - " 6,176,065.0\n", - " 648,864.0\n", + " 91\n", + " No Python Interpreter Configured For The Module - PyCharm/IntelliJ\n", + " 11,367.0\n", + " 27.0\n", + " 20.0\n", " 0.0\n", - " 924,648.0\n", - " https://www.youtube.com/watch?v=6Dh-RL__uN4\n", - " [SATIRE, tseries, t series, diss, track, pewdiepie, song, rap, mixtape, disstrack, diss track, bitch lasagna]\n", - " XXXXX\n", + " 8.0\n", + " https://www.youtube.com/embed/mkKDI6y2kyE\n", + " https://i.ytimg.com/vi/mkKDI6y2kyE/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 263\n", - " THE RUBY PLAYBUTTON / YouTube 50 Mil Sub Reward Unbox\n", - " 61,378,839.0\n", - " 4,311,930.0\n", - " 145,857.0\n", + " 124\n", + " python extract text from image or pdf\n", + " 6,229.0\n", + " 
16.0\n", + " 29.0\n", " 0.0\n", - " 609,535.0\n", - " https://www.youtube.com/watch?v=7Vj5M0qKh8g\n", - " [pewdiepie, pewds, pdp, pewdie]\n", - " XXXXX\n", + " 11.0\n", + " https://www.youtube.com/embed/PK-GvWWQ03g\n", + " https://i.ytimg.com/vi/PK-GvWWQ03g/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 33\n", - " YouTube Rewind 2018 but it's actually good\n", - " 47,979,866.0\n", - " 7,776,590.0\n", - " 79,097.0\n", + " 23\n", + " apex legends game requires directx 11 feature video card\n", + " 5,690.0\n", + " 36.0\n", + " 10.0\n", " 0.0\n", - " 705,084.0\n", - " https://www.youtube.com/watch?v=By_Cn5ixYLg\n", - " [rewind 2018, youtube rewind 2018]\n", - " XXXXX\n", + " 9.0\n", + " https://www.youtube.com/embed/NbvHU_KoD74\n", + " https://i.ytimg.com/vi/NbvHU_KoD74/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 309\n", - " GAME BANNED FROM KIDS? - Talking Angela\n", - " 37,174,431.0\n", - " 575,115.0\n", - " 16,369.0\n", + " 46\n", + " Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2\n", + " 5,397.0\n", + " 62.0\n", + " 2.0\n", " 0.0\n", - " 64,433.0\n", - " https://www.youtube.com/watch?v=pzYxlKSgxh0\n", - " [pewdiepie, pewdie, pewds, let's play, playthrough, walkthrough, play, walk, through, walk through, video games]\n", - " XXXXX\n", + " 26.0\n", + " https://www.youtube.com/embed/702lkQbZx50\n", + " https://i.ytimg.com/vi/702lkQbZx50/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 229\n", - " JAKE PAUL\n", - " 36,792,100.0\n", - " 1,832,490.0\n", - " 144,973.0\n", + " 134\n", + " ubuntu 16 04 server install headless google chrome\n", + " 4,468.0\n", + " 24.0\n", + " 6.0\n", " 0.0\n", - " 269,260.0\n", - " https://www.youtube.com/watch?v=TuIcBPm90aM\n", - " [pewdiepie, jake, paul, it's, everyday, bro, react, react world, fine]\n", - " XXXXX\n", + " 5.0\n", + " https://www.youtube.com/embed/t3ppxtEU6No\n", + " https://i.ytimg.com/vi/t3ppxtEU6No/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 267\n", - " DELETING MY CHANNEL\n", - " 
35,035,463.0\n", - " 1,728,372.0\n", - " 261,139.0\n", + " 125\n", + " mysql 5 7 vs mysql 8 do you need to upgrade to mysql 8\n", + " 4,391.0\n", + " 12.0\n", + " 18.0\n", " 0.0\n", - " 220,740.0\n", - " https://www.youtube.com/watch?v=Y39LE5ZoKjw\n", - " [pewdiepie, pewds, pdp, pewdie, deleting, channel]\n", - " XXXXX\n", + " 9.0\n", + " https://www.youtube.com/embed/vHab6BNrHU8\n", + " https://i.ytimg.com/vi/vHab6BNrHU8/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 257\n", - " SHOOTING MY 50 MILLION AWARD!\n", - " 30,554,862.0\n", - " 1,110,375.0\n", - " 131,648.0\n", + " 116\n", + " Python read validate and import CSV JSON file to MySQL\n", + " 3,513.0\n", + " 12.0\n", + " 1.0\n", " 0.0\n", - " 106,113.0\n", - " https://www.youtube.com/watch?v=Jrvfoybj98Q\n", - " [pewdiepie, pewds, pdp, pewdie]\n", - " XXXXX\n", + " 6.0\n", + " https://www.youtube.com/embed/WbW0rHCX2UU\n", + " https://i.ytimg.com/vi/WbW0rHCX2UU/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 282\n", - " THE DIAMOND PLAY BUTTON!! 
(Part 1)\n", - " 29,833,868.0\n", - " 1,254,868.0\n", - " 43,421.0\n", + " 68\n", + " How to add annotations in new Youtube studio\n", + " 3,495.0\n", + " 21.0\n", + " 24.0\n", " 0.0\n", - " 120,324.0\n", - " https://www.youtube.com/watch?v=VY4wCi1pPkU\n", - " [pewdiepie, pewds, pdp, diamond, play button, playbutton, youtube, unboxing]\n", - " XXXXX\n", + " 6.0\n", + " https://www.youtube.com/embed/UcvCdFfI3bs\n", + " https://i.ytimg.com/vi/UcvCdFfI3bs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 45\n", - " YouTube Rewind 2018 review\n", - " 27,723,233.0\n", - " 2,213,948.0\n", - " 95,125.0\n", + " 32\n", + " Apex Legends MSVCP140.dll Is Missing Fix, MSVCP120.dll Is Missing, not starting\n", + " 2,358.0\n", + " 14.0\n", + " 2.0\n", " 0.0\n", - " 138,585.0\n", - " https://www.youtube.com/watch?v=wYT1Qq6mo4I\n", - " [SATIRE, youtube, rewind, meme, yea, review]\n", - " XXXXX\n", + " 11.0\n", + " https://www.youtube.com/embed/ftGiBv3LL_A\n", + " https://i.ytimg.com/vi/ftGiBv3LL_A/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 163\n", - " PewDiePie Hej Monika Remix by Party In Backyard\n", - " 26,513,160.0\n", - " 951,974.0\n", - " 20,537.0\n", + " 13\n", + " Install latest NVIDIA drivers for Linux Mint 19/Ubuntu 18.04\n", + " 1,728.0\n", + " 13.0\n", + " 0.0\n", " 0.0\n", - " 140,487.0\n", - " https://www.youtube.com/watch?v=Vk8UEWHYfEg\n", - " [party in backyard, hej monika, monika, monica, song, pewdiepie, sing, singing]\n", - " XXXXX\n", + " 6.0\n", + " https://www.youtube.com/embed/CA6lyOmfRbM\n", + " https://i.ytimg.com/vi/CA6lyOmfRbM/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 311\n", - " The Impossible Quiz.\n", - " 26,013,637.0\n", - " 519,621.0\n", - " 7,816.0\n", + " 80\n", + " Simple ways to create shortcut in Linux Mint 19\n", + " 1,652.0\n", + " 9.0\n", + " 2.0\n", " 0.0\n", - " 39,587.0\n", - " https://www.youtube.com/watch?v=rOZ0OHaPmnk\n", - " [pewdiepie, pewdie, pewds, let's play, playthrough, walkthrough, play, walk, through, walk 
through, video games, the imossible, quiz, question, questions, funny, reaction, the impossible quiz, all answers, answers, cheat]\n", - " XXXXX\n", + " 6.0\n", + " https://www.youtube.com/embed/nOlH-P8-5PI\n", + " https://i.ytimg.com/vi/nOlH-P8-5PI/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 219\n", - " THE MOST ANNOYING SOUND IN THE WORLD!\n", - " 25,912,961.0\n", - " 867,935.0\n", - " 22,510.0\n", + " 52\n", + " linux mint disable login keyring\n", + " 1,592.0\n", + " 8.0\n", + " 1.0\n", " 0.0\n", - " 76,637.0\n", - " https://www.youtube.com/watch?v=baylWdHClNE\n", - " [5 weird, stuff, 5 weird stuff online]\n", - " XXXXX\n", + " 11.0\n", + " https://www.youtube.com/embed/dAKyi8aFq3Y\n", + " https://i.ytimg.com/vi/dAKyi8aFq3Y/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 264\n", - " WHO'S MORE LIKELY TO...?\n", - " 24,147,214.0\n", - " 838,813.0\n", - " 11,568.0\n", + " 81\n", + " The simplest way to run python headless test with Chrome on Ubuntu\n", + " 1,077.0\n", + " 8.0\n", + " 0.0\n", " 0.0\n", - " 79,185.0\n", - " https://www.youtube.com/watch?v=jA0xR2Ho9UU\n", - " [pewdiepie, pewds, pdp, pewdie, who is, more likely, markiplier, jacksepticeye, who is more likely, most likely]\n", - " XXXXX\n", + " 2.0\n", + " https://www.youtube.com/embed/BdppFIT_lIs\n", + " https://i.ytimg.com/vi/BdppFIT_lIs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 265\n", - " BOTTLEFLIP CHALLENGE!\n", - " 23,462,006.0\n", - " 879,230.0\n", - " 14,349.0\n", + " 122\n", + " java benchmarks examples\n", + " 922.0\n", + " 5.0\n", + " 3.0\n", " 0.0\n", - " 75,539.0\n", - " https://www.youtube.com/watch?v=lyl6ibqnyis\n", - " [pewdiepie, pewds, pdp, pewdie, Bottleflip, Challenge, Bottle, Dab, Meme, Jacksepticeye]\n", - " XXXXX\n", + " 0.0\n", + " https://www.youtube.com/embed/m3Xf1ra2Ekg\n", + " https://i.ytimg.com/vi/m3Xf1ra2Ekg/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 225\n", - " THE RICH LIFE OF PEWDIEPIE\n", - " 20,579,289.0\n", - " 728,175.0\n", - " 22,673.0\n", 
+ " 76\n", + " Easy way to convert dictionary to SQL insert with Python\n", + " 864.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 42,467.0\n", - " https://www.youtube.com/watch?v=GP9egt__qeI\n", - " [pewdiepie, the rich life of pewdiepie, before he was famous, before he was famous pewdiepie, pewdiepie rich, pewdiepie net worth, how much money does pewdiepie make, how much money, youtube money, money, net worth, networth, rich, rich life, the rich life]\n", - " XXXXX\n", + " 0.0\n", + " https://www.youtube.com/embed/hUXGQwTSfMs\n", + " https://i.ytimg.com/vi/hUXGQwTSfMs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 41\n", - " Bitch Lasagna v1.2\n", - " 19,952,287.0\n", - " 1,758,301.0\n", - " 69,186.0\n", + " 14\n", + " Linux Mint identify, fix sound problems, set default device\n", + " 859.0\n", + " 4.0\n", + " 0.0\n", " 0.0\n", - " 152,529.0\n", - " https://www.youtube.com/watch?v=PX5QgITQAwk\n", - " [SATIRE]\n", - " XXXXX\n", + " 1.0\n", + " https://www.youtube.com/embed/PIAzK1rvqIY\n", + " https://i.ytimg.com/vi/PIAzK1rvqIY/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 248\n", - " TRY NOT TO LAUGH CHALLENGE #09 {Important Videos Edition}\n", - " 16,337,867.0\n", - " 591,088.0\n", - " 12,407.0\n", + " 71\n", + " python performance profiling in pycharm\n", + " 825.0\n", + " 0.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 52,244.0\n", - " https://www.youtube.com/watch?v=IBhgOkorEZ4\n", - " [pewdiepie, pewds, pdp, pewdie, try not, to, laugh, try not to laugh, dont laugh, challenge]\n", - " XXXXX\n", + " https://www.youtube.com/embed/EZ-im7m8630\n", + " https://i.ytimg.com/vi/EZ-im7m8630/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 164\n", - " Trap Adventure 2 - WHO MADE THIS GAME AND WHY 😡😡? ! 
\" 🤰😡 - #001\n", - " 16,329,824.0\n", - " 725,795.0\n", - " 17,000.0\n", + " 70\n", + " Python Cumulative Sum per Group with Pandas\n", + " 801.0\n", + " 5.0\n", " 0.0\n", - " 39,152.0\n", - " https://www.youtube.com/watch?v=C1ObitoLwhM\n", - " [pewdiepie, trap adventure 2, rage, quit, game, videogame, trap, adventure, free download, link, trap adventure download, trap adventure 2 download, trap adventure 2 free download]\n", - " XXXXX\n", + " 0.0\n", + " 1.0\n", + " https://www.youtube.com/embed/1tCbvYv_ibw\n", + " https://i.ytimg.com/vi/1tCbvYv_ibw/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 260\n", - " DOES HE MAKE IT?\n", - " 16,282,855.0\n", - " 618,113.0\n", - " 11,354.0\n", + " 50\n", + " Linux Mint 19 How to change user password\n", + " 735.0\n", + " 6.0\n", + " 0.0\n", " 0.0\n", - " 34,363.0\n", - " https://www.youtube.com/watch?v=EfnDkNpXDBk\n", - " [pewdiepie, pewds, pdp, pewdie, to be continued, meme, compilation, continue, jojo, jojos, bizarre, adventure]\n", - " XXXXX\n", + " 2.0\n", + " https://www.youtube.com/embed/Odog86JslbA\n", + " https://i.ytimg.com/vi/Odog86JslbA/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 52\n", - " Bich Lasagna V2. 
- Beat Saber / PART 1\n", - " 15,950,359.0\n", - " 1,005,972.0\n", - " 35,975.0\n", + " 21\n", + " play fortnite linux virtual machine\n", + " 532.0\n", + " 2.0\n", + " 1.0\n", " 0.0\n", - " 69,851.0\n", - " https://www.youtube.com/watch?v=2kpR0BdouNE\n", - " [SATIRE, beat, saber, vr, gameplay]\n", - " XXXXX\n", + " 1.0\n", + " https://www.youtube.com/embed/t_DI7NbjcFs\n", + " https://i.ytimg.com/vi/t_DI7NbjcFs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", "" @@ -5330,7 +1406,7 @@ "" ] }, - "execution_count": 7, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -5343,22 +1419,22 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "" ] }, - "execution_count": 8, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ "
" ] @@ -5375,22 +1451,22 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "" ] }, - "execution_count": 9, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEDCAYAAADOc0QpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAIABJREFUeJzt3Xd4VHX2x/H3SSOUJJSEUEJvAemEIiU0qSqIooJrw8LaNe666q6rrru/dYtSRRQVEXVFxYbSJZGEKqGXTELooWQCoYaQ+v39kUEjJiTCJHfKeT0PD8m9l8yZ+4wfT75z5x4xxqCUUsqz+FhdgFJKKefTcFdKKQ+k4a6UUh5Iw10ppTyQhrtSSnkgDXellPJAloa7iMwWEbuI7CjHsY1FJE5ENovINhEZWRk1KqWUO7K6c58DDC/nsS8AnxljugDjgDcrqiillHJ3loa7MSYeyCy+TURaiMgSEdkoIgkiEnnxcCDY8XUIcKQSS1VKKbfiZ3UBJZgFPGSM2S0iPSnq0AcBLwPLRORxoDpwnXUlKqWUa3OpcBeRGkBv4HMRubi5iuPv8cAcY8zrInIt8KGItDfGFFpQqlJKuTSXCneKlolOGWM6l7Dvfhzr88aYtSISCIQC9kqsTyml3ILVb6j+gjHmDLBPRG4FkCKdHLsPAoMd29sCgUCGJYUqpZSLEyvvCikinwADKOrA04GXgFhgJlAf8AfmGWNeEZF2wDtADYreXP2TMWaZFXUrpZSrszTclVJKVQyXWpZRSinlHJa9oRoaGmqaNm1q1cMrpZRb2rhx43FjTFhZx1kW7k2bNiUxMdGqh1dKKbckIgfKc5wuyyillAfScFdKKQ+k4a6UUh7I1T6hqpRSAOTl5ZGWlsaFCxesLsUSgYGBRERE4O/vf0X/XsNdKeWS0tLSCAoKomnTphS715RXMMZw4sQJ0tLSaNas2RX9DF2WUUq5pAsXLlCnTh2vC3YAEaFOnTpX9VuLhrtSymV5Y7BfdLXPXcPdRRlj+HJTGgdPnLe6FKWUG9Jwd1FbDp3i6c+2MmxKPHNW76OwUO8BpFRlGjhwIEuXLv3FtilTpjBhwgTGjh1rUVXlp+HuomJtdnwEoprW4uVvdzFu1jr2H8+yuiylvMb48eOZN2/eL7bNmzePCRMmMH/+fIuqKj8NdxcVa7PTrUkt5t7Xg9du7UTSsTMMnxrPe6v2UaBdvFIVbuzYsSxcuJDc3FwA9u/fz5EjR2jUqBHt27cHoKCggGeeeYbu3bvTsWNH3n77bQAeffRRFixYAMCYMWO47777AJg9ezZ/+ctfyMrK4vrrr6dTp060b9+eTz/91On166WQLujY6QvsPHKGPw1vg4gwtlsEfVuG8uevtvP373axePtR/jO2I83DalhdqlKV4m/f7mTXkTNO/ZntGgTz0o3XlLq/du3a9OjRg8WLFzN69GjmzZvHbbfd9os3Ot977z1CQkLYsGEDOTk59OnTh6FDh9KvXz8SEhIYNWoUhw8f5ujRowAkJCQwbtw4lixZQoMGDVi4cCEAp0+fdupzA+3cXVJcctHkwMGR4T9tqxcSyHv3RDHptk6kpJ9lxNQE3onfq128UhWo+NLMvHnzGD9+/C/2L1u2jLlz59K5c2d69uzJiRMn2L1790/hvmvXLtq1a0d4eDhHjx5l7
dq19O7dmw4dOrB8+XKeffZZEhISCAkJcXrt2rm7oFibnYY1q9I6/JeduYhwc9eLXfwO/m9REot2HOW/YzvRsq528cpzXa7DrkijR48mJiaGTZs2cf78ebp168b+/ft/2m+MYfr06QwbNuxX//bUqVMsWbKE6OhoMjMz+eyzz6hRowZBQUEEBQWxadMmFi1axAsvvMDgwYN58cUXnVq7du4u5kJeAat2H2dQZN1Sr3OtGxzIO3d3Y+q4zuw7nsXIaQm8vXKPdvFKOVmNGjUYOHAg991336+6doBhw4Yxc+ZM8vLyAEhJSSErq+jCh169ejFlyhSio6Pp168fr732Gv369QPgyJEjVKtWjTvvvJNnnnmGTZs2Ob127dxdzPp9mWTnFTAosu5ljxMRRnduyLUt6vDCVzt4dbGNxTuO8dqtHWlZN6iSqlXK840fP54xY8b86soZgAceeID9+/fTtWtXjDGEhYXx9ddfA9CvXz+WLVtGy5YtadKkCZmZmT+F+/bt23nmmWfw8fHB39+fmTNnOr1uy2aoRkVFGR3W8WsvL9jJvA0H2fLiUAL9fcv1b4wxfLvtKC99s4Os3AJirmvNg/2a4eerv5gp95WUlETbtm2tLsNSJZ0DEdlojIkq69/qf/0uxBjDCls6fVqEljvYoaiLH9WpActi+jM4si7/XmLjlplrSEk/W4HVKqVcWZnhLiKzRcQuIjtK2f87EdkmIttFZI2IdHJ+md5hT8Y5DmVmM7CMJZnShAVVYead3ZhxR1cOnczmhmmrmBGXSn5BoZMrVUq5uvJ07nOA4ZfZvw/ob4zpAPwdmOWEurzSiqSiSyDLWm8vy/Ud67M8Jpoh14Tz36XJjHlzDbZjzr1GWKnKYNWysSu42udeZrgbY+KBzMvsX2OMOen4dh0QcVUVebFYm53IekE0qFn1qn9WnRpVmHFHV978XVeOnMrmxumrmLZiN3naxSs3ERgYyIkTJ7wy4C/ezz0wMPCKf4azr5a5H1hc2k4RmQhMBGjcuLGTH9q9nc7OI/HASX4f3dypP3dkh/r0al6HlxbsZNLyFJbuPMZ/x3aiXYNgpz6OUs4WERFBWloaGRkZVpdiiYuTmK6U08JdRAZSFO59SzvGGDMLx7JNVFSU9/3v+DLiUzIoKDQMbnt1SzIlqV09gOnju3B9h/q88PUORr2xiscGteSRAS0J8NP31JVr8vf3v+IpRMpJV8uISEfgXWC0MeaEM36mt4mz2alVzZ/OjWpV2GMMb1+P5THR3NCxPlO+383oGavZcdj597RQSlnvqsNdRBoDXwJ3GWNSrr4k71NQaIhLtjOgTV18fSp28kyt6gFMGdeFWXd14/i5HG6asZpJy5LJzde1eKU8SZnLMiLyCTAACBWRNOAlwB/AGPMW8CJQB3jT8XH5/PJcYK9+tuXQKU6ez7viSyCvxNBr6tGjWW1e+W4X02JTWbYrnddu7UT7hs6/gZFSqvKVGe7GmF/fUOGX+x8AHnBaRV4ozmbH10fo3yqsUh+3ZrUAJt3Wmes71OfPX21n9IzVPNy/BY8PbkkVv/J/iEop5Xr03TQXsMIxmCOkmr8ljz+4bTjLnurPmC4NeSMulRunr2Jb2ilLalFKOYeGu8WOns4m6eiZq/7g0tUKqebPa7d24v0J3TmTnc+YN9fw7yU2LuQVWFqXUurKaLhbLNZ2cTCHteF+0cA2dVn2dDRju0Yw84c93DB9FZsPniz7HyqlXIqGu8XibHYialV1qWEbwYH+/HtsRz64rwdZOfncMnMNry5O0i5eKTei4W6hC3kFrE49cdnBHFbq3zqMpTHR3N69EW+v3Mv10xLYeEC7eKXcgYa7hdbuPVGuwRxWCg7059WbOzL3vh5cyCtk7Ftr+L+Fu7SLV8rFabhbKM5mp6q/L72a17G6lDJFtw5jyVP9uKNHY95J2MfIqQkk7i/1fnJKKYtpuFvEGMOKJDt9Wv62wRxWCgr05//GdODjB3qSk
1/IrW+v5e/f7SI7V7t4pVyNhrtFdtvPcfhUtksvyZSmT8tQlsZEc2fPJry3ah8jpsbz4z7t4pVyJRruFrl4CaQ7hjtAjSp+/P2m9vzvwZ4UGMPts9by8oKdnM/Nt7o0pRQa7paJTbLTrn4w9UKu/Gb8rqB3i1CWPBnNPdc2Zc6a/QyfksC6vXpjUKWspuFugVPnc9l48KTbdu2Xql7Fj5dHXcO8ib0QgXGz1vHSNzvIytEuXimraLhbYKVjMMegChjMYaVezeuw+Ml+TOjTlLnrDjB8ajxr9hy3uiylvJKGuwXibHZqVw+gU0RNq0txumoBfrx04zV8OvFafEW44531vPD1ds5pF69UpdJwr2QFhYYfUjIY0CaswgdzWKlHs9osfjKa+/s24+P1Bxk2OZ7VqdrFK1VZNNwr2eaDJzl1Ps9j1tsvp2qAL3+9oR3zH7qWKn4+/O7d9fz5q+2cvZBndWlKeTwN90oW6xjM0a+SB3NYqVuT2ix6sh8To5sz78eiLj4+xTsn2itVWcoMdxGZLSJ2EdlRyv5IEVkrIjki8kfnl+hZYm12ujetRUhVawZzWCXQ35c/j2zL/Id7UzXAl7tn/8hzX2zjjHbxSlWI8nTuc4Dhl9mfCTwBvOaMgjzZ4VPZ2I6d9YolmdJ0bVyLhU/046H+Lfgs8RDDJsfzQ7Ld6rKU8jhlhrsxJp6iAC9tv90YswHQFqwMP38qNdziSqwV6O/LcyMi+fKRPtSo4se972/gT/O3cjpbX0JKOUulrrmLyEQRSRSRxIwM71tzjbPZaVy7Gi3Cqltdikvo3Kgm3z7el0cGtOCLTYcZNjmeOJt28Uo5Q6WGuzFmljEmyhgTFRbmPW8oAmTnFrA69bjLDuawSqC/L38aHslXj/QmpKo/E+Zs4A+fbeX0ee3ilboaerVMJVm79zg5+YVevd5+OR0jarLg8T48PqglX285zJDJK/l+V7rVZSnltjTcK0mszU61AF96Nq9tdSkuq4qfL38Y2oavH+lD7eoBPDA3kac/3cKp87lWl6aU2ynPpZCfAGuBNiKSJiL3i8hDIvKQY389EUkDngZecBwTXLFluxdjDLFJdvq2DKWKn3sM5rBSh4gQFjzWlycGt2LB1iMMmRzPsp3HrC5LKbfiV9YBxpjxZew/BkQ4rSIPlJx+liOnL/DE4FZWl+I2Avx8eHpIa4a2C+eZ+duY+OFGRnduwMs3XkOt6gFWl6eUy9NlmUpw8RLIgbre/pu1bxjCN4/2Iea61izcdpQhk1eyZId28UqVRcO9EsQm2WnfMJjwYPcezGGVAD8fnryuFQse60t4cCAPfbSRxz/ZTGaWrsUrVRoN9wp2MiuXTQdPMqiNdu1Xq12DYL5+tA9/GNKaJTuOMmTSShZtP2p1WUq5JA33CrYyJYNCA4PaevenUp3F39eHxwe34tvH+9KgZlUe+XgTj368iePncqwuTSmXouFewWJtdkJrBNCxYYjVpXiUyHrBfPVIb54Z1oblu9IZOjme77YdwRhjdWlKuQQN9wqUX1DIypQMBrSpi48HD+awip+vD48ObMl3T/SlUa2qPPa/zTzy8SYyzmoXr5SGewXadPAUp7O9YzCHlVqHB/HFw715dngkK5LsDJ28km+2HNYuXnk1DfcKFGuz4+cj9G0VanUpHs/P14eHB7Rg4RN9aVKnOk/O28LvP9yI/ewFq0tTyhIa7hUo1pZOj2a1CQ70rsEcVmrl6OL/PDKSH1IyGDIpnq83axevvI+GewVJO3melPRzuiRjAV8fYWJ0CxY90Y8WYdV56tMtPDg3kfQz2sUr76HhXkHifhrMoeFulZZ1a/D5Q7154fq2JOw+zpBJK/liY5p28coraLhXkBU2O03rVKN5WA2rS/Fqvj7CA/2as/jJfrQOD+IPn2/l/g8SOXZau3jl2TTcK8D53HzW7Dmh95JxIc3DavDp76/lxRvasWbPcYZMXsnniYe0i1ceS8O9AqxJPUFuf
iGDvXxWqqvx9RHu69uMJU9G07ZeMM/M38aEORs4ejrb6tKUcjoN9woQm2yneoAvPZrpYA5X1DS0OvMm9uJvo65h/d5Mhk6K59MNB7WLVx5Fw93JjDHE2ez0axVGgJ+eXlfl4yPc07spS5+K5pqGwTz7xXbunv0jh09pF688Q3kmMc0WEbuI7Chlv4jINBFJFZFtItLV+WW6j6SjZzl6+oJeJeMmGtepxv8e6MXfR1/DxgMnGTY5nk9+1C5eub/ytJZzgOGX2T8CaOX4MxGYefVlua+45KJLIAdEhllciSovHx/hrmuLuvgODUN4/svt3PXej6SdPG91aUpdsTLD3RgTD2Re5pDRwFxTZB1QU0TqO6tAd7MiKZ2OESHUDdLBHO6mUe1qfPxAT/5xU3s2Hyzq4j9ad4DCQu3ilftxxqJwQ+BQse/THNt+RUQmikiiiCRmZGQ44aFdS2ZWLpsPnWKgDuZwWz4+wp29mrA0JpoujWvxwtc7uPO99RzK1C5euZdKfcfPGDPLGBNljIkKC/O8ZYuVKXaMgcFtNdzdXUStanx4fw9evbkD29JOM2xKPB+u3a9dvHIbzgj3w0CjYt9HOLZ5nRVJdkJrVKF9Ax3M4QlEhPE9GrM0JppuTWrx1292cse76zh4Qrt45fqcEe4LgLsdV830Ak4bY7xusGVeQSHxKRkMbBOmgzk8TMOaVZl7Xw/+c0tHdh4+w7Ap8cxZvU+7eOXSynMp5CfAWqCNiKSJyP0i8pCIPOQ4ZBGwF0gF3gEeqbBqXdjGAyc5cyFfl2Q8lIhwW/dGLHs6mp7Na/Pyt7sY98469h/Psro0pUrkV9YBxpjxZew3wKNOq8hNxdns+PsKfVt53nsJ6mf1Q6ry/r3dmb8xjVe+28XwqfH8aVgk9/Zuqr+xKZeiH6F0khU2Oz2b1aFGlTL/f6ncnIhwa1Qjlsf0p3eLUF75bhe3vb2WvRnnrC5NqZ9ouDvBoczzpNrP6V0gvUy9kEDeuyeK12/tREr6WUZMTeDdhL0U6Fq8cgEa7k4Q6xjMMVjD3euICLd0i2D50/3p1yqUfyxM4ta31rBHu3hlMQ13J1hhs9M8tDpNQ6tbXYqySHhwIO/cHcWU2zuzJyOLEVMTeHvlHu3ilWU03K9SVk4+63Qwh6Koi7+pS0OWPx3NgNZhvLrYxi0z15BqP2t1acoLabhfpdWpx8ktKNQlGfWTukGBvH1XN6aN78KBE1mMnLaKmT/sIb+g0OrSlBfRcL9Kccl2alTxI6qpDuZQPxMRRnVqwLKY/gxqU5d/Lynq4lPStYtXlUPD/SoYY4i12YluHaqDOVSJwoKqMPPOrrxxRxcOnczmhmmrmBGXql28qnCaSFdh55EzpJ/J0btAqssSEW7o2IDlMdEMuSac/y5NZsyba7AdO2N1acqDabhfhTjHJZADNNxVOdSpUYUZd3Tlzd915cipbG6cvorpK3aTp128qgAa7ldhhc1Op0Y1CQuqYnUpyo2M7FCf5U/3Z3j7+ry+PIWbZqwm6ah28cq5NNyv0IlzOWxNO8Ug7drVFahdPYDp47vw1p1dST9zgRunr2LK9ynk5msXr5xDw/0K/ZCcoYM51FUb3r4+y2P6c33H+kz5fjejZ6xm55HTVpelPICG+xWKtdmpG1SFaxoEW12KcnO1qgcwdVwXZt3VjePnchj9xmomLdcuXl0dDfcr8PNgjrqI6G1elXMMvaYey2OiGdWpAdNW7GbUG6vYcVi7eHVlNNyvwIb9mZzNyWeQLskoJ6tZLYBJt3fm3bujyMzKZfSM1by+LJmc/AKrS1NuRsP9CsTZ7AT4+tC3ZajVpSgPdV27cJbH9GdMl4ZMj01l1PTVbEs7ZXVZyo2UK9xFZLiIJItIqog8V8L+JiKyQkS2icgPIhLh/FJdR6zNTs/mtamugzlUBQqp5s9rt3bi/Xu7czo7jzFvruE/S2zaxatyKc8MVV9gBjACaAeMF5F2lxz2GjDXGNMRe
AV41dmFuooDJ7LYk5HFIL1RmKokAyPrsjQmmlu6NuTNH/Zww7RVbDmkXby6vPJ07j2AVGPMXmNMLjAPGH3JMe2AWMfXcSXs9xgXB3NouKvKFFLVn/+M7cScCd05l5PPzW+u5tXFSVzI0y5elaw84d4QOFTs+zTHtuK2Ajc7vh4DBIlInUt/kIhMFJFEEUnMyMi4knotF2uz0yKsOk3q6GAOVfkGtCnq4m+LasTbK/dy/bQENh08aXVZygU56w3VPwL9RWQz0B84DPyqpTDGzDLGRBljosLCwpz00JUnKyef9XsztWtXlgoO9Odft3Rk7n09yM4tYOzMNfxzkXbx6pfKE+6HgUbFvo9wbPuJMeaIMeZmY0wX4C+ObR63KLjKMZhjUGS41aUoRXTrMJbGRDOuR2Nmxe9l5NQENh7ItLos5SLKE+4bgFYi0kxEAoBxwILiB4hIqIhc/FnPA7OdW6ZriE2yExToR1TTWlaXohQAQYH+/HNMBz66vyc5+YWMfWstf/9uF9m52sV7uzLD3RiTDzwGLAWSgM+MMTtF5BURGeU4bACQLCIpQDjwfxVUr2UKCw1xyXaiW4fh76sfD1CupW+rUJbGRHNnzya8t2ofI6bGs2G/dvHeTIyxZjp7VFSUSUxMtOSxr8T2tNPc+MYqXr+1E7d08+jL+JWbW7PnOM9+sY20k9nc27spzwxrQ7UA/UyGpxCRjcaYqLKO0xa0nGJtdkRgQBv3eyNYeZfeLUJZ8mQ0d/dqwvur9zNiagLr956wuixVyTTcyyk22U7nRjWpU0MHcyjXV72KH38b3Z55E3thDNw+ax0vfbODrJx8q0tTlUTDvRwyzuaw9ZAO5lDup1fzOix5qh8T+jRl7roDDJ8az5o9x60uS1UCDfdy+CHZ8alUvQukckPVAvx46cZr+HTitfiKcMc76/nr19rFezoN93KItdmpFxxIu/o6mEO5rx7NarP4yWju79uMj9YfYNiUeFanahfvqTTcy5CbX0jC7uMMjAzTwRzK7VUN8OWvN7Tj899fi7+vD797dz1//mo7Zy/kWV2acjIN9zIk7s/kXE6+fipVeZSoprVZ/GQ/HuzXjE9+PMjwKQkk7HbP+z2pkmm4l2GFzU6Anw99Wv7qPmhKubVAf1/+cn075j/Umyr+Ptz13o88/+U27eI9hIZ7GeJsdq5tXkc/BKI8VrcmtVj0RD9+3785n244xLDJ8axM0S7e3Wm4X8a+41nsPa6DOZTnC/T35fkRbfni4d5Uq+LHPbN/5Nn52zijXbzb0nC/DB3MobxNl8a1+O7xvjwyoAWfbzzE0EnxxDn+O1DuRcP9MmJt6bSqW4NGtatZXYpSlSbQ35c/DY/kq0f6EFzVjwlzNvDHz7dy+rx28e5Ew70U53Ly+XGfDuZQ3qtTo5p8+3hfHhvYkq82H2bolJWsSEq3uixVThrupVi1O4O8AqPhrrxaFT9f/jisDV8/0oda1QK4/4NEnv50C6fO51pdmiqDhnspViTZCQ70o1sTHcyhVIeIEBY81pcnBrVkwdYjDJkcz/Jd2sW7Mg33EhQN5sggunUYfjqYQykAAvx8eHpoG75+tA+hNarw4NxEnpq3mZNZ2sW7Ik2uEmw/fJrj53IYrDcKU+pX2jcM4ZtH+/DUda34bttRhkyOZ8mOY1aXpS5RrnAXkeEikiwiqSLyXAn7G4tInIhsFpFtIjLS+aVWnouDOfq31nBXqiQBfj48dV1rFjzWl7pBVXjoo408/slmMrWLdxllhruI+AIzgBFAO2C8iLS75LAXKJqt2oWiAdpvOrvQyhSXbKdr41rUrh5gdSlKubR2DYL55rE+/GFIa5bsOMrQyStZvP2o1WUpyte59wBSjTF7jTG5wDxg9CXHGODi/XBDgCPOK7Fy2c9cYFvaab1KRqly8vf14fHBrfj28b7UCwnk4Y838ej/NnHiXI7VpXm18oR7Q+BQse/THNuKexm4U0TSgEXA4yX9IBGZK
CKJIpKYkeGa9674IbmoroE6dUmp3ySyXjBfPdKHZ4a1YfnOdIZMjmfhNu3ireKsN1THA3OMMRHASOBDEfnVzzbGzDLGRBljosLCXHPQ9ApbOvVDAmlbP8jqUpRyO/6+Pjw6sCXfPdGXRrWq8uj/NvHwRxvJOKtdfGUrT7gfBhoV+z7Csa24+4HPAIwxa4FAINQZBVamnPwCVu0+zsDIujqYQ6mr0Do8iC8e7s2zwyNZkWRn6OSVLNh6BGOM1aV5jfKE+waglYg0E5EAit4wXXDJMQeBwQAi0paicHfNdZfL2LDvJFm5BQzW9Xalrpqfrw8PD2jBwif60rhOdZ74ZDMPfbQR+9kLVpfmFcoMd2NMPvAYsBRIouiqmJ0i8oqIjHIc9gfgQRHZCnwC3Gvc8H/RK2zpVPHzoXcLt/ulQymX1So8iC8eupbnR0QSl5zB0MnxfL35sHbxFUysOsFRUVEmMTHRkscuiTGGAa/9QPPQ6rw/oYfV5SjlkVLt53hm/lY2HzzFdW3D+eeY9tQNDrS6LLciIhuNMVFlHaefUHXYezyLAyfO6yWQSlWglnVrMP+h3rxwfVsSdmdw3aSVfLkpTbv4CqDh7nBxIMFADXelKpSvj/BAv+YsfrIfrcODePqzrTzwQSLpZ3Qt3pk03B1ibXbahAcRUUsHcyhVGZqH1eDT31/LX29ox+o9xxkyaSXzN2oX7ywa7sCZC3n8uC9Tu3alKpmvj3B/32YseTKayHrB/PHzrUyYs4Gjp7OtLs3tabgDq3YfJ7/Q6F0glbJI09DqzJvYi5dvbMf6vZkMnRTPZxsOaRd/FTTcKRrMEVLVny6NalpdilJey8dHuLdPM5Y81Y92DYL50xfbuOf9DRw5pV38lfD6cC8sNKxMsdNfB3Mo5RKa1KnOJw/24u+jryFxfyZDJ8fzyY8HtYv/jbw+zbYdPs3xc7m6JKOUC/HxEe66tilLn4qmQ8MQnv9yO3fP/pG0k+etLs1teH24xyal4yPQv7Vr3shMKW/WqHY1Pn6gJ/+4qT2bDpxk2OR4Pl5/QLv4ctBwT7bTrUktalbTwRxKuSIfH+HOXk1Y8lQ0nRvX5C9f7eB3767nUKZ28Zfj1eGefuYCOw6f0UsglXIDjWpX46P7e/LPMR3YlnaaYVPi+XDtfgoLtYsviVeH+8VPpeotB5RyDyLCHT0bszQmmm5NavHXb3Zyx7vrOHhCu/hLeXW4x9rsNKxZlTbhOphDKXfSsGZV5t7Xg3/f0oGdh88wbEo8H6zRLr44rw33nPwCVqUeZ2BkmA7mUMoNiQi3dy/q4ns0q81LC3Yy7p11HDiRZXVpLsFrw3393kzO5xYwODLc6lKUUlehQc2qzJnQnf+O7UjS0aIufvaqfV7fxXttuMfa7AT6+3BtizpWl6KUukoiwq28xwzdAAANO0lEQVRRjVge05/eLUJ55btd3D5rLfuOe28XX65wF5HhIpIsIqki8lwJ+yeLyBbHnxQROeX8Up3HGEOszU7vFqEE+vtaXY5SyknqhQTy3j1RvH5rJ5KPnWX4lHjeTdhLgRd28WWGu4j4AjOAEUA7YLyItCt+jDEmxhjT2RjTGZgOfFkRxTrLnowsDmbqYA6lPJGIcEu3CJY/3Z++LUP5x8Ikbn1rDXsyzlldWqUqT+feA0g1xuw1xuQC84DRlzl+PEVzVF1WrC0d0MEcSnmy8OBA3r0nism3d2JPRhYjpyYwK36P13Tx5Qn3hsChYt+nObb9iog0AZoBsaXsnygiiSKSmJGR8VtrdZpYm53IekE0rFnVshqUUhVPRBjTJYLlMdFEtw7jn4tsjH1rDan2s1aXVuGc/YbqOGC+MaagpJ3GmFnGmChjTFRYmDX3cjmdnceG/Sd1SUYpL1I3OJBZd3Vj6rjO7Duexchpq3hr5R7yCwqtLq3ClCfcDwONin0f4dhWknG4+JJMwu4MCnQwh1JeR0QY3bkhy2P6M6hNXf612MYtb61ld7pnd
vHlCfcNQCsRaSYiARQF+IJLDxKRSKAWsNa5JTpXbJKdWtX86dyoltWlKKUsEBZUhZl3duWNO7pwKPM8109bxYy4VI/r4ssMd2NMPvAYsBRIAj4zxuwUkVdEZFSxQ8cB84wL34uzoNDwQ0oG/VuH4eujn0pVyluJCDd0bMCymGiGtAvnv0uTuXnmGpKPeU4XL1ZlcVRUlElMTKzUx9x08CQ3v7mGaeO7MKpTg0p9bKWU61q47SgvfrODMxfyeHJwK37fvwX+LjqZTUQ2GmOiyjrONauvILFJdnx9hP6tdDCHUupn13esz7KYaIa3r89ry1K4acZqko6esbqsq+Jd4W4rGswRUs3f6lKUUi6mTo0qTB/fhbfu7Er6mQuMemMVU7/fTZ6brsV7TbgfPZ3NrqNn9BJIpdRlDW9fn2Ux/RnZoT6Tv09h9Bur2XnktNVl/WZeE+5xtqIPTWm4K6XKUrt6AFPHdeHtu7phP5vD6DdWM2l5Crn57tPFe024x9rsRNSqSqu6NawuRSnlJoZdU4/vn47mxk4NmLZiN6PeWMWOw+7RxXtFuF/IK2B16nEGRdbVwRxKqd+kZrUAJt/emXfvjiIzK5fRM1bz+rJkcvJL/CC+y/CKcF+39wTZeQW6JKOUumLXtQtneUx/burckOmxqYyavpptaa57d3OvCPdYm52q/r70aq6DOZRSVy6kmj+v39aJ2fdGcSo7lzFvruG/S20u2cV7fLhfHMzRp2UdHcyhlHKKQZHhLIvpzy1dGzIjbg83TFvF1kOu1cV7fLin2s+RdjKbQTorVSnlRCFV/fnP2E7MmdCdczn5jHlzNf9abONCnmt08R4f7itsdgAGRuqnUpVSzjegTV2WxkRzW1Qj3lq5h+unJbDp4Emry/L8cI+12WlXP5j6ITqYQylVMYID/fnXLR2Ze18PsnMLGDtzDf9clGRpF+/R4X76fB4bD+hgDqVU5YhuHcbSmGhu796YWfF7GTk1gY0HMi2pxaPDfaVjMIfOSlVKVZagQH9evbkDH93fk5z8Qsa+tZZ/fLeL7NzK7eI9OtzjbHZqVw+gc6OaVpeilPIyfVuFsjQmmt/1bMy7q/YxcloCG/ZXXhfvseFeUGj4IdnOAB3MoZSySI0qfvzjpg7874Ge5BUUctvba/nbtzsrpYv32HDfcugkJ8/nMUhnpSqlLNa7ZShLn4rmrl5NeH/1fv6xcFeFP6ZfeQ4SkeHAVMAXeNcY868SjrkNeBkwwFZjzB1OrPM3W+EYzNFPB3MopVxA9Sp+vDK6PSPa16d5WPUKf7wyw11EfIEZwBAgDdggIguMMbuKHdMKeB7oY4w5KSKWt8uxNjvdm9YipKoO5lBKuY5rW1TObVDKsyzTA0g1xuw1xuQC84DRlxzzIDDDGHMSwBhjd26Zv82RU9nYjp3VSyCVUl6rPOHeEDhU7Ps0x7biWgOtRWS1iKxzLOP8iohMFJFEEUnMyMi4sorLIdbxqVQNd6WUt3LWG6p+QCtgADAeeEdEfnX9oTFmljEmyhgTFRZWcWvhcTY7jWtXo0WYDuZQSnmn8oT7YaBRse8jHNuKSwMWGGPyjDH7gBSKwr7SXcgrYPUeHcyhlPJu5Qn3DUArEWkmIgHAOGDBJcd8TVHXjoiEUrRMs9eJdZbb2j0nuJBXqEsySimvVma4G2PygceApUAS8JkxZqeIvCIioxyHLQVOiMguIA54xhhzoqKKvpxYm51qAb70bF7biodXSimXUK7r3I0xi4BFl2x7sdjXBnja8ccyPw/mCKWKnw7mUEp5L4/6hGpK+jkOn8pmsC7JKKW8nEeF+wpbOoDeBVIp5fU8KtzjbHbaNwwmPDjQ6lKUUspSHhPup87nFg3maKNdu1JKeUy4r0zJoNDokoxSSoEHhXuszU6d6gF0itDBHEop5RHhnl9QyA/JGQxoUxcfHcyhlFKeEe6bD53idHYeg3Uwh1JKAR4S7iuS7Pj5CH1bhVpdilJKuQSPC
Pc4m53uTWsTHKiDOZRSCjwg3NNOnic5/awuySilVDFuH+5xjsEcegmkUkr9zO3DPdZmp2mdajQPrfiBs0op5S7cOtyzcwtYs+cEA3Uwh1JK/YJbh/uaPcfJydfBHEopdSm3DvdYm53qAb70aKaDOZRSqji3DfeLgzn6ttLBHEopdalyhbuIDBeRZBFJFZHnSth/r4hkiMgWx58HnF/qL9mOneXo6QsMjgyv6IdSSim3U+aYPRHxBWYAQ4A0YIOILDDG7Lrk0E+NMY9VQI0linVcAjkgMqyyHlIppdxGeTr3HkCqMWavMSYXmAeMrtiyyhZrs9MxIoS6QTqYQymlLlWecG8IHCr2fZpj26VuEZFtIjJfRBqV9INEZKKIJIpIYkZGxhWUWyQzK5fNB08yUAdzKKVUiZz1huq3QFNjTEdgOfBBSQcZY2YZY6KMMVFhYVe+nLIyxU6hQS+BVEqpUpQn3A8DxTvxCMe2nxhjThhjchzfvgt0c055JYu1ZRBaowodGoZU5MMopZTbKk+4bwBaiUgzEQkAxgELih8gIvWLfTsKSHJeib+UX1DIymQ7A9uE6WAOpZQqRZlXyxhj8kXkMWAp4AvMNsbsFJFXgERjzALgCREZBeQDmcC9FVXwxgMnOXMhX+8CqZRSl1FmuAMYYxYBiy7Z9mKxr58HnnduaSXz9RH6tw6jbyu9BFIppUpTrnB3JVFNa/PBfT2sLkMppVya295+QCmlVOk03JVSygNpuCullAfScFdKKQ+k4a6UUh5Iw10ppTyQhrtSSnkgDXellPJAYoyx5oFFMoADljz45YUCx60uwgXoeSii56GInoefWX0umhhjyvyIvmXh7qpEJNEYE2V1HVbT81BEz0MRPQ8/c5dzocsySinlgTTclVLKA2m4/9osqwtwEXoeiuh5KKLn4WducS50zV0ppTyQdu5KKeWBNNyVUsoDeXW4i8h+EdkuIltEJNGxrbaILBeR3Y6/a1ldZ0UQkdkiYheRHcW2lfjcpcg0EUkVkW0i0tW6yp2rlPPwsogcdrwutojIyGL7nnech2QRGWZN1c4nIo1EJE5EdonIThF50rHdq14TlzkP7veaMMZ47R9gPxB6ybb/AM85vn4O+LfVdVbQc48GugI7ynruwEhgMSBAL2C91fVX8Hl4GfhjCce2A7YCVYBmwB7A1+rn4KTzUB/o6vg6CEhxPF+vek1c5jy43WvCqzv3UowGPnB8/QFwk4W1VBhjTDxFw8yLK+25jwbmmiLrgJoiUr9yKq1YpZyH0owG5hljcowx+4BUwCNmPhpjjhpjNjm+PgskAQ3xstfEZc5DaVz2NeHt4W6AZSKyUUQmOraFG2OOOr4+BoRbU5olSnvuDYFDxY5L4/IveE/wmGO5YXaxpTmvOA8i0hToAqzHi18Tl5wHcLPXhLeHe19jTFdgBPCoiEQX32mKfu/yymtFvfm5AzOBFkBn4CjwurXlVB4RqQF8ATxljDlTfJ83vSZKOA9u95rw6nA3xhx2/G0HvqLo16n0i79eOv62W1dhpSvtuR8GGhU7LsKxzSMZY9KNMQXGmELgHX7+Ndujz4OI+FMUaB8bY750bPa610RJ58EdXxNeG+4iUl1Egi5+DQwFdgALgHsch90DfGNNhZYo7bkvAO52XCHRCzhd7Fd1j3PJ2vEYil4XUHQexolIFRFpBrQCfqzs+iqCiAjwHpBkjJlUbJdXvSZKOw9u+Zqw+h1dq/4AzSl6l3srsBP4i2N7HWAFsBv4Hqhtda0V9Pw/oejXyzyK1gnvL+25U3RFxAyKrgTYDkRZXX8Fn4cPHc9zG0X/8dYvdvxfHOchGRhhdf1OPA99KVpy2QZscfwZ6W2vicucB7d7TejtB5RSygN57bKMUkp5Mg13pZTyQBruSinlgTTclVLKA2m4K6WUB9JwV0opD6ThrpRSHuj/AaF+wq6HAHAMAAAAAElFTkSuQmCC\n", + "image/png": 
"\n", "text/plain": [ "
" ] @@ -5407,7 +1483,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 16, "metadata": {}, "outputs": [], "source": [ @@ -5416,7 +1492,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 17, "metadata": {}, "outputs": [ { @@ -5455,118 +1531,118 @@ " \n", " \n", " 0\n", - " YOU HAD ONE JOB! - with editor Brad1\n", - " 5,292,299.0\n", - " 385,260.0\n", - " 4,080.0\n", + " PyCharm/IntelliJ fast and auto change of the color theme\n", + " 41.0\n", + " 0.0\n", " 0.0\n", - " 29,859.0\n", - " https://www.youtube.com/watch?v=B67OBHNCopk\n", - " [SATIRE, reddit, you had one job, onejob]\n", - " <a href=\"https://www.youtube.com/watch?v=B67OBHNCopk\">XXXXX</a>\n", - " YOU HAD ONE JOB! - w\n", + " 0.0\n", + " 2.0\n", + " https://www.youtube.com/embed/SsX9Fl958W0\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/SsX9Fl958W0\">XXXXX</a>\n", + " PyCharm/IntelliJ fas\n", " \n", " \n", " 1\n", - " Demi Lovato DID a WHAT?! 
- YouTube Admits MASSIVE OPSIE\n", - " 5,358,149.0\n", - " 378,460.0\n", - " 3,950.0\n", + " How to add weather desklet to Linux Mint 19\n", + " 291.0\n", + " 0.0\n", " 0.0\n", - " 38,075.0\n", - " https://www.youtube.com/watch?v=kLM_9gBZIqY\n", - " [SATIRE]\n", - " <a href=\"https://www.youtube.com/watch?v=kLM_9gBZIqY\">XXXXX</a>\n", - " Demi Lovato DID a WH\n", + " 0.0\n", + " 0.0\n", + " https://www.youtube.com/embed/-FPY_e0BdJs\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/-FPY_e0BdJs\">XXXXX</a>\n", + " How to add weather d\n", " \n", " \n", " 2\n", - " We broke another WORLD RECORD!\n", - " 8,557,324.0\n", - " 595,577.0\n", - " 7,899.0\n", + " How to easy integrate Google Calendar to Desktop for Linux Mint\n", + " 226.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 53,664.0\n", - " https://www.youtube.com/watch?v=d1tAfXKc7-c\n", - " [SATIRE]\n", - " <a href=\"https://www.youtube.com/watch?v=d1tAfXKc7-c\">XXXXX</a>\n", - " We broke another WOR\n", + " https://www.youtube.com/embed/2evIujisdD0\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/2evIujisdD0\">XXXXX</a>\n", + " How to easy integrat\n", " \n", " \n", " 3\n", - " FLOSSING in VR with Green Man. 
~ UNSEEN FOOTAGE ~\n", - " 3,609,152.0\n", - " 218,517.0\n", - " 3,125.0\n", + " Pandas use a list of values to select rows from a column\n", + " 45.0\n", + " 3.0\n", " 0.0\n", - " 17,595.0\n", - " https://www.youtube.com/watch?v=bMLdNrB5hAo\n", - " [SATIRE]\n", - " <a href=\"https://www.youtube.com/watch?v=bMLdNrB5hAo\">XXXXX</a>\n", - " FLOSSING in VR with\n", + " 0.0\n", + " 10.0\n", + " https://www.youtube.com/embed/jlSbo5wmTPQ\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/jlSbo5wmTPQ\">XXXXX</a>\n", + " Pandas use a list of\n", " \n", " \n", " 4\n", - " Don't Laugh Challenge, NEW SEASON!!!!!\n", - " 5,888,349.0\n", - " 569,878.0\n", - " 7,822.0\n", + " Pandas count and percentage by value for a column\n", + " 63.0\n", + " 3.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,373.0\n", - " https://www.youtube.com/watch?v=Zgm_iM3f_ME\n", - " [SATIRE]\n", - " <a href=\"https://www.youtube.com/watch?v=Zgm_iM3f_ME\">XXXXX</a>\n", - " Don't Laugh Challeng\n", + " https://www.youtube.com/embed/P5pxJkv71BU\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/P5pxJkv71BU\">XXXXX</a>\n", + " Pandas count and per\n", " \n", " \n", "\n", "" ], "text/plain": [ - " title Views \\\n", - "0 YOU HAD ONE JOB! - with editor Brad1 5,292,299.0 \n", - "1 Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE 5,358,149.0 \n", - "2 We broke another WORLD RECORD! 8,557,324.0 \n", - "3 FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ 3,609,152.0 \n", - "4 Don't Laugh Challenge, NEW SEASON!!!!! 
5,888,349.0 \n", + " title Views \\\n", + "0 PyCharm/IntelliJ fast and auto change of the color theme 41.0 \n", + "1 How to add weather desklet to Linux Mint 19 291.0 \n", + "2 How to easy integrate Google Calendar to Desktop for Linux Mint 226.0 \n", + "3 Pandas use a list of values to select rows from a column 45.0 \n", + "4 Pandas count and percentage by value for a column 63.0 \n", "\n", - " Like Dislike Favorite Comment \\\n", - "0 385,260.0 4,080.0 0.0 29,859.0 \n", - "1 378,460.0 3,950.0 0.0 38,075.0 \n", - "2 595,577.0 7,899.0 0.0 53,664.0 \n", - "3 218,517.0 3,125.0 0.0 17,595.0 \n", - "4 569,878.0 7,822.0 0.0 29,373.0 \n", + " Like Dislike Favorite Comment \\\n", + "0 0.0 0.0 0.0 2.0 \n", + "1 0.0 0.0 0.0 0.0 \n", + "2 1.0 0.0 0.0 0.0 \n", + "3 3.0 0.0 0.0 10.0 \n", + "4 3.0 0.0 0.0 0.0 \n", "\n", - " videoID \\\n", - "0 https://www.youtube.com/watch?v=B67OBHNCopk \n", - "1 https://www.youtube.com/watch?v=kLM_9gBZIqY \n", - "2 https://www.youtube.com/watch?v=d1tAfXKc7-c \n", - "3 https://www.youtube.com/watch?v=bMLdNrB5hAo \n", - "4 https://www.youtube.com/watch?v=Zgm_iM3f_ME \n", + " videoID \\\n", + "0 https://www.youtube.com/embed/SsX9Fl958W0 \n", + "1 https://www.youtube.com/embed/-FPY_e0BdJs \n", + "2 https://www.youtube.com/embed/2evIujisdD0 \n", + "3 https://www.youtube.com/embed/jlSbo5wmTPQ \n", + "4 https://www.youtube.com/embed/P5pxJkv71BU \n", "\n", - " tags \\\n", - "0 [SATIRE, reddit, you had one job, onejob] \n", - "1 [SATIRE] \n", - "2 [SATIRE] \n", - "3 [SATIRE] \n", - "4 [SATIRE] \n", + " tags \\\n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg \n", + "1 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg \n", + "2 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg \n", + "3 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg \n", + "4 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg \n", "\n", - " nameurl \\\n", - "0 XXXXX \n", - "1 XXXXX \n", - "2 XXXXX \n", - "3 XXXXX \n", - "4 XXXXX \n", + " nameurl \\\n", + "0 XXXXX \n", + "1 XXXXX 
\n", + "2 XXXXX \n", + "3 XXXXX \n", + "4 XXXXX \n", "\n", " title_short \n", - "0 YOU HAD ONE JOB! - w \n", - "1 Demi Lovato DID a WH \n", - "2 We broke another WOR \n", - "3 FLOSSING in VR with \n", - "4 Don't Laugh Challeng " + "0 PyCharm/IntelliJ fas \n", + "1 How to add weather d \n", + "2 How to easy integrat \n", + "3 Pandas use a list of \n", + "4 Pandas count and per " ] }, - "execution_count": 11, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -5577,7 +1653,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 18, "metadata": {}, "outputs": [], "source": [ @@ -5586,22 +1662,22 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "" ] }, - "execution_count": 13, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEICAYAAABYoZ8gAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAEg5JREFUeJzt3X+w3XWd3/HnCxLNYiJqcscpBgwrO6tZQiJegYUNkOIAamvKLKtkgK4Bhtku9de0WWx1klb+2Z3aXXbcAqaYpu46BGWpQxcDYbpOSY1YbgKTROKG7ZLVS2hzN/HHFkUSffePe2IvIfdHck9ycvN5PmYY7/l8vud734cZn/nyPefmpqqQJLXjlF4PIEk6vgy/JDXG8EtSYwy/JDXG8EtSYwy/JDXmhA1/kjVJ9iTZPoFjz0ry9SRPJdma5H3HY0ZJmopO2PADa4GrJ3jsp4EvV9U7geuAu47VUJI01Z2w4a+qx4F9I9eSvC3JI0k2J9mY5O0HDwde3/n6dGD3cRxVkqaUab0e4AitBn6nqp5NciHDV/b/EPg3wIYkHwFeB7yndyNK0oltyoQ/yUzgYuArSQ4uv7bzv8uAtVX175P8OvCnSc6tqp/3YFRJOqFNmfAzfFvqB1W16DB7N9N5P6CqvplkBjAH2HMc55OkKeGEvcd/qKr6EfBckt8CyLCFne3vAld01t8BzACGejKoJJ3gcqL+7ZxJ7gMuZ/jK/f8Aq4C/BO4G/gEwHVhXVZ9JMh/4j8BMht/o/b2q2tCLuSXpRHfChl+SdGxMmVs9kqTuOCHf3J0zZ07Nmzev12NI0pSxefPmv6uqvokce0KGf968eQwMDPR6DEmaMpL87USP9VaPJDXG8EtSYwy/JDXmhLzHL0mj2b9/P4ODg7z00ku9HqUnZsyYwdy5c5k+ffpRn8PwS5pSBgcHmTVrFvPmzWPE39vVhKpi7969DA4OcvbZZx/1ebzVI2lKeemll5g9e3Zz0QdIwuzZsyf9XzuGX9KU02L0D+rGazf8ktQYwy9JR2DJkiU8+uijr1i78847
Wb58Oddee22Ppjoyhl+SjsCyZctYt27dK9bWrVvH8uXLeeCBB3o01ZEx/JJ0BK699loefvhhXn75ZQB27drF7t27OfPMMzn33HMB+NnPfsaKFSt497vfzXnnncfnP/95AG677TYeeughAK655hpuuukmANasWcOnPvUpXnzxRd7//vezcOFCzj33XO6///5j8hr8OKekKevf/tdv88zuH3X1nPPPeD2r/vGvjbr/pje9iQsuuID169ezdOlS1q1bxwc/+MFXvOn6hS98gdNPP50nn3ySn/70p1xyySVceeWVLF68mI0bN/KBD3yA559/nhdeeAGAjRs3ct111/HII49wxhln8PDDDwPwwx/+sKuv7SCv+CXpCI283bNu3TqWLVv2iv0NGzbwxS9+kUWLFnHhhReyd+9enn322V+E/5lnnmH+/Pm8+c1v5oUXXuCb3/wmF198MQsWLOCxxx7j9ttvZ+PGjZx++unHZH6v+CVNWWNdmR9LS5cu5ROf+ARbtmzhxz/+Me9617vYtWvXL/aris997nNcddVVr3ruD37wAx555BEuvfRS9u3bx5e//GVmzpzJrFmzmDVrFlu2bOFrX/san/70p7niiitYuXJl1+f3il+SjtDMmTNZsmQJN91006uu9gGuuuoq7r77bvbv3w/Azp07efHFFwG46KKLuPPOO7n00ktZvHgxn/3sZ1m8eDEAu3fv5rTTTuOGG25gxYoVbNmy5ZjM7xW/JB2FZcuWcc0117zqEz4At9xyC7t27eL888+nqujr6+OrX/0qAIsXL2bDhg2cc845vPWtb2Xfvn2/CP+2bdtYsWIFp5xyCtOnT+fuu+8+JrOfkL9zt7+/v/xFLJIOZ8eOHbzjHe/o9Rg9dbh/B0k2V1X/RJ7vrR5Jaozhl6TGGH5JU86JeIv6eOnGazf8kqaUGTNmsHfv3ibjf/Dv458xY8akzuOneiRNKXPnzmVwcJChoaFej9ITB38D12QYfklTyvTp0yf126fkrR5Jao7hl6TGTCj8SdYk2ZNk+yj7S5NsTfJ0koEkvzFi77eTPNv557e7Nbgk6ehM9Ip/LXD1GPv/DVhYVYuAm4B7AZK8CVgFXAhcAKxK8sajnlaSNGkTCn9VPQ7sG2P//9b//2zV64CDX18FPFZV+6rq+8BjjP0HiCTpGOvaPf4k1yT5DvAww1f9AG8BvjfisMHO2uGef2vnNtFAqx/TkqTjoWvhr6r/UlVvB/4JcMdRPH91VfVXVX9fX1+3xpIkHaLrn+rp3Bb65SRzgOeBM0dsz+2sSZJ6pCvhT3JOOr9wMsn5wGuBvcCjwJVJ3th5U/fKzpokqUcm9JO7Se4DLgfmJBlk+JM60wGq6h7gN4F/mmQ/8BPgQ503e/cluQN4snOqz1TVqG8SS5KOPX8RiySdBPxFLJKkURl+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWrMuOFPsibJniTbR9m/PsnWJNuSbEqycMTeJ5J8O8n2JPclmdHN4SVJR24iV/xrgavH2H8OuKyqFgB3AKsBkrwF+CjQX1XnAqcC101qWknSpE0b74CqejzJvDH2N414+AQw95Dz/1KS/cBpwO6jG1OS1C3dvsd/M7AeoKqeBz4LfBd4AfhhVW3o8veTJB2hroU/yRKGw3975/EbgaXA2cAZwOuS3DDG829NMpBkYGhoqFtjSZIO0ZXwJzkPuBdYWlV7O8vvAZ6rqqGq2g88CFw82jmqanVV9VdVf19fXzfGkiQdxqTDn+QshqN+Y1XtHLH1XeCiJKclCXAFsGOy30+SNDnjvrmb5D7gcmBOkkFgFTAdoKruAVYCs4G7hvvOgc6V+7eSPABsAQ4AT9H5xI8kqXdSVb2e4VX6+/trYGCg12NI0pSRZHNV9U/kWH9y
V5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaM274k6xJsifJ9lH2r0+yNcm2JJuSLByx94YkDyT5TpIdSX69m8NLko7cRK741wJXj7H/HHBZVS0A7gBWj9j7Y+CRqno7sBDYcZRzSpK6ZNp4B1TV40nmjbG/acTDJ4C5AElOBy4FPtw57mXg5aMfVZLUDd2+x38zsL7z9dnAEPCfkjyV5N4krxvtiUluTTKQZGBoaKjLY0mSDupa+JMsYTj8t3eWpgHnA3dX1TuBF4FPjvb8qlpdVf1V1d/X19etsSRJh+hK+JOcB9wLLK2qvZ3lQWCwqr7VefwAw38QSJJ6aNLhT3IW8CBwY1XtPLheVf8b+F6SX+0sXQE8M9nvJ0manHHf3E1yH3A5MCfJILAKmA5QVfcAK4HZwF1JAA5UVX/n6R8BvpTkNcDfAMu7/QIkSUdmIp/qWTbO/i3ALaPsPQ30H25PktQb/uSuJDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDVm3PAnWZNkT5Lto+xfn2Rrkm1JNiVZeMj+qUmeSvIX3RpaknT0JnLFvxa4eoz954DLqmoBcAew+pD9jwE7jmo6SVLXjRv+qnoc2DfG/qaq+n7n4RPA3IN7SeYC7wfuneSckqQu6fY9/puB9SMe3wn8HvDz8Z6Y5NYkA0kGhoaGujyWJOmgroU/yRKGw3975/E/AvZU1eaJPL+qVldVf1X19/X1dWssSdIhpnXjJEnOY/h2znuram9n+RLgA0neB8wAXp/kz6rqhm58T0nS0Zn0FX+Ss4AHgRuraufB9ar6V1U1t6rmAdcBf2n0Jan3xr3iT3IfcDkwJ8kgsAqYDlBV9wArgdnAXUkADlRV/7EaWJI0OamqXs/wKv39/TUwMNDrMSRpykiyeaIX3f7kriQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmPGDX+SNUn2JNk+yv71SbYm2ZZkU5KFnfUzk3w9yTNJvp3kY90eXpJ05CZyxb8WuHqM/eeAy6pqAXAHsLqzfgD4F1U1H7gIuC3J/EnMKknqgnHDX1WPA/vG2N9UVd/vPHwCmNtZf6GqtnS+/ntgB/CWSU8sSZqUbt/jvxlYf+hiknnAO4Fvdfn7SZKO0LRunSjJEobD/xuHrM8E/hz4eFX9aIzn3wrcCnDWWWd1ayxJ0iG6csWf5DzgXmBpVe0dsT6d4eh/qaoeHOscVbW6qvqrqr+vr68bY0mSDmPS4U9yFvAgcGNV7RyxHuALwI6q+sPJfh9JUneMe6snyX3A5cCcJIPAKmA6QFXdA6wEZgN3DbeeA1XVD1wC3AhsS/J053T/uqq+1u0XIUmauHHDX1XLxtm/BbjlMOv/A8jRjyZJOhb8yV1Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jasy44U+yJsmeJNtH2b8+ydYk25JsSrJwxN7VSf4qyV8n+WQ3B5ckHZ2JXPGvBa4eY/854LKqWgDcAawGSHIq8B+A9wLzgWVJ5k9qWknSpI0b/qp6HNg3xv6mqvp+5+ETwNzO1xcAf11Vf1NVLwPrgKWTnFeSNEndvsd/M7C+8/VbgO+N2BvsrB1WkluT
DCQZGBoa6vJYkqSDuhb+JEsYDv/tR/P8qlpdVf1V1d/X19etsSRJh5jWjZMkOQ+4F3hvVe3tLD8PnDnisLmdNUlSD036ij/JWcCDwI1VtXPE1pPAryQ5O8lrgOuAhyb7/SRJkzPuFX+S+4DLgTlJBoFVwHSAqroHWAnMBu5KAnCgc8vmQJJ/DjwKnAqsqapvH5NXIUmasFRVr2d4lf7+/hoYGOj1GJI0ZSTZXFX9EznWn9yVpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqTKqq1zO8SpIh4G97PccRmgP8Xa+HOM58zW3wNU8Nb62qvokceEKGfypKMlBV/b2e43jyNbfB13zy8VaPJDXG8EtSYwx/96zu9QA94Gtug6/5JOM9fklqjFf8ktQYwy9JjTH8ktQYwy9JjTH8ktQYwy9JjTH8ktQYw6+TRpI3JPndztdnJHmg8/WiJO8bcdyHk/xJl77n5Un+YpLn+HCSM7oxjzQRhl8nkzcAvwtQVbur6trO+iLgfaM+q4eSnAp8GDD8Om4Mv04mvw+8LcnTSb6SZHuS1wCfAT7UWf/QyCck6Uvy50me7PxzyWgnT3JZ5xxPJ3kqyazO1swkDyT5TpIvJUnn+Cs6x21LsibJazvru5L8QZItwDKgH/hS57y/dAz+vUivYPh1Mvkk8L+qahGwAqCqXgZWAvdX1aKquv+Q5/wx8EdV9W7gN4F7xzj/vwRu65x/MfCTzvo7gY8D84FfBi5JMgNYC3yoqhYA04B/NuJce6vq/Kr6M2AAuL4z30+QjjHDr9a9B/iTJE8DDwGvTzJzlGO/Afxhko8Cb6iqA531/1lVg1X1c+BpYB7wq8BzVbWzc8x/Bi4dca5D/wCSjptpvR5A6rFTgIuq6qXxDqyq30/yMMPvF3wjyVWdrZ+OOOxnTOz/Vy8e8aRSl3jFr5PJ3wOzjmAdYAPwkYMPkiwa7eRJ3lZV26rqD4AngbePMctfAfOSnNN5fCPw349wbumYMPw6aVTVXoavxLcD/27E1teB+Yd7cxf4KNCfZGuSZ4DfGeNbfLzzhvFWYD+wfoxZXgKWA19Jsg34OXDPKIevBe7xzV0dL/59/JLUGK/4JakxvrkrHSLJcuBjhyx/o6pu68U8Urd5q0eSGuOtHklqjOGXpMYYfklqjOGXpMb8P+PZ1UPntR/oAAAAAElFTkSuQmCC\n", + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYcAAAEBCAYAAACT92m7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAE2hJREFUeJzt3X+0XWV95/H3BxOMNgEE77CEgKGFoUZQRo6IMqHiD4J11RAblHQcMaBMR5xOdZWBLl3YH/9Ix1HGH6WyhCFMuwiUmZF0IgSG5ZpkudByk1KTkCJRqV5AExPREcvv7/xxd5zDfRJyk5xw5ibv11pnnb2/+9nPfZ6slXzu3s8+OakqJEnqd9CwByBJ+v+P4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqTGtGEPYE+9/OUvrzlz5gx7GJI0paxZs+bHVTWyq3ZTNhzmzJnD6OjosIchSVNKkn+cTDtvK0mSGoaDJKlhOEiSGlN2zUGSduapp55ibGyMxx9/fNhDGZoZM2Ywe/Zspk+fvkfnGw6S9jtjY2PMmjWLOXPmkGTYw3nBVRVbt25lbGyM4447bo/68LaSpP3O448/zhFHHHFABgNAEo444oi9unIyHCTtlw7UYNhub+dvOEiSGoaDJA3YWWedxcqVK59Tu+qqq1iyZAmLFi0a0qh2j+EgSQO2ePFili1b9pzasmXLWLJkCbfccsuQRrV7DAdJGrBFixaxYsUKnnzySQAefPBBHn74YY455hhOOukkAJ555hkuvfRSXv/61/Oa17yGL33pSwBccsklLF++HICFCxdy4YUXAnDdddfx8Y9/nMcee4x3vvOdvPa1r+Wkk07ipptu2idz8FFWSfu1P/6bDdz38M8G2ufcow7hk7/16p0eP/zwwznttNO47bbbWLBgAcuWLeM973nPcxaJr732Wg499FDuuecennjiCc444wzOPvts5s2bx+rVq3nXu97FQw89xCOPPALA6tWrOf/887n99ts56qijWLFiBQA//elPBzq37bxykKR9oP/W0rJly1i8ePFzjt9xxx3ccMMNnHLKKbzhDW9g69atPPDAA78Mh/vuu4+5c+dy5JFH8sgjj3D33Xfzpje9iZNPPpk777yTyy67jNWrV3PooYfuk/F75SBpv/Z8v+HvSwsWLOCjH/0oa9eu5Re/+AWnnnoqDz744C+PVxWf//znmT9/fnPuo48+yu23386ZZ57Jtm3buPnmm5k5cyazZs1i1qxZrF27lq9+9at84hOf4K1vfStXXHHFwMfvlYMk7QMzZ87krLPO4sILL2yuGgDmz5/P1VdfzVNPPQXAt7/9bR577DEATj/9dK666irOPPNM5s2bx6c//WnmzZsHwMMPP8xLX/pS3ve+93HppZeydu3afTJ+rxwkaR9ZvHgxCxcubJ5cAvjgBz/Igw8+yOte9zqqipGREb7yla8AMG/ePO644w6OP/54XvnKV7Jt27ZfhsO6deu49NJLOeigg5g+fTpXX331Phl7qmqfdLyv9Xq98st+JO3Ixo0bedWrXjXsYQzdjv4ckqypqt6uzvW2kiSpYThIkhqGg6T90lS9ZT4oezt/w0HSfmfGjBls3br1gA2I7d/nMGPGjD3uw6eVJO13Zs+ezdjYGFu2bBn2UIZm+zfB7SnDQdJ+Z/r06Xv8DWga520lSVLDcJAkNQwHSVLDcJAkNQwHSVJjl+GQ5Lokm5Os76udl2RDkmeT9Prq05MsTbIuycYkf9h37Jwk9yfZlOTyvvpxSb7Z1W9KcvAgJyhJ2n2TuXK4HjhnQm098G5g1YT6ecCLq+pk4FTg3ySZk+RFwBeBdwBzgcVJ5nbnXAl8tqqOB34CXLQnE5EkDc4uw6GqVgHbJtQ2VtX9O2oO/EqSacBLgCeBnwGnAZuq6rtV9SSwDFiQ8e/Mewuw/Ru3lwLn7ulkJEmDMeg1h1uAx4BHgO8
Dn66qbcDRwA/62o11tSOAR6vq6Ql1SdIQDfoT0qcBzwBHAS8DVif5X4PqPMnFwMUAxx577KC6lSRNMOgrh98Bbq+qp6pqM/B1oAc8BBzT1252V9sKHNbdhuqv71BVXVNVvarqjYyMDHjokqTtBh0O32d8DYEkvwKcDvwDcA9wQvdk0sHA+cDyGv8vE78GLOrOvwC4dcBjkiTtpsk8ynojcDdwYpKxJBclWZhkDHgjsCLJyq75F4GZSTYwHgj/paq+1a0pfARYCWwEbq6qDd05lwEfS7KJ8TWIawc5QUnS7vM7pCXpAOJ3SEuS9pjhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElq7DIcklyXZHOS9X2185JsSPJskt6E9q9Jcnd3fF2SGV391G5/U5LPJUlXPzzJnUke6N5fNuhJSpJ2z2SuHK4HzplQWw+8G1jVX0wyDfhL4Her6tXAm4GnusNXAx8CTuhe2/u8HLirqk4A7ur2JUlDtMtwqKpVwLYJtY1Vdf8Omp8NfKuq/r5rt7WqnknyCuCQqvpGVRVwA3Bud84CYGm3vbSvLkkakkGvOfxzoJKsTLI2yX/o6kcDY33txroawJFV9Ui3/UPgyAGPSZK0m6btg/7+JfB64BfAXUnWAD+dzMlVVUlqZ8eTXAxcDHDsscfu/WglSTs06CuHMWBVVf24qn4BfBV4HfAQMLuv3eyuBvCj7rYT3fvmnXVeVddUVa+qeiMjIwMeuiRpu0GHw0rg5CQv7RanfwO4r7tt9LMkp3dPKb0fuLU7ZzlwQbd9QV9dkjQkk3mU9UbgbuDEJGNJLkqyMMkY8EZgRZKVAFX1E+AzwD3AvcDaqlrRdfVh4MvAJuA7wG1d/VPA25M8ALyt25ckDVHGHx6aenq9Xo2Ojg57GJI0pSRZU1W9XbXzE9KSpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElqGA6SpIbhIElq7DIcklyXZHOS9X2185JsSPJskt4Ozjk2yc+T/EFf7Zwk9yfZlOTyvvpxSb7Z1W9KcvAgJiZJ2nOTuXK4HjhnQm098G5g1U7O+Qxw2/adJC8Cvgi8A5gLLE4ytzt8JfDZqjoe+Alw0WQHL0naN3YZDlW1Ctg2obaxqu7fUfsk5wLfAzb0lU8DNlXVd6vqSWAZsCBJgLcAt3TtlgLn7vYsJEkDNdA1hyQzgcuAP55w6GjgB337Y13tCODRqnp6Qn1n/V+cZDTJ6JYtWwY3cEnScwx6QfqPGL9F9PMB9wtAVV1TVb2q6o2MjOyLHyFJAqYNuL83AIuS/BlwGPBskseBNcAxfe1mAw8BW4HDkkzrrh621yVJQzTQcKiqedu3k/wR8POq+kKSacAJSY5j/B//84HfqapK8jVgEePrEBcAtw5yTJKk3TeZR1lvBO4GTkwyluSiJAuTjAFvBFYkWfl8fXRXBR8BVgIbgZuravuC9WXAx5JsYnwN4to9n44kaRBSVcMewx7p9Xo1Ojo67GFI0pSSZE1VNZ9Pm8hPSEuSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGrsMhyTXJdmcZH1f7bwkG5I8m6TXV397kjVJ1nXvb+k7dmpX35Tkc0nS1Q9PcmeSB7r3lw16kpKk3TOZK4frgXMm1NYD7wZWTaj/GPitqjoZuAD
4r33HrgY+BJzQvbb3eTlwV1WdANzV7UuShmiX4VBVq4BtE2obq+r+HbT9u6p6uNvdALwkyYuTvAI4pKq+UVUF3ACc27VbACzttpf21SVJQ7Iv1xx+G1hbVU8ARwNjfcfGuhrAkVX1SLf9Q+DIfTgmSdIkTNsXnSZ5NXAlcPbunFdVlaSep9+LgYsBjj322L0aoyRp5wZ+5ZBkNvA/gPdX1Xe68kPA7L5ms7sawI+6205075t31ndVXVNVvarqjYyMDHrokqTOQMMhyWHACuDyqvr69np32+hnSU7vnlJ6P3Brd3g544vXdO+3Ikkaqsk8ynojcDdwYpKxJBclWZhkDHgjsCLJyq75R4DjgSuS3Nu9/ll37MPAl4FNwHeA27r6p4C3J3kAeFu3L0kaoow/PDT19Hq9Gh0dHfYwJGlKSbKmqnq7aucnpCVJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktQwHCRJDcNBktTYZTgkuS7J5iTr+2rnJdmQ5NkkvQnt/zDJpiT3J5nfVz+nq21Kcnlf/bgk3+zqNyU5eFCTkyTtmclcOVwPnDOhth54N7Cqv5hkLnA+8OrunD9P8qIkLwK+CLwDmAss7toCXAl8tqqOB34CXLRnU5EkDcouw6GqVgHbJtQ2VtX9O2i+AFhWVU9U1feATcBp3WtTVX23qp4ElgELkgR4C3BLd/5S4Nw9no0kaSAGveZwNPCDvv2xrraz+hHAo1X19IS6JGmIptSCdJKLk4wmGd2yZcuwhyNJ+61Bh8NDwDF9+7O72s7qW4HDkkybUN+hqrqmqnpV1RsZGRnowCVJ/8+gw2E5cH6SFyc5DjgB+FvgHuCE7smkgxlftF5eVQV8DVjUnX8BcOuAxyRJ2k2TeZT1RuBu4MQkY0kuSrIwyRjwRmBFkpUAVbUBuBm4D7gduKSqnunWFD4CrAQ2Ajd3bQEuAz6WZBPjaxDXDnaKkqTdlfFf3qeeXq9Xo6Ojwx6GJE0pSdZUVW9X7abUgrQk6YVhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGpMKhyTXJdmcZH1f7fAkdyZ5oHt/WVc/NMnfJPn7JBuSLOk754Ku/QNJLuirn5pkXZJNST6XJIOcpCRp90z2yuF64JwJtcuBu6rqBOCubh/gEuC+qnot8GbgPyU5OMnhwCeBNwCnAZ/cHijA1cCHgBO618SfJUl6AU0qHKpqFbBtQnkBsLTbXgqcu705MKv77X9md97TwHzgzqraVlU/Ae4EzknyCuCQqvpGVRVwQ19fkqQhmLYX5x5ZVY902z8Ejuy2vwAsBx4GZgHvrapnkxwN/KDv/DHg6O41toO6JGlIBrIg3f3GX93ufOBe4CjgFOALSQ4ZxM9JcnGS0SSjW7ZsGUSXkqQd2Jtw+FF3S4jufXNXXwL89xq3Cfge8OvAQ8AxfefP7moPddsT642quqaqelXVGxkZ2YuhS5Kez96Ew3Jg+xNHFwC3dtvfB94KkORI4ETgu8BK4OwkL+sWos8GVna3pn6W5PRuneL9fX1JkoZgUmsOSW5k/MmjlycZY/ypo08BNye5CPhH4D1d8z8Frk+yDghwWVX9uOvnT4F7unZ/UlXbF7k/zPgTUS8BbutekqQhyfhywdTT6/VqdHR02MOQpCklyZqq6u2qnZ+QliQ1DAdJUsNwkCQ1DAdJUsNwkCQ1puzTSkm2MP4I7VTycuDHwx7EC8w5Hxic89Txyqra5aeIp2w4TEVJRifzCNn+xDkfGJzz/sfbSpKkhuE
gSWoYDi+sa4Y9gCFwzgcG57yfcc1BktTwykGS1DAcJEkNw0GS1DAcJEkNw0GS1DAcJEkNw0EHlCSHJflwt31Uklu67VOS/GZfuw8k+cKAfuabk/zPvezjA0mOGsR4pMkwHHSgOYzx7yynqh6uqkVd/RTgN3d61hAleRHwAcBw0AvGcNCB5lPAryW5N8lfJ1mf5GDgT4D3dvX39p+QZCTJf0tyT/c6Y2edJ/mNro97k/xdklndoZlJbknyD0n+Kkm69m/t2q1Lcl2SF3f1B5NcmWQtsBjoAX/V9fuSffDnIj2H4aADzeXAd6rqFOBSgKp6ErgCuKmqTqmqmyac85+Bz1bV64HfBr78PP3/AXBJ1/884J+6+r8Afh+YC/wqcEaSGcD1wHur6mRgGvBv+/raWlWvq6q/BEaBf9WN75+Q9jHDQdq1twFfSHIvsBw4JMnMnbT9OvCZJL8HHFZVT3f1v62qsap6FrgXmAOcCHyvqr7dtVkKnNnX18SQkl4w04Y9AGkKOAg4vaoe31XDqvpUkhWMr198Pcn87tATfc2eYXJ/9x7b7ZFKA+KVgw40/weYtRt1gDuAf7d9J8kpO+s8ya9V1bqquhK4B/j15xnL/cCcJMd3+/8a+N+7OW5pnzAcdECpqq2M/0a/HviPfYe+Bszd0YI08HtAL8m3ktwH/O7z/Ijf7xa5vwU8Bdz2PGN5HFgC/HWSdcCzwF/spPn1wF+4IK0Xiv9ltySp4ZWDJKnhgrS0B5IsAf79hPLXq+qSYYxHGjRvK0mSGt5WkiQ1DAdJUsNwkCQ1DAdJUsNwkCQ1/i875WAmq/t/RQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] @@ -5618,7 +1694,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 20, "metadata": {}, "outputs": [], "source": [ @@ -5627,12 +1703,12 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 21, "metadata": {}, "outputs": [ { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ "
" ] @@ -5652,12 +1728,12 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 22, "metadata": {}, "outputs": [ { "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEMCAYAAAA/Jfb8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAFTRJREFUeJzt3X2QXfV93/H3ByQgDjK2pY1rkIwYG8doMMawBmJXMgR7ENCi4BAbTUnaAmbaGjfjZFQrYwbbtJ3xA2lJXSAWhVDcCTLGLtEU8dQa15rUdrUI8yQClkExK3BYS/gJSnjwt3/cK7TIK+2VdLVHe+77NbOz95zz23u/Otr72d/9/c5DqgpJUrvs13QBkqT+M9wlqYUMd0lqIcNdklrIcJekFjLcJamFGg33JNcleTrJgz20fXOSu5Pcm+T+JGdMRY2SNB013XO/HljcY9tLgJuq6l3AucBVe6soSZruGg33qvoWsGX8uiRvSXJ7knuSrEny9q3Ngdd2Hx8CPDmFpUrStDKj6QImsAL4F1X1/SQn0umh/zbwaeDOJB8Dfh14f3MlStK+bZ8K9yQHA+8Bvppk6+oDu9+XAtdX1Z8m+S3gy0mOrqpfNlCqJO3T9qlwpzNM9JOqOnaCbRfQHZ+vqm8nOQiYAzw9hfVJ0rTQ9ITqq1TVz4DHk/weQDre2d38Q+DU7vqjgIOAsUYKlaR9XJq8KmSSG4GT6fTA/w74FPAN4GrgTcBMYGVVXZZkAXANcDCdydV/U1V3NlG3JO3rGg13SdLesU8Ny0iS+qOxCdU5c+bU/Pnzm3p5SZqW7rnnnh9X1dBk7RoL9/nz5zMyMtLUy0vStJTkb3tp57CMJLWQ4S5JLWS4S1IL7WtnqEoSAC+++CKjo6M8//zzTZfSiIMOOoi5c+cyc+bM3fp5w13SPml0dJRZs2Yxf/58xl1raiBUFZs3b2Z0dJQjjjhit57DYRlJ+6Tnn3+e2bNnD1ywAyRh9uzZe/SpxXCXtM8axGDfak//7Ya7JLWQY+6SpoX5y2/t6/Nt/OyZO91+yimnsHz5ck477bRX1l1xxRXcd999/PznP+fmm2/uaz39Nq3Dvd//2btjsl8QSdPT0qVLWbly5avCfeXKlXz+859n0aJFDVbWG4dlJGkC55xzDrfeeisvvPACABs3buTJJ59k3rx5HH300QC8/PLLLFu2jHe/+90cc8wxfOlLXwLgox/9KKtWrQLg7LPP5vzzzwfguuuu45Of/CTPPvssZ555Ju985zs5+uij+cpXvtL3+g13SZrAG97wBk444QRuu+02oNNr/9CHPvSqic5rr72WQw45hLVr17J27VquueYaHn/8cRYuXMiaNWsA2LRpE+vXrwdgzZo1LFq0iNtvv51DDz2U++67jwcffJDFixf3vX7DXZJ2YOvQDHTCfenSpa/afuedd3LDDTdw7LHHcuKJJ7J582a+//3vvxLu69evZ8GCBbzxjW/kqaee4tvf/jbvec97eMc73sFdd93FJz7xCdasWcMhhxzS99qn9Zi7JO1NS5Ys4eMf/zjr1q3jueee4/jjj2fjxo2vbK8qvvjFL75qXH6rn/zkJ9x+++0sWrSILVu2cNNNN3HwwQcza9YsZs2axbp161i9ejWXXHIJp556Kpdeemlfa5+0557kuiRPJ3lwB9v/SZL7kzyQ5P+Mu+epJE1rBx98MKeccgrnn3/+r/TaAU477TSuvvpqXnzxRQAeffRRnn32WQBOOukkrrjiChYtWsTChQu5/PLLWbhwIQBPPvkkr3nNazjvvPNYtmwZ69at63vtvfTcrwf
+M3DDDrY/Dryvqp5JcjqwAjixP+VJUkdTR6YtXbqUs88++5XhmfEuvPBCNm7cyHHHHUdVMTQ0xC233ALAwoULufPOO3nrW9/K4YcfzpYtW14J9wceeIBly5ax3377MXPmTK6++uq+193TPVSTzAf+R1UdPUm71wMPVtVhkz3n8PBw7enNOjwUUmqvhx9+mKOOOqrpMho10T5Ick9VDU/2s/2eUL0AuG1HG5NclGQkycjY2FifX1qStFXfwj3JKXTC/RM7alNVK6pquKqGh4YmvQWgJGk39eVomSTHAP8FOL2qNvfjOSWpqgb24mG9DJnvzB733JO8Gfg68PtV9eiePp8kQedmFZs3b97jkJuOtl7P/aCDDtrt55i0557kRuBkYE6SUeBTwMxuAX8OXArMBq7q/oV9qZfBfknamblz5zI6Osqgzs9tvRPT7po03KvqVw/ufPX2C4ELd7sCSZrAzJkzd/suRPLyA5LUSoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgtNGu5JrkvydJIHd7A9Sf5Tkg1J7k9yXP/LlCTtil567tcDi3ey/XTgyO7XRcDVe16WJGlPTBruVfUtYMtOmiwBbqiO7wCvS/KmfhUoSdp1/RhzPwx4YtzyaHfdr0hyUZKRJCNjY2N9eGlJ0kSmdEK1qlZU1XBVDQ8NDU3lS0vSQOlHuG8C5o1bnttdJ0lqSD/CfRXwB92jZk4CflpVT/XheSVJu2nGZA2S3AicDMxJMgp8CpgJUFV/DqwGzgA2AM8B/3xvFStJ6s2k4V5VSyfZXsBH+1aRJGmPeYaqJLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQj2Fe5LFSR5JsiHJ8gm2vznJ3UnuTXJ/kjP6X6okqVeThnuS/YErgdOBBcDSJAu2a3YJcFNVvQs4F7iq34VKknrXS8/9BGBDVT1WVS8AK4El27Up4LXdx4cAT/avREnSruol3A8Dnhi3PNpdN96ngfOSjAKrgY9N9ERJLkoykmRkbGxsN8qVJPWiXxOqS4Hrq2oucAbw5SS/8txVtaKqhqtqeGhoqE8vLUnaXi/hvgmYN255bnfdeBcANwFU1beBg4A5/ShQkrTregn3tcCRSY5IcgCdCdNV27X5IXAqQJKj6IS74y6S1JBJw72qXgIuBu4AHqZzVMxDSS5Lcla32R8DH0lyH3Aj8M+qqvZW0ZKknZvRS6OqWk1nonT8ukvHPV4PvLe/pUmSdpdnqEpSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS3UU7gnWZzkkSQbkizfQZsPJVmf5KEkf9nfMiVJu2LGZA2S7A9cCXwAGAXWJllVVevHtTkS+BPgvVX1TJLf2FsFa2Lzl9/adAls/OyZTZcgqauXnvsJwIaqeqyqXgBWAku2a/MR4Mqqegagqp7ub5mSpF3RS7gfBjwxbnm0u268twFvS/LXSb6TZPFET5TkoiQjSUbGxsZ2r2JJ0qT6NaE6AzgSOBlYClyT5HXbN6qqFVU1XFXDQ0NDfXppSdL2egn3TcC8cctzu+vGGwVWVdWLVfU48CidsJckNaCXcF8LHJnkiCQHAOcCq7ZrcwudXjtJ5tAZpnmsj3VKknbBpOFeVS8BFwN3AA8DN1XVQ0kuS3JWt9kdwOYk64G
7gWVVtXlvFS1J2rlJD4UEqKrVwOrt1l067nEBf9T9kiQ1zDNUJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaqGeLvkrTSfzl9/adAls/OyZTZegAWfPXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklqop3BPsjjJI0k2JFm+k3a/m6SSDPevREnSrpo03JPsD1wJnA4sAJYmWTBBu1nAHwLf7XeRkqRd00vP/QRgQ1U9VlUvACuBJRO0+7fA54Dn+1ifJGk39BLuhwFPjFse7a57RZLjgHlVtdMrNiW5KMlIkpGxsbFdLlaS1Js9nlBNsh/wH4A/nqxtVa2oquGqGh4aGtrTl5Yk7UAv4b4JmDdueW533VazgKOBbybZCJwErHJSVZKa00u4rwWOTHJEkgOAc4FVWzdW1U+rak5Vza+q+cB3gLOqamSvVCxJmtSk4V5VLwEXA3cADwM3VdVDSS5LctbeLlCStOt6uhNTVa0GVm+37tIdtD15z8uS1A/elWpweZs9SQNh0P7QefkBSWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphXoK9ySLkzySZEOS5RNs/6Mk65Pcn+R/JTm8/6VKkno1abgn2R+4EjgdWAAsTbJgu2b3AsNVdQxwM/D5fhcqSepdLz33E4ANVfVYVb0ArASWjG9QVXdX1XPdxe8Ac/tbpiRpV/QS7ocBT4xbHu2u25ELgNsm2pDkoiQjSUbGxsZ6r1KStEv6OqGa5DxgGPjCRNurakVVDVfV8NDQUD9fWpI0zowe2mwC5o1bnttd9ypJ3g98EnhfVf19f8qTJO2OXnrua4EjkxyR5ADgXGDV+AZJ3gV8CTirqp7uf5mSpF0xabhX1UvAxcAdwMPATVX1UJLLkpzVbfYF4GDgq0m+l2TVDp5OkjQFehmWoapWA6u3W3fpuMfv73NdkqQ94BmqktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSC/UU7kkWJ3kkyYYkyyfYfmCSr3S3fzfJ/H4XKknq3aThnmR/4ErgdGABsDTJgu2aXQA8U1VvBf4j8Ll+FypJ6l0vPfcTgA1V9VhVvQCsBJZs12YJ8F+7j28GTk2S/pUpSdoVqaqdN0jOARZX1YXd5d8HTqyqi8e1ebDbZrS7/INumx9v91wXARd1F38TeKRf/5A9MAf48aStBoP7Yhv3xTbui232hX1xeFUNTdZoxlRUslVVrQBWTOVrTibJSFUNN13HvsB9sY37Yhv3xTbTaV/0MiyzCZg3bnlud92EbZLMAA4BNvejQEnSrusl3NcCRyY5IskBwLnAqu3arAL+affxOcA3arLxHknSXjPpsExVvZTkYuAOYH/guqp6KMllwEhVrQKuBb6cZAOwhc4fgOlinxomapj7Yhv3xTbui22mzb6YdEJVkjT9eIaqJLWQ4S5JLWS4S1ILGe6S1EJTehJTk5KsA74O3FhVP2i6nqYl2Q+gqn7ZPcT1aGBjVW1ptrKpleQ1wMVAAV+kc6TXB4G/AS6rql80WF5jkrweeLmqftZ0LU1KMkznHJ6XgUer6m8aLqlng9Rzfz3wOuDuJP83yceTHNp0UU1I8jvAU8CmJEuANcAXgPuT/ONGi5t61wNvBI4AbgWG6eyLAFc3V9bUS3JokhuS/JTOKfYPJvlhkk8nmdl0fVMpyfuSjACfBa6jc9mUa5N8M8m
8nf/0PqKqBuILWDfu8ULgKuBHwN3ARU3XN8X74l7gH9AJtJ8Bv9ldfzidcxcar3EK98X3ut/T/X3IuOX7m65vivfFN4CTu48/SOcKr78O/DtgRdP1TfG+uBcY6j4+Avjv3ccfAO5sur5evgap5/7KVSqrak1V/SvgMDqXJ/6txqpqSFX9qKoeB35YVY901/0tg/Vp7hXVeeeu7n7fujxoJ4HMrqpvAlTV14FFVfVsVV0CLGq0sqm3f1WNdR//kE7Hh6q6i05u7PMGZsydCa5AWVUvA7d3vwZKkv2q6pfA+ePW7Q8c0FxVjRhJcnBV/aKqxu+LtwA/b7CuJowlOY/Op9kPAhsBupfvHrQ/+iNJrqXzaeYs4JvwyhzN/g3W1bOBOUM1yb+m89HqiaZraVqSdwMPVNXz262fD/zDqvpvTdTVlCQn0Omsr+3eiGYxnc7AKz35QZDkzcDldG7K8z1gWVU9lWQ2neGarzVa4BTqzjF8hM6+uI/OZVdeTvJrwG90P+Xu0wYp3H8KPAv8ALgR+Oq4j10DL8nsqhq4K3km+RSdu4zNAO4CTqTTc/0AcEdV/fsGy5N22yCF+73A8cD7gQ/T+ah1D52g/3pVDcxH8CSfBS6vqh93D/W6CfglMBP4g6r6340WOIWSPAAcCxxIZ0J1blX9rNtD+25VHdNogVOoe7nuC4DfYdu48ibgr4Brq+rFpmqbakleC/wJnUuc31ZVfzlu21XdObt92iCNo1VV/bKq7qyqC4BD6Rwxsxh4rNnSptyZte0uWV8APlyd+99+APjT5spqxEtV9XJVPQf8oLrHdVfV/6PzB2+QfJnOH7rPAGd0vz4DvBMYqKE64C/oHITxNeDcJF9LcmB320nNldW7QZpQfdU9Xbu9kFXAqu4kySCZkWRGVb0E/FpVrQWoqkfH/QIPiheSvKYb7sdvXZnkEAYv3I+vqrdtt24U+E6SR5soqEFvqarf7T6+JckngW8kOavJonbFIPXcP7yjDd039iC5Clid5LeB25P8Wfekjc/QmUgbJIu2/v93jx7aaibbbkAzKLYk+b2tZy9D56iqJB8GnmmwriYcOH4/dOdergG+BcxurKpdMDBj7nq1JCcD/xJ4G51PcE8AtwB/MUhjq9qme7TU54BTgJ90V7+OzgTz8u55EQMhyefpnKz0P7dbvxj4YlUd2UxlvTPcB1SSt9OZNPtujbt+SpLFVTVwx/2rI8mJdE7e+gHwdjon+K2vqtWNFtaAnbxHTq+q25qrrDeG+wDqHvP/UeBhOhNof1hVf9Xdtq6qjmuyPjVjgsNCT6Bz8s7AHRaa5GN0Lig3bd8jgzShqm0+Qmfy7Bfdj+I3J5lfVX/GdhPPGijnMPFhoZcD3wUGJtzpXChsWr9HDPfBtN/Wj5lVtbE7/n5zksOZJr+42ite6l6S47kkrzosNMmgHTk07d8jg3S0jLb5uyTHbl3o/hL/I2AO8I7GqlLTXhh3WPCgHxY67d8jjrkPoCRz6fTSfjTBtvdW1V83UJYaluTAqvr7CdbPAd5UVQ80UFYj2vAeMdwlqYUclpGkFjLcJamFDHdJaiHDXZJa6P8DEVM3h9QH/TcAAAAASUVORK5CYII=\n", + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYcAAAEFCAYAAAAIZiutAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAFQVJREFUeJzt3X+Q3XV97/HnGwikkJQfYZtKkpL0kmlFVMQ1pDrJgOklQbwNOMiQqZIBbDoVrbV3cg0Xx1QsM8hYQanmihIFxxqY+INM+ZnyQ9Op0oQgAomajESyIUqaBbRQCsH3/eN8Qg75bMiyZ7PfTfb5mNnZ7/fz+Xy/+97DHl75fr6fc05kJpIktTuo6QIkScOP4SBJqhgOkqSK4SBJqhgOkqSK4SBJqhgOkqSK4SBJqhgOkqTKIU0XMFDHHntsTp48uekyJGm/8cADD/xHZnb1Z+x+Gw6TJ09mzZo1TZchSfuNiPhFf8c6rSRJqhgOkqSK4SBJquy39xwkaU9efPFFenp6eP7555supRGjR49m4sSJjBo1asDnMBwkHXB6enoYO3YskydPJiKaLmdIZSbbt2+np6eHKVOmDPg8TitJOuA8//zzjBs3bsQFA0BEMG7cuI6vmgwHSQekkRgMOw3G7244SJIq3nOQdMCbvOjWQT3fpivPetX+008/nUWLFjF79uyX26655hoeeughfvOb37B8+fJBrWdfGLHhMNh/LAO1tz8ySfufefPmsWzZsleEw7Jly7jqqquYOXNmg5X1n9NKkjTIzj33XG699VZeeOEFADZt2sQTTzzBpEmTOOmkkwB46aWXWLhwIW9729t405vexJe+9CUALrnkElasWAHAOeecw0UXXQTA0qVLueyyy3j22Wc566yzePOb38xJJ53ETTfdtE9+B8NBkgbZMcccw7Rp07j99tuB1lXDeeed94obxddffz1HHnkkq1evZvXq1Xz5y1/mscceY8aMGaxatQqALVu2sG7dOgBWrVrFzJkzueOOOzjuuON46KGHeOSRR5gzZ84++R0MB0naB3ZOLUErHObNm/eK/rvuuosbb7yRk08+mVNPPZXt27ezYcOGl8Nh3bp1nHjiiYwfP56tW7fygx/8gLe//e288Y1vZOXKlXzsYx9j1apVHHnkkfuk/hF7z0GS9qW5c+fy0Y9+lLVr1/Lcc8/x1re+lU2bNr3cn5lce+21r7gvsdPTTz/NHXfcwcyZM+nt7eXmm29mzJgxjB07lrFjx7J27Vpuu+02Pv7xjzNr1iw+8YlPDHr9XjlI0j4wZswYTj/9dC666KLqqgFg9uzZLFmyhBdffBGAn/3sZzz77LMATJ8+nWuuuYaZM2cyY8YMPvOZzzBjxgwAnnjiCQ4//HDe9773sXDhQtauXbtP6vfKQdIBr6lVgfPmzeOcc855eXqp3Qc+8AE2bdrEKaecQmbS1dXFd7/7XQBmzJjBXXfdxQknnMDxxx9Pb2/vy+Hw8MMPs3DhQg466CBGjRrFkiVL9kntkZn75MT7Wnd3d3byYT8uZZUOXOvXr+f1r39902U0qq/HICIeyMzu/hzvtJIkqWI4SJIqhoOkA9L+OmU+GAbjdzccJB1wRo8ezfbt20dkQOz8PIfRo0d3dB5XK0k64EycOJGenh62bdvWdCmN2PlJcJ0wHCQdcEaNGtXRp6DJaSVJUh8MB0lSxXCQJFUMB0lSxXCQJFUMB0lSxXCQJFX2Gg4RsTQinoyIR9rajomIlRGxoXw/urRHRHw+IjZGxI8j4pS2Y+aX8RsiYn5b+1sj4uFyzOej/XP0JEmN6M+Vw9eA3T+kdBFwd2ZOBe4u+wBnAlPL1wJgCbTCBFgMnApMAxbvDJQy5i/ajts3H4gqSeq3vYZDZn4f6N2teS5wQ9m+ATi7rf3GbPkhcFREvA6YDazMzN7MfApYCcwpfb+bmT/M1pug3Nh2LklSQwZ6z2F8Zm4t278Expf
tCcDmtnE9pe3V2nv6aJckNajjG9LlX/xD8taHEbEgItZExJqR+oZakjQUBhoOvypTQpTvT5b2LcCktnETS9urtU/so71PmXldZnZnZndXV9cAS5ck7c1Aw2EFsHPF0Xzglrb2C8qqpenAM2X66U7gjIg4utyIPgO4s/T9OiKml1VKF7SdS5LUkL2+ZXdEfBM4DTg2InporTq6Erg5Ii4GfgGcV4bfBrwL2Ag8B1wIkJm9EfEpYHUZd3lm7rzJ/UFaK6J+B7i9fEmSGrTXcMjMeXvomtXH2AQu2cN5lgJL+2hfA5y0tzokSUPHV0hLkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiodhUNEfDQiHo2IRyLimxExOiKmRMT9EbExIm6KiEPL2MPK/sbSP7ntPJeW9p9GxOzOfiVJUqcGHA4RMQH4a6A7M08CDgbOBz4NXJ2ZJwBPAReXQy4GnirtV5dxRMSJ5bg3AHOAL0bEwQOtS5LUuU6nlQ4BficiDgEOB7YC7wSWl/4bgLPL9tyyT+mfFRFR2pdl5n9n5mPARmBah3VJkjow4HDIzC3AZ4DHaYXCM8ADwNOZuaMM6wEmlO0JwOZy7I4yflx7ex/HSJIa0Mm00tG0/tU/BTgOOILWtNA+ExELImJNRKzZtm3bvvxRkjSidTKt9KfAY5m5LTNfBL4NvAM4qkwzAUwEtpTtLcAkgNJ/JLC9vb2PY14hM6/LzO7M7O7q6uqgdEnSq+kkHB4HpkfE4eXewSxgHXAvcG4ZMx+4pWyvKPuU/nsyM0v7+WU10xRgKvDvHdQlSerQIXsf0rfMvD8ilgNrgR3Ag8B1wK3Asoj4+9J2fTnkeuDrEbER6KW1QonMfDQibqYVLDuASzLzpYHWJUnq3IDDASAzFwOLd2v+OX2sNsrM54H37uE8VwBXdFKLJGnw+AppSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLlkKYLUPMmL7q16RIA2HTlWU2XIKno6MohIo6KiOUR8ZOIWB8RfxIRx0TEyojYUL4fXcZGRHw+IjZGxI8j4pS288wv4zdExPxOfylJUmc6nVb6HHBHZv4x8GZgPbAIuDszpwJ3l32AM4Gp5WsBsAQgIo4BFgOnAtOAxTsDRZLUjAGHQ0QcCcwErgfIzBcy82lgLnBDGXYDcHbZngvcmC0/BI6KiNcBs4GVmdmbmU8BK4E5A61LktS5Tq4cpgDbgK9GxIMR8ZWIOAIYn5lby5hfAuPL9gRgc9vxPaVtT+2ViFgQEWsiYs22bds6KF2S9Go6CYdDgFOAJZn5FuBZdk0hAZCZCWQHP+MVMvO6zOzOzO6urq7BOq0kaTedhEMP0JOZ95f95bTC4ldluojy/cnSvwWY1Hb8xNK2p3ZJUkMGHA6Z+Utgc0T8UWmaBawDVgA7VxzNB24p2yuAC8qqpenAM2X66U7gjIg4utyIPqO0SZIa0unrHD4MfCMiDgV+DlxIK3BujoiLgV8A55WxtwHvAjYCz5WxZGZvRHwKWF3GXZ6ZvR3WJUnqQEfhkJk/Arr76JrVx9gELtnDeZYCSzupRZI0eHyFtNTGV4tLLb63kiSpYjhIkiqGgySpYjhIkirekJbUJ2/Oj2xeOUiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKnih/1I0l6MxA8+8spBklQxHCRJFcN
BklQxHCRJlY7DISIOjogHI+Kfy/6UiLg/IjZGxE0RcWhpP6zsbyz9k9vOcWlp/2lEzO60JklSZwbjyuEjwPq2/U8DV2fmCcBTwMWl/WLgqdJ+dRlHRJwInA+8AZgDfDEiDh6EuiRJA9RROETEROAs4CtlP4B3AsvLkBuAs8v23LJP6Z9Vxs8FlmXmf2fmY8BGYFondUmSOtPplcM1wP8Bflv2xwFPZ+aOst8DTCjbE4DNAKX/mTL+5fY+jpEkNWDA4RAR7waezMwHBrGevf3MBRGxJiLWbNu2bah+rCSNOJ1cObwD+LOI2AQsozWd9DngqIjY+crricCWsr0FmARQ+o8Etre393HMK2TmdZnZnZndXV1dHZQuSXo1Aw6HzLw0Mydm5mRaN5Tvycw/B+4Fzi3D5gO3lO0VZZ/Sf09mZmk/v6xmmgJMBf59oHVJkjq3L95b6WPAsoj4e+BB4PrSfj3w9YjYCPTSChQy89GIuBlYB+wALsnMl/ZBXZKkfhqUcMjM+4D7yvbP6WO1UWY+D7x3D8dfAVwxGLVIkjrnK6QlSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUGXA4RMSkiLg3ItZFxKMR8ZHSfkxErIyIDeX70aU9IuLzEbExIn4cEae0nWt+Gb8hIuZ3/mtJkjrRyZXDDuB/Z+aJwHTgkog4EVgE3J2ZU4G7yz7AmcDU8rUAWAKtMAEWA6cC04DFOwNFktSMAYdDZm7NzLVl+zfAemACMBe4oQy7ATi7bM8FbsyWHwJHRcTrgNnAyszszcyngJXAnIHWJUnq3KDcc4iIycBbgPuB8Zm5tXT9EhhfticAm9sO6ylte2rv6+csiIg1EbFm27Ztg1G6JKkPHYdDRIwBvgX8TWb+ur0vMxPITn9G2/muy8zuzOzu6uoarNNKknbTUThExChawfCNzPx2af5VmS6ifH+ytG8BJrUdPrG07aldktSQTlYrBXA9sD4zP9vWtQLYueJoPnBLW/sFZdXSdOCZMv10J3BGRBxdbkSfUdokSQ05pINj3wG8H3g4In5U2v4vcCVwc0RcDPwCOK/03Qa8C9gIPAdcCJCZvRHxKWB1GXd5ZvZ2UJckqUMDDofM/Fcg9tA9q4/xCVyyh3MtBZYOtBZJ0uDyFdKSpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpMqwCYeImBMRP42IjRGxqOl6JGkkGxbhEBEHA18AzgROBOZFxInNViVJI9ewCAdgGrAxM3+emS8Ay4C5DdckSSPWcAmHCcDmtv2e0iZJakBkZtM1EBHnAnMy8wNl//3AqZn5od3GLQAWlN0/An46pIXWjgX+o+Eahgsfi118LHbxsdhlODwWx2dmV38GHrKvK+mnLcCktv2Jpe0VMvM64LqhKmpvImJNZnY3Xcdw4GOxi4/FLj4Wu+xvj8VwmVZaDUyNiCkRcShwPrCi4ZokacQaFlcOmbkjIj4E3AkcDCzNzEcbLkuSRqxhEQ4AmXkbcFvTdbxGw2aKaxjwsdjFx2IXH4td9qvHYljckJYkDS/D5Z6DJGkYMRwkSRXDQZJUMRwkSRXDYRBExO1N19C0iPhg0zU0ISJ+PyKWRMQXImJcRPxdRDwcETdHxOuarm+oRMSHIuLYsn1CRHw/Ip6OiPsj4o1N1zccRMSfNV3DazFslrIOdxFxyp66gJOHspamRcTf7t4EXBoRowEy87NDX1VjvgbcChwB3At8A3gXcDbw/xg
5byD5V5n5j2X7c8DVmfmdiDiN1uPwjsYqa0BEvGf3JuALEXEIQGZ+e+irem0Mh/5bDXyP1n/k3R01xLU07ZO0XpPyKLsej4OBsY1V1JzxmXkttK6eMvPTpf3aiLi4wbqGWvv/S34vM78DkJn3RcRI/Lu4idaLep9k13PkCOB/AQkYDgeQ9cBfZuaG3TsiYnMf4w9kbwD+gdYf+ycz87mImJ+Zn2y4ria0T83euFvfwUNZSMOWR8TXgMuB70TE3wDfAd4JPN5kYQ15O3AlsDozlwBExGmZeWGzZfWf9xz67+/Y8+P14SGso3GZ+Xhmvhf4N2BleVfdkeqWiBgDkJkf39kYESfQ/LsGD5nMvIzWlfU3gb8FPgXcDkwF/rzB0hqRmauB/wkcGhH3RsQ0WlcM+w1fIf0aRMQfAu+h9Q6yLwE/A/4pM3/daGENiogjaAXnqZk5s+FyGhERf0zr80fuz8z/bGufk5l3NFdZsyLi65n5/qbraFpEHAdcA3Rn5h82XU9/GQ79FBF/Dbwb+D6tG44PAk8D5wAfzMz7mqtOTYmIDwMfojXteDLwkcy8pfStzcw9LWQ4oEREX++i/E7gHoDM3K9W6shw6LeIeBg4OTNfiojDgdsy87SI+APglsx8S8MlDpmI+F3gUlqfu3F7Zv5TW98XM3PELGstfxd/kpn/GRGTgeXA1zPzcxHx4Ej5u4iItcA64Cu0pk+C1hTT+QCZ+b3mqht6EfH7wGLgt8AnaE09vwf4Ca1/QGxtsLx+8Z7Da7PzBv5hwM555seBUY1V1Iyv0nryfws4PyK+FRGHlb7pzZXViIN2TiVl5ibgNODMiPgsfa9sO1B1Aw8AlwHPlCvp/8rM7420YCi+RissN9Na4vxfwFnAKlpLe4c9w6H/vgKsjogvAz8AvgAQEV1Ab5OFNeB/ZOaizPxumS5YC9wTEeOaLqwBv4qIl1/nUoLi3bQ+EnLEvPgrM3+bmVcDFwKXRcQ/MrJXQ47PzGsz80rgqMz8dGZuLsuej2+6uP4Yyf/xXpMyTfAvwOuBf8jMn5T2bcBIuxF7WEQclJm/BcjMKyJiC637MWOaLW3IXQDsaG/IzB3ABRHxpWZKak5m9gDvjYizgBG7UIMDYImz9xz0mkXEVcBdmfkvu7XPAa7NzKnNVCYNDxFxOXBV++q10n4CcGVmDvvl34aDBlVEXJiZX226Dmm42l+eI4aDBlVEPJ6Zf9B0HdJwtb88R7znoNcsIn68py5g/FDWIg1HB8JzxHDQQIwHZgNP7dYetN5SQxrp9vvniOGggfhnYExm/mj3joi4b+jLkYad/f454j0HSVLFF8FJkiqGgySpYjhIkiqGgySpYjhIkir/H+DWFFYWxutlAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] @@ -5675,6 +1751,13 @@ "plt.show()" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "code", "execution_count": null, @@ -5699,7 +1782,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.9" } }, "nbformat": 4, diff --git a/scripts/1.python_wrap_lines.py b/scripts/1.python_wrap_lines.py new file mode 100644 index 0000000..2db7976 --- /dev/null +++ b/scripts/1.python_wrap_lines.py @@ -0,0 +1,42 @@ +import os + +size = 80 +file = 'budo' +folder = os.path.expanduser('~/Documents/Fortunes/') + +# Read and store the entire file line by line +with open(f'{folder}{file}.txt') as reader: + provers = reader.readlines() + +# wrap/collate lines by separators [",", " ", "."] +def collate(text, size): + new_text = [] + split_char = 1 + while split_char > 0: + comma = str.find(text, ',', size) + space = str.find(text, ' ', size) + dot = str.find(text, '.', size) + + split_char = min(max(comma, dot), max(comma, space), max(dot, space)) + + if text[:split_char]: + new_text.append(text[:split_char]) + text = text[split_char+1:].replace('\n', "") + + return new_text + +# write collated information to new(same) file +with open(f'{folder}{file}.txt', 'w') as writer: + for wisdom in provers: + if len(wisdom) > size: + collated = collate(wisdom, size) + for short in collated: + writer.write(short) + writer.write('\n') + else: + writer.write(wisdom) + +# Executing Shell Commands with Python +import os +myCmd = f'strfile -c % {folder}{file}.txt {folder}{file}.txt.dat' +os.system(myCmd) \ No newline at end of file diff --git a/scripts/__init__.py b/scripts/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/test.py b/test.py index e31890f..9515637 100644 --- a/test.py +++ b/test.py @@ -1,5 +1,5 @@ import urllib.parse -f = 'Pandas count values in a column of type list' +f = '25 Pandas Create A Matplotlib Scatterplot From 
A Dataframe ' ff = urllib.parse.quote_plus(f) print(ff.replace('+', '_')) \ No newline at end of file