From 5c79ef5507491518c1b35c4347dfcd33125a560e Mon Sep 17 00:00:00 2001 From: VanX Date: Mon, 28 Jan 2019 19:32:17 +0200 Subject: [PATCH 01/76] add Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2 --- notebooks/Python Extract Table from PDF.ipynb | 3410 +++++++++++++++++ 1 file changed, 3410 insertions(+) create mode 100644 notebooks/Python Extract Table from PDF.ipynb diff --git a/notebooks/Python Extract Table from PDF.ipynb b/notebooks/Python Extract Table from PDF.ipynb new file mode 100644 index 0000000..47add32 --- /dev/null +++ b/notebooks/Python Extract Table from PDF.ipynb @@ -0,0 +1,3410 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Python Extract Table from PDF" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Example PDFs\n", + "\n", + "* McKinsey Global Institute Disruptive technologies\n", + "\n", + "https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Digital/Our%20Insights/Disruptive%20technologies/MGI_Disruptive_technologies_Full_report_May2013.ashx\n", + "\n", + "* Food Calories List\n", + "\n", + "http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## With tabula-py\n", + "\n", + "#### Installation\n", + "\n", + "https://pypi.org/project/tabula-py/\n", + "\n", + "`pip install tabula-py`\n", + "\n", + "#### tabula-py docs\n", + "\n", + "https://www.pydoc.io/pypi/tabula-py-0.9.0/autoapi/wrapper/index.html" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from tabula import read_pdf\n", + "from tabulate import tabulate" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
BREADS & CEREALSPortion size *per 100 grams (3.5 oz)Unnamed: 3energy content
0Bagel ( 1 average )140 cals (45g)310 calsNaNMedium
1Biscuit digestives86 cals (per biscuit)480 calsNaNHigh
2Jaffa cake48 cals (per biscuit)370 calsNaNMed-High
3Bread white (thick slice)96 cals (1 slice 40g)240 calsNaNMedium
4Bread wholemeal (thick)88 cals (1 slice 40g)220 calsNaNLow-med
5Chapatis250 cals300 calsNaNMedium
6Cornflakes130 cals (35g)370 calsNaNMed-High
7Crackerbread17 cals per slice325 calsNaNLow Calorie
8Cream crackers35 cals (per cracker)440 calsNaNLow / portion
9Crumpets93 cals (per crumpet)198 calsNaNLow-Med
10Flapjacks basic fruit mix320 cals500 calsNaNHigh
11Macaroni (boiled)238 cals (250g)95 calsNaNLow calorie
12Muesli195 cals (50g)390 calsNaNMed-high
13Naan bread (normal)300 cals (small plate size)320 calsNaNMedium
14Noodles (boiled)175 cals (250g)70 calsNaNLow calorie
15Pasta ( normal boiled )330 cals (300g)110 calsNaNLow calorie
16Pasta (wholemeal boiled )315 cals (300g)105 calsNaNLow calorie
17Porridge oats (with water)193 cals (350g)55 calsNaNLow calorie
18Potatoes** (boiled)210 cals (300g)70 calsNaNLow calorie
19Potatoes** (roast)420 cals (300g)140 calsNaNMedium
\n", + "
" + ], + "text/plain": [ + " BREADS & CEREALS Portion size * \\\n", + "0 Bagel ( 1 average ) 140 cals (45g) \n", + "1 Biscuit digestives 86 cals (per biscuit) \n", + "2 Jaffa cake 48 cals (per biscuit) \n", + "3 Bread white (thick slice) 96 cals (1 slice 40g) \n", + "4 Bread wholemeal (thick) 88 cals (1 slice 40g) \n", + "5 Chapatis 250 cals \n", + "6 Cornflakes 130 cals (35g) \n", + "7 Crackerbread 17 cals per slice \n", + "8 Cream crackers 35 cals (per cracker) \n", + "9 Crumpets 93 cals (per crumpet) \n", + "10 Flapjacks basic fruit mix 320 cals \n", + "11 Macaroni (boiled) 238 cals (250g) \n", + "12 Muesli 195 cals (50g) \n", + "13 Naan bread (normal) 300 cals (small plate size) \n", + "14 Noodles (boiled) 175 cals (250g) \n", + "15 Pasta ( normal boiled ) 330 cals (300g) \n", + "16 Pasta (wholemeal boiled ) 315 cals (300g) \n", + "17 Porridge oats (with water) 193 cals (350g) \n", + "18 Potatoes** (boiled) 210 cals (300g) \n", + "19 Potatoes** (roast) 420 cals (300g) \n", + "\n", + " per 100 grams (3.5 oz) Unnamed: 3 energy content \n", + "0 310 cals NaN Medium \n", + "1 480 cals NaN High \n", + "2 370 cals NaN Med-High \n", + "3 240 cals NaN Medium \n", + "4 220 cals NaN Low-med \n", + "5 300 cals NaN Medium \n", + "6 370 cals NaN Med-High \n", + "7 325 cals NaN Low Calorie \n", + "8 440 cals NaN Low / portion \n", + "9 198 cals NaN Low-Med \n", + "10 500 cals NaN High \n", + "11 95 cals NaN Low calorie \n", + "12 390 cals NaN Med-high \n", + "13 320 cals NaN Medium \n", + "14 70 cals NaN Low calorie \n", + "15 110 cals NaN Low calorie \n", + "16 105 cals NaN Low calorie \n", + "17 55 cals NaN Low calorie \n", + "18 70 cals NaN Low calorie \n", + "19 140 cals NaN Medium " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\")\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
BREADS & CEREALSPortion size *per 100 grams (3.5 oz)energy content
0Bagel ( 1 average )140 cals (45g)310 calsMedium
1Biscuit digestives86 cals (per biscuit)480 calsHigh
2Jaffa cake48 cals (per biscuit)370 calsMed-High
3Bread white (thick slice)96 cals (1 slice 40g)240 calsMedium
4Bread wholemeal (thick)88 cals (1 slice 40g)220 calsLow-med
5Chapatis250 cals300 calsMedium
6Cornflakes130 cals (35g)370 calsMed-High
7Crackerbread17 cals per slice325 calsLow Calorie
8Cream crackers35 cals (per cracker)440 calsLow / portion
9Crumpets93 cals (per crumpet)198 calsLow-Med
10Flapjacks basic fruit mix320 cals500 calsHigh
11Macaroni (boiled)238 cals (250g)95 calsLow calorie
12Muesli195 cals (50g)390 calsMed-high
13Naan bread (normal)300 cals (small plate size)320 calsMedium
14Noodles (boiled)175 cals (250g)70 calsLow calorie
15Pasta ( normal boiled )330 cals (300g)110 calsLow calorie
16Pasta (wholemeal boiled )315 cals (300g)105 calsLow calorie
17Porridge oats (with water)193 cals (350g)55 calsLow calorie
18Potatoes** (boiled)210 cals (300g)70 calsLow calorie
19Potatoes** (roast)420 cals (300g)140 calsMedium
\n", + "
" + ], + "text/plain": [ + " BREADS & CEREALS Portion size * \\\n", + "0 Bagel ( 1 average ) 140 cals (45g) \n", + "1 Biscuit digestives 86 cals (per biscuit) \n", + "2 Jaffa cake 48 cals (per biscuit) \n", + "3 Bread white (thick slice) 96 cals (1 slice 40g) \n", + "4 Bread wholemeal (thick) 88 cals (1 slice 40g) \n", + "5 Chapatis 250 cals \n", + "6 Cornflakes 130 cals (35g) \n", + "7 Crackerbread 17 cals per slice \n", + "8 Cream crackers 35 cals (per cracker) \n", + "9 Crumpets 93 cals (per crumpet) \n", + "10 Flapjacks basic fruit mix 320 cals \n", + "11 Macaroni (boiled) 238 cals (250g) \n", + "12 Muesli 195 cals (50g) \n", + "13 Naan bread (normal) 300 cals (small plate size) \n", + "14 Noodles (boiled) 175 cals (250g) \n", + "15 Pasta ( normal boiled ) 330 cals (300g) \n", + "16 Pasta (wholemeal boiled ) 315 cals (300g) \n", + "17 Porridge oats (with water) 193 cals (350g) \n", + "18 Potatoes** (boiled) 210 cals (300g) \n", + "19 Potatoes** (roast) 420 cals (300g) \n", + "\n", + " per 100 grams (3.5 oz) energy content \n", + "0 310 cals Medium \n", + "1 480 cals High \n", + "2 370 cals Med-High \n", + "3 240 cals Medium \n", + "4 220 cals Low-med \n", + "5 300 cals Medium \n", + "6 370 cals Med-High \n", + "7 325 cals Low Calorie \n", + "8 440 cals Low / portion \n", + "9 198 cals Low-Med \n", + "10 500 cals High \n", + "11 95 cals Low calorie \n", + "12 390 cals Med-high \n", + "13 320 cals Medium \n", + "14 70 cals Low calorie \n", + "15 110 cals Low calorie \n", + "16 105 cals Low calorie \n", + "17 55 cals Low calorie \n", + "18 70 cals Low calorie \n", + "19 140 cals Medium " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\")\n", + "df = df.dropna(axis='columns')\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "-- 
------------------------ ----------------- -------- -----------\n", + " 0 Fish fingers 50 cals per piece 220 cals Medium\n", + " 1 Gammon 320 cals 280 cals Med-High\n", + " 2 Haddock fresh 200 cals 110 cals Low calorie\n", + " 3 Halibut fresh 220 cals 125 cals Low calorie\n", + " 4 Ham 6 cals 240 cals Medium\n", + " 5 Herring fresh grilled 300 cals 200 cals Medium\n", + " 6 Kidney 200 cals 160 cals Medium\n", + " 7 Kipper 200 cals 120 cals Low calorie\n", + " 8 Liver 200 cals 150 cals Medium\n", + " 9 Liver pate 150 cals 300 cals Medium\n", + "10 Lamb (roast) 300 cals 300 cals Med-High\n", + "11 Lobster boiled 200 cals 100 cals Low calorie\n", + "12 Luncheon meat 300 cals 400 cals High\n", + "13 Mackeral 320 cals 300 cals Medium\n", + "14 Mussels 90 cals 90 cals Low-Med\n", + "15 Pheasant roast 200 cals 200 cals Medium\n", + "16 Pilchards (tinned) 140 cals 140 cals Medium\n", + "17 Prawns 180 cals 100 cals Low- Med\n", + "18 Pork 320 cals 290 cals Med-High\n", + "19 Pork pie 320 cals 450 cals High\n", + "20 Rabbit 200 cals 180 cals Medium\n", + "21 Salmon fresh 220 cals 180 cals Medium\n", + "22 Sardines tinned in oil 220 cals 220 cals Medium\n", + "23 Sardines in tomato sauce 180 cals 180 cals Medium\n", + "24 Sausage pork fried 250 cals 320 cals High\n", + "25 Sausage pork grilled 220 cals 280 cals Med-High\n", + "26 Sausage roll 290 cals 480 cals High\n", + "27 Scampi fried in oil 400 cals 340 cals High\n", + "28 Steak & kidney pie 400 cals 350 cals High\n", + "-- ------------------------ ----------------- -------- -----------\n" + ] + } + ], + "source": [ + "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", pages=3)\n", + "print (tabulate(df))" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'extraction_method': 'stream',\n", + " 'top': 0.0,\n", + " 'left': 0.0,\n", + " 'width': 524.6400146484375,\n", + " 'height': 725.6300048828125,\n", + " 'data': [[{'top': 65.19,\n", + " 
'left': 120.24,\n", + " 'width': 48.599998474121094,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Fish cake'},\n", + " {'top': 65.19,\n", + " 'left': 241.2,\n", + " 'width': 79.91999816894531,\n", + " 'height': 7.880000114440918,\n", + " 'text': '90 cals per cake'},\n", + " {'top': 65.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 65.19,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 87.75,\n", + " 'left': 114.6,\n", + " 'width': 60.00000762939453,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Fish fingers'},\n", + " {'top': 87.75,\n", + " 'left': 239.52,\n", + " 'width': 83.27998352050781,\n", + " 'height': 7.880000114440918,\n", + " 'text': '50 cals per piece'},\n", + " {'top': 87.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '220 cals'},\n", + " {'top': 87.75,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 110.19,\n", + " 'left': 120.72,\n", + " 'width': 47.63999938964844,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Gammon'},\n", + " {'top': 110.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '320 cals'},\n", + " {'top': 110.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '280 cals'},\n", + " {'top': 110.19,\n", + " 'left': 467.76,\n", + " 'width': 53.03997802734375,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Med-High'}],\n", + " [{'top': 132.75,\n", + " 'left': 107.88,\n", + " 'width': 73.31999969482422,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Haddock fresh'},\n", + " {'top': 132.75,\n", + " 'left': 259.92,\n", + " 
'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 132.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '110 cals'},\n", + " {'top': 132.75,\n", + " 'left': 464.04,\n", + " 'width': 60.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Low calorie'}],\n", + " [{'top': 155.19,\n", + " 'left': 111.6,\n", + " 'width': 66.00000762939453,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Halibut fresh'},\n", + " {'top': 155.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '220 cals'},\n", + " {'top': 155.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '125 cals'},\n", + " {'top': 155.19,\n", + " 'left': 464.04,\n", + " 'width': 60.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Low calorie'}],\n", + " [{'top': 177.75,\n", + " 'left': 131.4,\n", + " 'width': 26.279998779296875,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Ham'},\n", + " {'top': 177.75,\n", + " 'left': 265.92,\n", + " 'width': 30.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '6 cals'},\n", + " {'top': 177.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '240 cals'},\n", + " {'top': 177.75,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 200.19,\n", + " 'left': 93.72,\n", + " 'width': 101.63999938964844,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Herring fresh grilled'},\n", + " {'top': 200.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '300 cals'},\n", + " {'top': 200.19,\n", + " 'left': 370.08,\n", + " 'width': 
42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 200.19,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 222.75,\n", + " 'left': 125.4,\n", + " 'width': 38.279991149902344,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Kidney'},\n", + " {'top': 222.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 222.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '160 cals'},\n", + " {'top': 222.75,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 245.19,\n", + " 'left': 126.36,\n", + " 'width': 36.36000061035156,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Kipper'},\n", + " {'top': 245.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 245.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '120 cals'},\n", + " {'top': 245.19,\n", + " 'left': 464.04,\n", + " 'width': 60.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Low calorie'}],\n", + " [{'top': 267.75,\n", + " 'left': 130.08,\n", + " 'width': 29.039993286132812,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Liver'},\n", + " {'top': 267.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 267.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '150 cals'},\n", + " {'top': 267.75,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 
7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 290.19,\n", + " 'left': 118.56,\n", + " 'width': 51.96000671386719,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Liver pate'},\n", + " {'top': 290.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '150 cals'},\n", + " {'top': 290.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '300 cals'},\n", + " {'top': 290.19,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 312.75,\n", + " 'left': 111.96,\n", + " 'width': 65.2800064086914,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Lamb (roast)'},\n", + " {'top': 312.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '300 cals'},\n", + " {'top': 312.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '300 cals'},\n", + " {'top': 312.75,\n", + " 'left': 467.76,\n", + " 'width': 53.03997802734375,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Med-High'}],\n", + " [{'top': 335.19,\n", + " 'left': 108.24,\n", + " 'width': 72.5999984741211,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Lobster boiled'},\n", + " {'top': 335.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 335.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '100 cals'},\n", + " {'top': 335.19,\n", + " 'left': 464.04,\n", + " 'width': 60.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Low calorie'}],\n", + " [{'top': 357.75,\n", + " 'left': 105.96,\n", + " 'width': 77.2800064086914,\n", + " 'height': 7.880000114440918,\n", + " 
'text': 'Luncheon meat'},\n", + " {'top': 357.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '300 cals'},\n", + " {'top': 357.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '400 cals'},\n", + " {'top': 357.75,\n", + " 'left': 480.84,\n", + " 'width': 27.0,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'High'}],\n", + " [{'top': 380.19,\n", + " 'left': 120.36,\n", + " 'width': 48.36000061035156,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Mackeral'},\n", + " {'top': 380.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '320 cals'},\n", + " {'top': 380.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '300 cals'},\n", + " {'top': 380.19,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 402.75,\n", + " 'left': 123.36,\n", + " 'width': 42.36000061035156,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Mussels'},\n", + " {'top': 402.75,\n", + " 'left': 262.92,\n", + " 'width': 36.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '90 cals'},\n", + " {'top': 402.75,\n", + " 'left': 373.08,\n", + " 'width': 36.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '90 cals'},\n", + " {'top': 402.75,\n", + " 'left': 468.84,\n", + " 'width': 51.000030517578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Low-Med'}],\n", + " [{'top': 425.19,\n", + " 'left': 108.6,\n", + " 'width': 72.00000762939453,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Pheasant roast'},\n", + " {'top': 425.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 425.19,\n", + " 
'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 425.19,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 447.75,\n", + " 'left': 100.2,\n", + " 'width': 88.68000793457031,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Pilchards (tinned)'},\n", + " {'top': 447.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '140 cals'},\n", + " {'top': 447.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '140 cals'},\n", + " {'top': 447.75,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 470.19,\n", + " 'left': 125.4,\n", + " 'width': 38.279991149902344,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Prawns'},\n", + " {'top': 470.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '180 cals'},\n", + " {'top': 470.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '100 cals'},\n", + " {'top': 470.19,\n", + " 'left': 467.28,\n", + " 'width': 54.000030517578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Low- Med'}],\n", + " [{'top': 492.75,\n", + " 'left': 131.76,\n", + " 'width': 28.680007934570312,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Pork'},\n", + " {'top': 492.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '320 cals'},\n", + " {'top': 492.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '290 cals'},\n", + " {'top': 492.75,\n", + " 'left': 467.76,\n", + " 'width': 
53.03997802734375,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Med-High'}],\n", + " [{'top': 515.19,\n", + " 'left': 122.88,\n", + " 'width': 43.31999969482422,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Pork pie'},\n", + " {'top': 515.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '320 cals'},\n", + " {'top': 515.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '450 cals'},\n", + " {'top': 515.19,\n", + " 'left': 480.84,\n", + " 'width': 27.0,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'High'}],\n", + " [{'top': 537.75,\n", + " 'left': 127.08,\n", + " 'width': 35.03999328613281,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Rabbit'},\n", + " {'top': 537.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '200 cals'},\n", + " {'top': 537.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '180 cals'},\n", + " {'top': 537.75,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 560.19,\n", + " 'left': 111.24,\n", + " 'width': 66.72000885009766,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Salmon fresh'},\n", + " {'top': 560.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '220 cals'},\n", + " {'top': 560.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '180 cals'},\n", + " {'top': 560.19,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 582.75,\n", + " 'left': 91.92,\n", + " 'width': 105.36000061035156,\n", + " 'height': 7.880000114440918,\n", 
+ " 'text': 'Sardines tinned in oil'},\n", + " {'top': 582.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '220 cals'},\n", + " {'top': 582.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '220 cals'},\n", + " {'top': 582.75,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 605.19,\n", + " 'left': 83.28,\n", + " 'width': 122.63999938964844,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Sardines in tomato sauce'},\n", + " {'top': 605.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '180 cals'},\n", + " {'top': 605.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '180 cals'},\n", + " {'top': 605.19,\n", + " 'left': 472.44,\n", + " 'width': 43.67999267578125,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Medium'}],\n", + " [{'top': 627.75,\n", + " 'left': 98.04,\n", + " 'width': 92.99999237060547,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Sausage pork fried'},\n", + " {'top': 627.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '250 cals'},\n", + " {'top': 627.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '320 cals'},\n", + " {'top': 627.75,\n", + " 'left': 480.84,\n", + " 'width': 27.0,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'High'}],\n", + " [{'top': 650.19,\n", + " 'left': 93.72,\n", + " 'width': 101.63999938964844,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Sausage pork grilled'},\n", + " {'top': 650.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': 
'220 cals'},\n", + " {'top': 650.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '280 cals'},\n", + " {'top': 650.19,\n", + " 'left': 467.76,\n", + " 'width': 53.03997802734375,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Med-High'}],\n", + " [{'top': 672.75,\n", + " 'left': 113.52,\n", + " 'width': 62.040000915527344,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Sausage roll'},\n", + " {'top': 672.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '290 cals'},\n", + " {'top': 672.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '480 cals'},\n", + " {'top': 672.75,\n", + " 'left': 480.84,\n", + " 'width': 27.0,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'High'}],\n", + " [{'top': 695.19,\n", + " 'left': 98.28,\n", + " 'width': 92.63999938964844,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Scampi fried in oil'},\n", + " {'top': 695.19,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '400 cals'},\n", + " {'top': 695.19,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '340 cals'},\n", + " {'top': 695.19,\n", + " 'left': 480.84,\n", + " 'width': 27.0,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'High'}],\n", + " [{'top': 717.75,\n", + " 'left': 96.96,\n", + " 'width': 95.2800064086914,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'Steak & kidney pie'},\n", + " {'top': 717.75,\n", + " 'left': 259.92,\n", + " 'width': 42.5999755859375,\n", + " 'height': 7.880000114440918,\n", + " 'text': '400 cals'},\n", + " {'top': 717.75,\n", + " 'left': 370.08,\n", + " 'width': 42.600006103515625,\n", + " 'height': 7.880000114440918,\n", + " 'text': '350 cals'},\n", + " {'top': 717.75,\n", + " 'left': 
480.84,\n", + " 'width': 27.0,\n", + " 'height': 7.880000114440918,\n", + " 'text': 'High'}]]}]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", pages=3, output_format=\"json\")\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[ 0 1 \\\n", + " 0 BREADS & CEREALS Portion size * \n", + " 1 Bagel ( 1 average ) 140 cals (45g) \n", + " 2 Biscuit digestives 86 cals (per biscuit) \n", + " 3 Jaffa cake 48 cals (per biscuit) \n", + " 4 Bread white (thick slice) 96 cals (1 slice 40g) \n", + " 5 Bread wholemeal (thick) 88 cals (1 slice 40g) \n", + " 6 Chapatis 250 cals \n", + " 7 Cornflakes 130 cals (35g) \n", + " 8 Crackerbread 17 cals per slice \n", + " 9 Cream crackers 35 cals (per cracker) \n", + " 10 Crumpets 93 cals (per crumpet) \n", + " 11 Flapjacks basic fruit mix 320 cals \n", + " 12 Macaroni (boiled) 238 cals (250g) \n", + " 13 Muesli 195 cals (50g) \n", + " 14 Naan bread (normal) 300 cals (small plate size) \n", + " 15 Noodles (boiled) 175 cals (250g) \n", + " 16 Pasta ( normal boiled ) 330 cals (300g) \n", + " 17 Pasta (wholemeal boiled ) 315 cals (300g) \n", + " 18 Porridge oats (with water) 193 cals (350g) \n", + " 19 Potatoes** (boiled) 210 cals (300g) \n", + " 20 Potatoes** (roast) 420 cals (300g) \n", + " \n", + " 2 3 4 \n", + " 0 per 100 grams (3.5 oz) NaN energy content \n", + " 1 310 cals NaN Medium \n", + " 2 480 cals NaN High \n", + " 3 370 cals NaN Med-High \n", + " 4 240 cals NaN Medium \n", + " 5 220 cals NaN Low-med \n", + " 6 300 cals NaN Medium \n", + " 7 370 cals NaN Med-High \n", + " 8 325 cals NaN Low Calorie \n", + " 9 440 cals NaN Low / portion \n", + " 10 198 cals NaN Low-Med \n", + " 11 500 cals NaN High \n", + " 12 95 cals NaN Low calorie \n", + " 13 390 cals NaN Med-high \n", + " 14 320 cals NaN Medium \n", + " 15 70 cals NaN Low 
calorie \n", + " 16 110 cals NaN Low calorie \n", + " 17 105 cals NaN Low calorie \n", + " 18 55 cals NaN Low calorie \n", + " 19 70 cals NaN Low calorie \n", + " 20 140 cals NaN Medium ,\n", + " 0 1 2 3\n", + " 0 Rice (white boiled) 420 cals (300g) 140 cals Low calorie\n", + " 1 Rice (egg-fried) 500 cals 200 cals High in portion\n", + " 2 Rice ( Brown ) 405 cals (300g) 135 cals Low calorie\n", + " 3 Rice cakes 28 Cals = 1 slice 373 Cals Medium\n", + " 4 Ryvita Multi grain 37 Cals per slice 331 Cals Medium\n", + " 5 Ryvita + seed & Oats 180 Cals 4 slices 362 Cals Medium\n", + " 6 Spaghetti (boiled) 303 cals (300g) 101 cals Low calorie,\n", + " 0 1 2 3 \\\n", + " 0 Meats & Fish Portion size * per 100 grams (3.5 oz) NaN \n", + " 1 Anchovies tinned 300 cals 300 cals NaN \n", + " 2 Bacon average fried 250 cals (2 rashers) 500 cals NaN \n", + " 3 Bacon average grilled 150 cals 380 cals NaN \n", + " 4 Beef (roast) 300 cals 280 cals NaN \n", + " 5 Beef burgers frozen 320 cals 280 cals NaN \n", + " 6 Chicken 220 cals 200 cals NaN \n", + " 7 Cockles 50 cals 50 cals NaN \n", + " 8 Cod fresh 150 cals 100 cals NaN \n", + " 9 Cod chip shop food 400 cals 200 cals NaN \n", + " 10 Crab fresh 200 cals 110 cals NaN \n", + " 11 Duck roast 400 cals 430 cals NaN \n", + " \n", + " 4 \n", + " 0 energy content \n", + " 1 Medium \n", + " 2 High \n", + " 3 Med-High \n", + " 4 Medium \n", + " 5 Med-High \n", + " 6 Medium \n", + " 7 Low \n", + " 8 Low calorie \n", + " 9 Med-High \n", + " 10 low calorie \n", + " 11 High ,\n", + " 0 1 2 3\n", + " 0 Fish cake 90 cals per cake 200 cals Medium\n", + " 1 Fish fingers 50 cals per piece 220 cals Medium\n", + " 2 Gammon 320 cals 280 cals Med-High\n", + " 3 Haddock fresh 200 cals 110 cals Low calorie\n", + " 4 Halibut fresh 220 cals 125 cals Low calorie\n", + " 5 Ham 6 cals 240 cals Medium\n", + " 6 Herring fresh grilled 300 cals 200 cals Medium\n", + " 7 Kidney 200 cals 160 cals Medium\n", + " 8 Kipper 200 cals 120 cals Low calorie\n", + " 9 Liver 200 
cals 150 cals Medium\n", + " 10 Liver pate 150 cals 300 cals Medium\n", + " 11 Lamb (roast) 300 cals 300 cals Med-High\n", + " 12 Lobster boiled 200 cals 100 cals Low calorie\n", + " 13 Luncheon meat 300 cals 400 cals High\n", + " 14 Mackeral 320 cals 300 cals Medium\n", + " 15 Mussels 90 cals 90 cals Low-Med\n", + " 16 Pheasant roast 200 cals 200 cals Medium\n", + " 17 Pilchards (tinned) 140 cals 140 cals Medium\n", + " 18 Prawns 180 cals 100 cals Low- Med\n", + " 19 Pork 320 cals 290 cals Med-High\n", + " 20 Pork pie 320 cals 450 cals High\n", + " 21 Rabbit 200 cals 180 cals Medium\n", + " 22 Salmon fresh 220 cals 180 cals Medium\n", + " 23 Sardines tinned in oil 220 cals 220 cals Medium\n", + " 24 Sardines in tomato sauce 180 cals 180 cals Medium\n", + " 25 Sausage pork fried 250 cals 320 cals High\n", + " 26 Sausage pork grilled 220 cals 280 cals Med-High\n", + " 27 Sausage roll 290 cals 480 cals High\n", + " 28 Scampi fried in oil 400 cals 340 cals High\n", + " 29 Steak & kidney pie 400 cals 350 cals High,\n", + " 0 1 2 3\n", + " 0 Taramasalata 130 cals 490 cals High\n", + " 1 Trout fresh 200 cals 120 cals Low calorie\n", + " 2 Tuna tinned water 100 cals 100 cals Low calorie\n", + " 3 Tuna tinned oil 180 cals 180 cals Medium\n", + " 4 Turkey 200 cals 160 cals Medium\n", + " 5 Veal 300 cals 240 cals Medium,\n", + " 0 1 2 3 \\\n", + " 0 Fruits & Vegetables Portion size * per 100 grams (3.5 oz) NaN \n", + " 1 Apple 44 calories 44 calories NaN \n", + " 2 Banana 107 cals 65 calories NaN \n", + " 3 Beans baked beans 170 cals 80 calories NaN \n", + " 4 Beans dried (boiled) 180 cals 130 calories NaN \n", + " 5 Blackberries 25 cals 25 calories NaN \n", + " 6 Blackcurrant 30 cals 30 calories NaN \n", + " 7 Broccoli 27 cals 32 cals NaN \n", + " 8 Cabbage (boiled) 15 calories 20 calories NaN \n", + " 9 Carrot (boiled) 16 calories 25 calories NaN \n", + " 10 Cauliflower (boiled) 20 calories 30 calories NaN \n", + " 11 Celery (boiled) 5 calories 10 calories NaN \n", + " 12 
Cherry 35 calories 50 calories NaN \n", + " 13 Courgette 8 cals 20 cals NaN \n", + " 14 Cucumber 3 calories 10 calories NaN \n", + " 15 Dates 100 calories 235 calories NaN \n", + " 16 Grapes 55 calories 62 calories NaN \n", + " 17 Grapefruit 32 calories 32 calories NaN \n", + " 18 Kiwi 40 calories 50 calories NaN \n", + " 19 Leek (boiled) 10 calories 20 calories NaN \n", + " \n", + " 4 \n", + " 0 energy content \n", + " 1 Low calorie \n", + " 2 Low calorie \n", + " 3 Low calorie \n", + " 4 Low calorie \n", + " 5 Low calorie \n", + " 6 Low calorie \n", + " 7 Very low \n", + " 8 Low calorie \n", + " 9 Low calorie \n", + " 10 Low calorie \n", + " 11 Low calorie \n", + " 12 Low calorie \n", + " 13 Very low cal \n", + " 14 Low calorie \n", + " 15 Med-High \n", + " 16 Low calorie \n", + " 17 Low calorie \n", + " 18 Low calorie \n", + " 19 Low calorie ,\n", + " 0 1 2 3\n", + " 0 Lentils (boiled) 150 calories 100 calories Medium\n", + " 1 Lettuce 4 calories 15 calories Very Low\n", + " 2 Melon 14 calories 28 calories Medium\n", + " 3 Mushrooms raw one NaN NaN NaN\n", + " 4 average 3 cals 15 cals Very low cal\n", + " 5 Mushrooms (boiled) 12 calories 12 calories Low calorie\n", + " 6 Mushrooms (fried) 100 calories 145 calories High\n", + " 7 Olives 50 calories 80 calories Low calorie\n", + " 8 Onion (boiled) 14 calories 18 calories Low calorie\n", + " 9 One red Onion 49 cals 33 cals Low calorie\n", + " 10 Onions spring 3 cals 25 cals Very low cal\n", + " 11 Onion (fried) 86 calories 155 calories High\n", + " 12 Orange 40 calories 30 calories Low calorie\n", + " 13 Peas 210 calories 148 calories Medium\n", + " 14 Peas dried & boiled 200 calories 120 calories Low calorie\n", + " 15 Peach 35 calories 30 calories Low calorie\n", + " 16 Pear 45 calories 38 calories Low calorie\n", + " 17 Pepper yellow 6 cals 16 cals Very low\n", + " 18 Pineapple 40 calories 40 calories Low calorie\n", + " 19 Plum 30 calories 39 calories Low calorie\n", + " 20 Spinach 8 calories 8 calories Low 
calorie\n", + " 21 Strawberries (1 average) 10 calories 30 calories Low calorie\n", + " 22 Sweetcorn 95 calories 130 calories Medium\n", + " 23 Sweetcorn on the cob 70 calories 70 calories Low calorie\n", + " 24 Tomato 30 calories 20 calories Low calorie\n", + " 25 Tomato cherry 6 cals ( 3 toms) 17 Cals Very low cal\n", + " 26 Tomato puree 70 calories 70 calories Low-Medium\n", + " 27 Watercress 5 calories 20 calories Low calorie,\n", + " 0 1 \\\n", + " 0 Milk & Dairy produce Portion size * \n", + " 1 Cheese average 110 cals (25g) \n", + " 2 Cheddar types average reduced NaN \n", + " 3 fat 130 \n", + " 4 Cheese spreads average 90 cals \n", + " 5 Cottage cheese low fat 40 calories \n", + " 6 Cottage cheese 49 cals \n", + " 7 Cream cheese 200 cals \n", + " 8 Cream fresh half 128 cals \n", + " 9 Cream fresh single 160 cals \n", + " 10 Cream fresh double 340 cals \n", + " 11 Cream fresh clotted 480 cals \n", + " 12 Custard 210 cals \n", + " 13 Eggs ( 1 average size) 90 cals \n", + " 14 Eggs fried 120 cals \n", + " 15 Fromage frais 125 cals \n", + " 16 Ice cream 200 cals \n", + " 17 Milk whole 175 cals (250ml/half pint) \n", + " 18 Milk semi-skimmed 125 cals (250ml/half pint) \n", + " 19 Milk skimmed 95 cals (250ml/half pint) \n", + " 20 Milk Soya 90 cals \n", + " 21 Mousse flavored 120 cals \n", + " 22 Omelette with cheese 300 cals \n", + " 23 Trifle with cream 290 cals \n", + " 24 Yogurt natural 90 cals \n", + " 25 Yogurt reduced fat 70 cals \n", + " \n", + " 2 3 4 \n", + " 0 per 100 grams (3.5 oz) NaN energy content \n", + " 1 440 cals NaN High \n", + " 2 NaN NaN NaN \n", + " 3 260 calories NaN Medium \n", + " 4 270 NaN Medium \n", + " 5 80 cals NaN low - med \n", + " 6 98 cals NaN Low calorie \n", + " 7 428 cals NaN High \n", + " 8 160 cals NaN Med-High \n", + " 9 200 cals NaN Med-High \n", + " 10 430 cals NaN High \n", + " 11 600 cals NaN High \n", + " 12 100 cals NaN Medium \n", + " 13 150 cals NaN Medium \n", + " 14 180 cals NaN Med-High \n", + " 15 125 cals NaN 
Low calorie \n", + " 16 180 cals NaN Medium \n", + " 17 70 cals NaN Med-High \n", + " 18 50 cals NaN Medium \n", + " 19 38 cals NaN Low calorie \n", + " 20 36 cals NaN Low calorie \n", + " 21 140 cals NaN Medium \n", + " 22 266 cals NaN Medium \n", + " 23 190 cals NaN Medium \n", + " 24 60 cals NaN Low calorie \n", + " 25 45 cals NaN Low calorie ,\n", + " 0 1 \\\n", + " 0 Fats & Sugars Portion size * \n", + " 1 PURE FAT 9 cals (1 gram) \n", + " 2 Bombay mix 250 cals \n", + " 3 Butter 112 cals \n", + " 4 Chewing gum 8 cals per piece \n", + " 5 Chocolate 200 cals \n", + " 6 Cod liver oil 135 cals (1 tbspoon) \n", + " 7 Corn snack 125 cals \n", + " 8 Crisps (chips US) average 100 cals \n", + " 9 Honey 42 cals \n", + " 10 Jam 38 cals \n", + " 11 Lard 225 cals \n", + " 12 Low fat spread 50 cals \n", + " 13 Margarine 50 cals \n", + " 14 Mars bar 240 cals \n", + " 15 Mint sweets 10 cals per piece \n", + " 16 Oils -corn, sunflower, olive 135 cals (1 Tbspoon) \n", + " 17 Popcorn average 150 cals \n", + " 18 Sugar white table sugar 20 cals (1 tspoon) \n", + " 19 Sweets (boiled) 100 cals \n", + " 20 Syrup 15 cals \n", + " 21 Toffee 100 cals \n", + " \n", + " 2 3 4 \n", + " 0 per 100 grams (3.5 oz) NaN energy content \n", + " 1 900 cals NaN High \n", + " 2 500 cals NaN High \n", + " 3 750 cals NaN High \n", + " 4 - NaN Low calorie \n", + " 5 500 cals NaN High \n", + " 6 900 cals NaN High \n", + " 7 500 cals NaN High \n", + " 8 500 cals NaN High \n", + " 9 280 cals NaN Medium \n", + " 10 250 cals NaN Medium \n", + " 11 890 cals NaN High \n", + " 12 400 cals NaN High \n", + " 13 750 cals NaN High \n", + " 14 480 cals NaN Med-High \n", + " 15 - NaN High \n", + " 16 900 cals NaN High \n", + " 17 460 cals NaN High \n", + " 18 400 cals NaN Medium \n", + " 19 300 cals NaN Med-High \n", + " 20 300 cals NaN Medium \n", + " 21 400 cals NaN High ,\n", + " 0 1 2 \\\n", + " 0 Fruit Calories per piece Carbs (grams) \n", + " 1 Apple (1 average) 44 calories 10.5 \n", + " 2 Apple cooking 35 
calories 9 \n", + " 3 Apricot 30 calories 6.7 \n", + " 4 Avocado 150 calories 2 \n", + " 5 Banana 107 calories 26 \n", + " 6 Blackberries each 1 calorie 0.2 \n", + " 7 Blackcurrant each 1.1 calorie 0.25 \n", + " 8 Blueberries (new) 100g 49 Cals ( 100g ) 15 g \n", + " 9 Cherry each 2.4 calories 0.6 \n", + " 10 Clementine 24 cals 5 \n", + " 11 Currants 5 calories 1.4 \n", + " 12 Damson 28 calories 7.2 \n", + " 13 One average date 5g 5 cals 1.2 \n", + " 14 Dates with inverted sugar 100g 250 calories 63 \n", + " 15 Figs 10 calories 2.4 \n", + " 16 Gooseberries 2.6 calories 0.65 \n", + " 17 Grapes 100g Seedless 50 cals 15 \n", + " 18 one average Grape 6g 3 calories 0.9 \n", + " 19 Grapefruit whole 100 calories 23 \n", + " 20 Guava 24 calories 4.4 \n", + " 21 Kiwi 34 calories 8 \n", + " 22 Lemon 20 calories 3.4 \n", + " 23 Lychees 3 calories 0.7 \n", + " 24 Mango 40 calories 9.5 \n", + " 25 Melon Honeydew (130g) 36 calories 9 \n", + " 26 Melon Canteloupe (130g) 25 cals 6 \n", + " 27 Nectarines 42 calories 9 \n", + " 28 Olives 6.8 calories trace \n", + " \n", + " 3 \n", + " 0 Water Content \n", + " 1 85 % \n", + " 2 88 % \n", + " 3 85 % \n", + " 4 60 % \n", + " 5 75 % \n", + " 6 85 % \n", + " 7 77 % \n", + " 8 81 % \n", + " 9 83 % \n", + " 10 66 % \n", + " 11 16 % \n", + " 12 70 % \n", + " 13 14 % \n", + " 14 12 % \n", + " 15 24 % \n", + " 16 80 % \n", + " 17 82 % \n", + " 18 82 % \n", + " 19 65 % \n", + " 20 85 % \n", + " 21 75 % \n", + " 22 85 % \n", + " 23 80 % \n", + " 24 80 % \n", + " 25 90 % \n", + " 26 93 % \n", + " 27 80 % \n", + " 28 63 % ,\n", + " 0 1 2 3\n", + " 0 Orange average 35 calories 8.5 73 %\n", + " 1 Orange large 350g 100 Cals 22g 75 %\n", + " 2 Papaya Diced (small handful) 67 Cals (20g) 17g -\n", + " 3 Passion Fruit 30 calories 3 50 %\n", + " 4 Paw Paw 28 calories 6 70 %\n", + " 5 Peach 35 calories 7 80 %\n", + " 6 Pear 45 calories 12 77 %\n", + " 7 Pineapple 50 calories 12 85 %\n", + " 8 Plum 25 calories 6 79 %\n", + " 9 Prunes 9 calories 2.2 37 
%\n", + " 10 Raisins 5 calories 1.4 13 %\n", + " 11 Raspberries each 1.1 calories 0.2 87 %\n", + " 12 Rhubarb 8 calories 0.8 95 %\n", + " 13 Satsuma one average 112g 29 cals 6.5 88 %\n", + " 14 Satsumas 100g 35 calories 8.5 88 %\n", + " 15 Strawberries (1 average) 2.7 calories 0.6 90 %\n", + " 16 Sultanas 5 calories 1.4 16 %\n", + " 17 Tangerine 26 calories 6 60 %\n", + " 18 Tomatoes (1 average size) 9 cals 2.2 93 %\n", + " 19 Tomatoes Cherry (1 average size) 2 calories 0.5 90 %]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", pages='all', multiple_tables=True)\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Fish cake 90 cals per cake 200 cals Medium\n", + "0 Fish fingers 50 cals per piece 220 cals Medium\n", + "1 Gammon 320 cals 280 cals Med-High\n", + "2 Haddock fresh 200 cals 110 cals Low calorie\n", + "3 Halibut fresh 220 cals 125 cals Low calorie\n", + "4 Ham 6 cals 240 cals Medium\n", + "5 Herring fresh grilled 300 cals 200 cals Medium\n", + "6 Kidney 200 cals 160 cals Medium\n", + "7 Kipper 200 cals 120 cals Low calorie\n", + "8 Liver 200 cals 150 cals Medium\n", + "9 Liver pate 150 cals 300 cals Medium\n", + "10 Lamb (roast) 300 cals 300 cals Med-High\n", + "11 Lobster boiled 200 cals 100 cals Low calorie\n", + "12 Luncheon meat 300 cals 400 cals High\n", + "13 Mackeral 320 cals 300 cals Medium\n", + "14 Mussels 90 cals 90 cals Low-Med\n", + "15 Pheasant roast 200 cals 200 cals Medium\n", + "16 Pilchards (tinned) 140 cals 140 cals Medium\n", + "17 Prawns 180 cals 100 cals Low- Med\n", + "18 Pork 320 cals 290 cals Med-High\n", + "19 Pork pie 320 cals 450 cals High\n", + "20 Rabbit 200 cals 180 cals Medium\n", + "21 Salmon fresh 220 cals 180 cals Medium\n", + "22 Sardines tinned in oil 220 cals 220 cals Medium\n", + "23 Sardines in tomato sauce 180 cals 180 cals Medium\n", + "24 Sausage pork fried 250 cals 320 cals High\n", + "25 Sausage pork grilled 220 cals 280 cals Med-High\n", + "26 Sausage roll 290 cals 480 cals High\n", + "27 Scampi fried in oil 400 cals 340 cals High\n", + "28 Steak & kidney pie 400 cals 350 cals High" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=3)\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3\n", + "0 Fruits & Vegetables Portion size * oz) energy content\n", + "1 Apple 44 calories 44 calories Low calorie\n", + "2 Banana 107 cals 65 calories Low calorie\n", + "3 Beans baked beans 170 cals 80 calories Low calorie\n", + "4 Beans dried (boiled) 180 cals 130 calories Low calorie\n", + "5 Blackberries 25 cals 25 calories Low calorie\n", + "6 Blackcurrant 30 cals 30 calories Low calorie\n", + "7 Broccoli 27 cals 32 cals Very low\n", + "8 Cabbage (boiled) 15 calories 20 calories Low calorie\n", + "9 Carrot (boiled) 16 calories 25 calories Low calorie\n", + "10 Cauliflower (boiled) 20 calories 30 calories Low calorie\n", + "11 Celery (boiled) 5 calories 10 calories Low calorie\n", + "12 Cherry 35 calories 50 calories Low calorie\n", + "13 Courgette 8 cals 20 cals Very low cal\n", + "14 Cucumber 3 calories 10 calories Low calorie\n", + "15 Dates 100 calories 235 calories Med-High\n", + "16 Grapes 55 calories 62 calories Low calorie\n", + "17 Grapefruit 32 calories 32 calories Low calorie\n", + "18 Kiwi 40 calories 50 calories Low calorie\n", + "19 Leek (boiled) 10 calories 20 calories Low calorie" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", encoding = 'ISO-8859-1',\n", + " stream=True, area = [269.875, 12.75, 790.5, 961], pages = 4, guess = False, pandas_options={'header':None})\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " McKinsey Global Institute\n", + "0 Disruptive technologies: Advances that will tr...\n", + "1 Exhibit E2\n", + "2 Speed, scope, and economic value at stake of 1...\n", + "3 Illustrative rates of technology improvement I...\n", + "4 and diffusion resources that could be impacted...\n", + "5 Mobile $5 million vs. $40024.3 billion $1.7 tr...\n", + "6 Internet Price of the fastest supercomputer in...\n", + "7 an iPhone 4 today, equal in performance (MFLOP...\n", + "8 6x 1 billion Interaction and transaction worker\n", + "9 Growth in sales of smartphones and tablets sin...\n", + "10 launch of iPhone in 2007 40% of global workfor...\n", + "11 Automation 100x 230+ million $9+ trillion\n", + "12 of knowledge Increase in computing power from ...\n", + "13 work (chess champion in 1997) to Watson (Jeopa...\n", + "14 winner in 2011) Smartphone users, with potenti...\n", + "15 400+ million automated digital assistance apps\n", + "16 Increase in number of users of intelligent dig...\n", + "17 assistants like Siri and Google Now in past 5 ...\n", + "18 The Internet 300% 1 trillion $36 trillion\n", + "19 of Things Increase in connected machine-to-mac...\n", + "20 over past 5 years across industries such as ma...\n", + "21 80–90% health care, and mining and mining)\n", + "22 Price decline in MEMS (microelectromechanical ...\n", + "23 systems) sensors in past 5 years Global machin...\n", + "24 connections across sectors like transportation,\n", + "25 security, health care, and utilities\n", + "26 Cloud 18 months 2 billion $1.7 trillion\n", + "27 technology Time to double server performance p...\n", + "28 3x like Gmail, Yahoo, and Hotmail $3 trillion\n", + "29 Monthly cost of owning a server vs. renting in...\n", + ".. 
...\n", + "52 storage Price decline for a lithium-ion batter...\n", + "53 electric vehicle since 2009 1.2 billion gasoli...\n", + "54 People without access to electricity $100 billion\n", + "55 Estimated value of electricity for\n", + "56 households currently without access\n", + "57 3D printing 90% 320 million $11 trillion\n", + "58 Lower price for a home 3D printer vs. 4 years ...\n", + "59 4x workforce $85 billion\n", + "60 Increase in additive manufacturing revenue in ...\n", + "61 10 years Annual number of toys manufactured gl...\n", + "62 Advanced $1,000 vs. $50 7.6 million tons $1.2 ...\n", + "63 materials Difference in price of 1 gram of nan...\n", + "64 10 years 45,000 metric tons sales\n", + "65 115x Annual global carbon fiber consumption $4...\n", + "66 Strength-to-weight ratio of carbon nanotubes v...\n", + "67 Advanced 3x 22 billion $800 billion\n", + "68 oil and gas Increase in efficiency of US gas w...\n", + "69 exploration 2x produced globally gas\n", + "70 and recovery Increase in efficiency of US oil ...\n", + "71 Barrels of crude oil produced globally Revenue...\n", + "72 Renewable 85% 21,000 TWh $3.5 trillion\n", + "73 energy Lower price for a solar photovoltaic ce...\n", + "74 2000 13 billion tons $80 billion\n", + "75 19x Annual CO2 emissions from electricity Valu...\n", + "76 Growth in solar photovoltaic and wind generati...\n", + "77 capacity since 2000 and planes\n", + "78 1 Not comprehensive; indicative groups, produc...\n", + "79 2 For CDC-7600, considered the world’s faste...\n", + "80 3 Baxter is a general-purpose basic manufactur...\n", + "81 SOURCE: McKinsey Global Institute analysis\n", + "\n", + "[82 rows x 1 columns]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = read_pdf(\"./tmp/pdf/output.pdf\", encoding = 'ISO-8859-1',\n", + " stream=True, guess = False)\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + 
{ + "data": { + "text/html": [ + "
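<table>
<thead>
<tr><th></th><th>Unnamed: 0</th><th>Unnamed: 1</th><th>over past 5 years across industries such as manufacturing,</th><th>industries (manufacturing, health care,</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>NaN</td><td>NaN</td><td>80–90% health care, and mining</td><td>and mining)</td></tr>
<tr><th>1</th><td>NaN</td><td>NaN</td><td>Price decline in MEMS (microelectromechanical ...</td><td>NaN</td></tr>
<tr><th>2</th><td>NaN</td><td>NaN</td><td>systems) sensors in past 5 years Global machin...</td><td>NaN</td></tr>
<tr><th>3</th><td>NaN</td><td>NaN</td><td>connections across sectors like transportation,</td><td>NaN</td></tr>
<tr><th>4</th><td>NaN</td><td>NaN</td><td>security, health care, and utilities</td><td>NaN</td></tr>
<tr><th>5</th><td>NaN</td><td>Cloud</td><td>18 months 2 billion</td><td>$1.7 trillion</td></tr>
<tr><th>6</th><td>NaN</td><td>technology</td><td>Time to double server performance per dollar G...</td><td>GDP related to the Internet</td></tr>
<tr><th>7</th><td>NaN</td><td>NaN</td><td>3x like Gmail, Yahoo, and Hotmail</td><td>$3 trillion</td></tr>
<tr><th>8</th><td>NaN</td><td>NaN</td><td>Monthly cost of owning a server vs. renting in...</td><td>Enterprise IT spend</td></tr>
<tr><th>9</th><td>NaN</td><td>NaN</td><td>the cloud North American institutions hosting ...</td><td>NaN</td></tr>
<tr><th>10</th><td>NaN</td><td>NaN</td><td>to host critical applications on the cloud</td><td>NaN</td></tr>
<tr><th>11</th><td>NaN</td><td>Advanced</td><td>75–85% 320 million</td><td>$6 trillion</td></tr>
<tr><th>12</th><td>NaN</td><td>robotics</td><td>Lower price for Baxter3 than a typical industr...</td><td>Manufacturing worker employment</td></tr>
<tr><th>13</th><td>NaN</td><td>NaN</td><td>170% workforce</td><td>costs, 19% of global employment costs</td></tr>
<tr><th>14</th><td>NaN</td><td>NaN</td><td>Growth in sales of industrial robots, 2009–1...</td><td>$2–3 trillion</td></tr>
<tr><th>15</th><td>NaN</td><td>NaN</td><td>Annual major surgeries</td><td>Cost of major surgeries</td></tr>
<tr><th>16</th><td>NaN</td><td>Autonomous</td><td>7 1 billion</td><td>$4 trillion</td></tr>
<tr><th>17</th><td>NaN</td><td>and near-</td><td>Miles driven by top-performing driverless car ...</td><td>Automobile industry revenue</td></tr>
<tr><th>18</th><td>NaN</td><td>autonomous</td><td>DARPA Grand Challenge along a 150-mile route 4...</td><td>$155 billion</td></tr>
<tr><th>19</th><td>NaN</td><td>vehicles</td><td>1,540 Civilian, military, and general aviation...</td><td>Revenue from sales of civilian, military,</td></tr>
<tr><th>20</th><td>NaN</td><td>NaN</td><td>Miles cumulatively driven by cars competing in...</td><td>and general aviation aircraft</td></tr>
<tr><th>21</th><td>NaN</td><td>NaN</td><td>Grand Challenge</td><td>NaN</td></tr>
<tr><th>22</th><td>NaN</td><td>NaN</td><td>300,000+</td><td>NaN</td></tr>
<tr><th>23</th><td>NaN</td><td>NaN</td><td>Miles driven by Google’s autonomous cars wit...</td><td>NaN</td></tr>
<tr><th>24</th><td>NaN</td><td>NaN</td><td>1 accident (which was human-caused)</td><td>NaN</td></tr>
<tr><th>25</th><td>NaN</td><td>Next-</td><td>10 months 26 million</td><td>$6.5 trillion</td></tr>
<tr><th>26</th><td>NaN</td><td>generation</td><td>Time to double sequencing speed per dollar Ann...</td><td>Global health-care costs</td></tr>
<tr><th>27</th><td>NaN</td><td>genomics</td><td>100x disease, or type 2 diabetes</td><td>$1.1 trillion</td></tr>
<tr><th>28</th><td>NaN</td><td>NaN</td><td>Increase in acreage of genetically modified cr...</td><td>Global value of wheat, rice, maize, soy,</td></tr>
<tr><th>29</th><td>NaN</td><td>NaN</td><td>1996–2012 People employed in agriculture</td><td>and barley</td></tr>
<tr><th>30</th><td>NaN</td><td>Energy</td><td>40% 1 billion</td><td>$2.5 trillion</td></tr>
<tr><th>31</th><td>NaN</td><td>storage</td><td>Price decline for a lithium-ion battery pack i...</td><td>Revenue from global consumption of</td></tr>
<tr><th>32</th><td>NaN</td><td>NaN</td><td>electric vehicle since 2009 1.2 billion</td><td>gasoline and diesel</td></tr>
<tr><th>33</th><td>NaN</td><td>NaN</td><td>People without access to electricity</td><td>$100 billion</td></tr>
<tr><th>34</th><td>NaN</td><td>NaN</td><td>NaN</td><td>Estimated value of electricity for</td></tr>
<tr><th>35</th><td>NaN</td><td>NaN</td><td>NaN</td><td>households currently without access</td></tr>
<tr><th>36</th><td>NaN</td><td>3D printing</td><td>90% 320 million</td><td>$11 trillion</td></tr>
<tr><th>37</th><td>NaN</td><td>NaN</td><td>Lower price for a home 3D printer vs. 4 years ...</td><td>Global manufacturing GDP</td></tr>
<tr><th>38</th><td>NaN</td><td>NaN</td><td>4x workforce</td><td>$85 billion</td></tr>
<tr><th>39</th><td>NaN</td><td>NaN</td><td>Increase in additive manufacturing revenue in ...</td><td>Revenue from global toy sales</td></tr>
<tr><th>40</th><td>NaN</td><td>NaN</td><td>10 years Annual number of toys manufactured gl...</td><td>NaN</td></tr>
<tr><th>41</th><td>NaN</td><td>Advanced</td><td>$1,000 vs. $50 7.6 million tons</td><td>$1.2 trillion</td></tr>
<tr><th>42</th><td>NaN</td><td>materials</td><td>Difference in price of 1 gram of nanotubes ove...</td><td>Revenue from global semiconductor</td></tr>
<tr><th>43</th><td>NaN</td><td>NaN</td><td>10 years 45,000 metric tons</td><td>sales</td></tr>
<tr><th>44</th><td>NaN</td><td>NaN</td><td>115x Annual global carbon fiber consumption</td><td>$4 billion</td></tr>
<tr><th>45</th><td>NaN</td><td>NaN</td><td>Strength-to-weight ratio of carbon nanotubes v...</td><td>Revenue from global carbon fiber sales</td></tr>
<tr><th>46</th><td>NaN</td><td>Advanced</td><td>3x 22 billion</td><td>$800 billion</td></tr>
<tr><th>47</th><td>NaN</td><td>oil and gas</td><td>Increase in efficiency of US gas wells, 2007â€...</td><td>Revenue from global sales of natural</td></tr>
<tr><th>48</th><td>NaN</td><td>exploration</td><td>2x produced globally</td><td>gas</td></tr>
<tr><th>49</th><td>NaN</td><td>and recovery</td><td>Increase in efficiency of US oil wells, 2007â€...</td><td>$3.4 trillion</td></tr>
<tr><th>50</th><td>NaN</td><td>NaN</td><td>Barrels of crude oil produced globally</td><td>Revenue from global sales of crude oil</td></tr>
<tr><th>51</th><td>NaN</td><td>Renewable</td><td>85% 21,000 TWh</td><td>$3.5 trillion</td></tr>
<tr><th>52</th><td>NaN</td><td>energy</td><td>Lower price for a solar photovoltaic cell per ...</td><td>Value of global electricity consumption</td></tr>
<tr><th>53</th><td>NaN</td><td>NaN</td><td>2000 13 billion tons</td><td>$80 billion</td></tr>
<tr><th>54</th><td>NaN</td><td>NaN</td><td>19x Annual CO2 emissions from electricity</td><td>Value of global carbon market</td></tr>
<tr><th>55</th><td>NaN</td><td>NaN</td><td>Growth in solar photovoltaic and wind generati...</td><td>transactions</td></tr>
<tr><th>56</th><td>NaN</td><td>NaN</td><td>capacity since 2000 and planes</td><td>NaN</td></tr>
<tr><th>57</th><td>1.0</td><td>Not comprehensive; indicative groups, products...</td><td>NaN</td><td>NaN</td></tr>
<tr><th>58</th><td>2.0</td><td>For CDC-7600, considered the world’s fastest...</td><td>NaN</td><td>NaN</td></tr>
<tr><th>59</th><td>3.0</td><td>Baxter is a general-purpose basic manufacturin...</td><td>NaN</td><td>NaN</td></tr>
</tbody>
</table>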
\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 Unnamed: 1 \\\n", + "0 NaN NaN \n", + "1 NaN NaN \n", + "2 NaN NaN \n", + "3 NaN NaN \n", + "4 NaN NaN \n", + "5 NaN Cloud \n", + "6 NaN technology \n", + "7 NaN NaN \n", + "8 NaN NaN \n", + "9 NaN NaN \n", + "10 NaN NaN \n", + "11 NaN Advanced \n", + "12 NaN robotics \n", + "13 NaN NaN \n", + "14 NaN NaN \n", + "15 NaN NaN \n", + "16 NaN Autonomous \n", + "17 NaN and near- \n", + "18 NaN autonomous \n", + "19 NaN vehicles \n", + "20 NaN NaN \n", + "21 NaN NaN \n", + "22 NaN NaN \n", + "23 NaN NaN \n", + "24 NaN NaN \n", + "25 NaN Next- \n", + "26 NaN generation \n", + "27 NaN genomics \n", + "28 NaN NaN \n", + "29 NaN NaN \n", + "30 NaN Energy \n", + "31 NaN storage \n", + "32 NaN NaN \n", + "33 NaN NaN \n", + "34 NaN NaN \n", + "35 NaN NaN \n", + "36 NaN 3D printing \n", + "37 NaN NaN \n", + "38 NaN NaN \n", + "39 NaN NaN \n", + "40 NaN NaN \n", + "41 NaN Advanced \n", + "42 NaN materials \n", + "43 NaN NaN \n", + "44 NaN NaN \n", + "45 NaN NaN \n", + "46 NaN Advanced \n", + "47 NaN oil and gas \n", + "48 NaN exploration \n", + "49 NaN and recovery \n", + "50 NaN NaN \n", + "51 NaN Renewable \n", + "52 NaN energy \n", + "53 NaN NaN \n", + "54 NaN NaN \n", + "55 NaN NaN \n", + "56 NaN NaN \n", + "57 1.0 Not comprehensive; indicative groups, products... \n", + "58 2.0 For CDC-7600, considered the world’s fastest... \n", + "59 3.0 Baxter is a general-purpose basic manufacturin... \n", + "\n", + " over past 5 years across industries such as manufacturing, \\\n", + "0 80–90% health care, and mining \n", + "1 Price decline in MEMS (microelectromechanical ... \n", + "2 systems) sensors in past 5 years Global machin... \n", + "3 connections across sectors like transportation, \n", + "4 security, health care, and utilities \n", + "5 18 months 2 billion \n", + "6 Time to double server performance per dollar G... \n", + "7 3x like Gmail, Yahoo, and Hotmail \n", + "8 Monthly cost of owning a server vs. renting in... 
\n", + "9 the cloud North American institutions hosting ... \n", + "10 to host critical applications on the cloud \n", + "11 75–85% 320 million \n", + "12 Lower price for Baxter3 than a typical industr... \n", + "13 170% workforce \n", + "14 Growth in sales of industrial robots, 2009–1... \n", + "15 Annual major surgeries \n", + "16 7 1 billion \n", + "17 Miles driven by top-performing driverless car ... \n", + "18 DARPA Grand Challenge along a 150-mile route 4... \n", + "19 1,540 Civilian, military, and general aviation... \n", + "20 Miles cumulatively driven by cars competing in... \n", + "21 Grand Challenge \n", + "22 300,000+ \n", + "23 Miles driven by Google’s autonomous cars wit... \n", + "24 1 accident (which was human-caused) \n", + "25 10 months 26 million \n", + "26 Time to double sequencing speed per dollar Ann... \n", + "27 100x disease, or type 2 diabetes \n", + "28 Increase in acreage of genetically modified cr... \n", + "29 1996–2012 People employed in agriculture \n", + "30 40% 1 billion \n", + "31 Price decline for a lithium-ion battery pack i... \n", + "32 electric vehicle since 2009 1.2 billion \n", + "33 People without access to electricity \n", + "34 NaN \n", + "35 NaN \n", + "36 90% 320 million \n", + "37 Lower price for a home 3D printer vs. 4 years ... \n", + "38 4x workforce \n", + "39 Increase in additive manufacturing revenue in ... \n", + "40 10 years Annual number of toys manufactured gl... \n", + "41 $1,000 vs. $50 7.6 million tons \n", + "42 Difference in price of 1 gram of nanotubes ove... \n", + "43 10 years 45,000 metric tons \n", + "44 115x Annual global carbon fiber consumption \n", + "45 Strength-to-weight ratio of carbon nanotubes v... \n", + "46 3x 22 billion \n", + "47 Increase in efficiency of US gas wells, 2007â€... \n", + "48 2x produced globally \n", + "49 Increase in efficiency of US oil wells, 2007â€... 
\n", + "50 Barrels of crude oil produced globally \n", + "51 85% 21,000 TWh \n", + "52 Lower price for a solar photovoltaic cell per ... \n", + "53 2000 13 billion tons \n", + "54 19x Annual CO2 emissions from electricity \n", + "55 Growth in solar photovoltaic and wind generati... \n", + "56 capacity since 2000 and planes \n", + "57 NaN \n", + "58 NaN \n", + "59 NaN \n", + "\n", + " industries (manufacturing, health care, \n", + "0 and mining) \n", + "1 NaN \n", + "2 NaN \n", + "3 NaN \n", + "4 NaN \n", + "5 $1.7 trillion \n", + "6 GDP related to the Internet \n", + "7 $3 trillion \n", + "8 Enterprise IT spend \n", + "9 NaN \n", + "10 NaN \n", + "11 $6 trillion \n", + "12 Manufacturing worker employment \n", + "13 costs, 19% of global employment costs \n", + "14 $2–3 trillion \n", + "15 Cost of major surgeries \n", + "16 $4 trillion \n", + "17 Automobile industry revenue \n", + "18 $155 billion \n", + "19 Revenue from sales of civilian, military, \n", + "20 and general aviation aircraft \n", + "21 NaN \n", + "22 NaN \n", + "23 NaN \n", + "24 NaN \n", + "25 $6.5 trillion \n", + "26 Global health-care costs \n", + "27 $1.1 trillion \n", + "28 Global value of wheat, rice, maize, soy, \n", + "29 and barley \n", + "30 $2.5 trillion \n", + "31 Revenue from global consumption of \n", + "32 gasoline and diesel \n", + "33 $100 billion \n", + "34 Estimated value of electricity for \n", + "35 households currently without access \n", + "36 $11 trillion \n", + "37 Global manufacturing GDP \n", + "38 $85 billion \n", + "39 Revenue from global toy sales \n", + "40 NaN \n", + "41 $1.2 trillion \n", + "42 Revenue from global semiconductor \n", + "43 sales \n", + "44 $4 billion \n", + "45 Revenue from global carbon fiber sales \n", + "46 $800 billion \n", + "47 Revenue from global sales of natural \n", + "48 gas \n", + "49 $3.4 trillion \n", + "50 Revenue from global sales of crude oil \n", + "51 $3.5 trillion \n", + "52 Value of global electricity consumption \n", + "53 $80 
billion \n", + "54 Value of global carbon market \n", + "55 transactions \n", + "56 NaN \n", + "57 NaN \n", + "58 NaN \n", + "59 NaN " + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = read_pdf(\"./tmp/pdf/output.pdf\", encoding = 'ISO-8859-1',\n", + " stream=True, area=[269.875, 12.75, 790.5, 961], guess = False)\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## With Camelot\n", + "\n", + "#### Installation\n", + "\n", + "https://pypi.org/project/camelot-py/\n", + "\n", + "`pip install camelot-py`\n", + "\n", + "#### Camelot readme\n", + "\n", + "https://github.com/socialcopsdev/camelot" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
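<table>
<thead>
<tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>Bagel ( 1 average )</td><td>140 cals (45g)</td><td>310 cals</td><td>Medium</td></tr>
<tr><th>2</th><td>Biscuit digestives</td><td>86 cals (per biscuit)</td><td>480 cals</td><td>High</td></tr>
</tbody>
</table>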
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3\n", + "1 Bagel ( 1 average ) 140 cals (45g) 310 cals Medium\n", + "2 Biscuit digestives 86 cals (per biscuit) 480 cals High" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import camelot\n", + "tables = camelot.read_pdf(\"./tmp/pdf//Food Calories List.pdf\")\n", + "tables[0].df[1:3]" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "-- -------------\n", + " 0 Mobile\n", + " Internet\n", + " 1 Automation\n", + " of knowledge\n", + " work\n", + " 2 The Internet\n", + " of Things\n", + " 3 Cloud\n", + " technology\n", + " 4 Advanced\n", + " robotics\n", + " 5 Autonomous\n", + " and near-\n", + " autonomous\n", + " vehicles\n", + " 6 Next-\n", + " generation\n", + " genomics\n", + " 7 Energy\n", + " storage\n", + " 8 3D printing\n", + " 9 Advanced\n", + " materials\n", + "10 Advanced oil\n", + " and gas\n", + " exploration\n", + " and recovery\n", + "11 Renewable\n", + " energy\n", + "-- -------------\n" + ] + } + ], + "source": [ + "tables1 = camelot.read_pdf(\"./tmp/pdf/MGI_Disruptive_technologies_Full_report_May2013.pdf\", pages='32', area=[269.875, 120.75, 790.5, 561])\n", + "print (tabulate(tables1[0].df))" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "30\n", + "NOK\n", + "31\n", + "NOK\n", + "32\n", + "-- -------------\n", + " 0 Mobile\n", + " Internet\n", + " 1 Automation\n", + " of knowledge\n", + " work\n", + " 2 The Internet\n", + " of Things\n", + " 3 Cloud\n", + " technology\n", + " 4 Advanced\n", + " robotics\n", + " 5 Autonomous\n", + " and near-\n", + " autonomous\n", + " vehicles\n", + " 6 Next-\n", + " generation\n", + " genomics\n", + " 7 Energy\n", + " storage\n", + " 8 3D printing\n", + " 9 Advanced\n", + " materials\n", + "10 
Advanced oil\n", + " and gas\n", + " exploration\n", + " and recovery\n", + "11 Renewable\n", + " energy\n", + "-- -------------\n", + "NOK\n", + "33\n", + "NOK\n", + "34\n", + "NOK\n" + ] + } + ], + "source": [ + "for i in range(30,35):\n", + " print (i)\n", + " tables = camelot.read_pdf(\"./tmp/pdf/MGI_Disruptive_technologies_Full_report_May2013.pdf\", pages='%d' % i)\n", + " try:\n", + " print (tabulate(tables[0].df))\n", + " print (tabulate(tables[1].df))\n", + " except IndexError:\n", + " print('NOK')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Extract by PyPDF2\n", + "\n", + "#### Installation\n", + "\n", + "https://pypi.org/project/PyPDF2/\n", + "\n", + "`pip install PyPDF2`" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "b' Fish cake\\n 90 cals per cake\\n 200 cals\\n Medium\\n Fish fingers\\n 50 cals per piece\\n 220 cals\\n Medium\\n Gammon\\n 320 cals\\n 280 cals\\n Med\\n-High\\n Haddock fresh\\n 200 cals\\n 110 cals\\n Low calorie\\n Halibut fresh\\n 220 cals\\n 125 cals\\n Low calorie\\n Ha\\nm 6 cals\\n 240 cals\\n Medium\\n Herring fresh grilled\\n 300 cals\\n 200 cals\\n Medium\\n Kidney\\n 200 cals\\n 160 cals\\n Medium\\n Kipper\\n 200 cals\\n 120 cals\\n Low calorie\\n Liver\\n 200 cals\\n 150 cals\\n Medium\\n Liver\\n pate\\n 150 cals\\n 300 cals\\n Medium\\n Lamb (roast)\\n 300 cals\\n 300 cals\\n Med\\n-High\\n Lobster boiled\\n 200 cals\\n 100 cals\\n Low calorie\\n Luncheon meat\\n 300 cals\\n 400 cals\\n High\\n Mackeral\\n 320 cals\\n 300 cal\\ns Medium\\n Mussels\\n 90 cals\\n 90 cals\\n Low\\n-Med\\n Pheasant roast\\n 200 cals\\n 200 cals\\n Medium\\n Pilchards (tinned)\\n 140 cals\\n 140 cals\\n Medium\\n Prawns\\n 180 cals\\n 100 cals\\n Low\\n- Med\\n Pork \\n 320 cals\\n 290 cals\\n Med\\n-High\\n Pork pie\\n 320 cals\\n 450 cals\\n High\\n Rabbit\\n 200 cals\\n 180 cals\\n Medium\\n 
Salmon fresh\\n 220 cals\\n 180 cals\\n Medium\\n Sardines tinned in oil\\n 220 cals\\n 220 cals\\n Medium\\n Sardines in tomato sauce\\n 180 cals\\n 180 cals\\n Medium\\n Sausage pork fried\\n 250 cals\\n 320 cals\\n High\\n Sausage pork grilled\\n 220 cals\\n 280 cals\\n Med\\n-High\\n Sausage roll\\n 290 cals\\n 480 cals\\n High\\n Scampi fried in oil\\n 400 cals\\n 340 cals\\n High\\n Steak & kidney pie\\n 400 cals\\n 350 cals\\n High\\n '\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]\n" + ] + } + ], + "source": [ + "import PyPDF2\n", + "pdf_file = open('./tmp/pdf/Food Calories List.pdf', 'rb')\n", + "read_pdf = PyPDF2.PdfFileReader(pdf_file)\n", + "number_of_pages = read_pdf.getNumPages()\n", + "page = read_pdf.getPage(2)\n", + "page_content = page.extractText()\n", + "print (page_content.encode('utf-8'))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[' Fish cake' ' 90 cals per cake' ' 200 cals' ' Medium']\n", + "[' Fish fingers' ' 50 cals per piece' ' 220 cals' ' Medium']\n", + "[' Gammon' ' 320 cals' ' 280 cals' ' Med']\n", + "['-High' ' Haddock fresh' ' 200 cals' ' 110 cals']\n", + "[' Low calorie' ' Halibut fresh' ' 220 cals' ' 125 cals']\n" + ] + } + ], + "source": [ + "import numpy\n", + "\n", + "table_list = page_content.split('\\n')\n", + "l = numpy.array_split(table_list, len(table_list)/4)\n", + "for i in range(0,5):\n", + " print(l[i])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": 
"text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 96db5998e1df906967d0503da48fd8e3ea42a22a Mon Sep 17 00:00:00 2001 From: softhints Date: Sat, 2 Feb 2019 10:22:14 +0200 Subject: [PATCH 02/76] IPython/Jupyter Notebook tricks for advanced in 2019 --- notebooks/IPython tricks 2019.ipynb | 535 ++++++++++++++++++++++++++++ 1 file changed, 535 insertions(+) create mode 100644 notebooks/IPython tricks 2019.ipynb diff --git a/notebooks/IPython tricks 2019.ipynb b/notebooks/IPython tricks 2019.ipynb new file mode 100644 index 0000000..2ce6b1e --- /dev/null +++ b/notebooks/IPython tricks 2019.ipynb @@ -0,0 +1,535 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# IPython/Jupyter Notebook tricks for advanced in 2019" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Suppress output in IPython Notebook " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "simple" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "2*2" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "2*2;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "function" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Private Message\n" + ] + } + ], + "source": [ + "def myfunc():\n", + " print('Private Message')\n", + "myfunc();" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "def myfunc():\n", + " print('Private Message')\n", + "myfunc()" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "function 2" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Private Message\n" + ] + } + ], + "source": [ + "def myfunc():\n", + " print('Private Message')\n", + " \n", + "myfunc()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.utils import io\n", + "\n", + "def myfunc():\n", + " print('Private Message')\n", + "\n", + "with io.capture_output() as captured:\n", + " myfunc()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Get function docs and arguments IPython Notebook " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy\n", + "table_list = [1,2,3,4,4]\n", + "l = numpy.array_split(table_list, len(table_list)/4)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "? 
numpy.array_split" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Change theme IPython Notebook \n", + "\n", + "Install the module:\n", + "\n", + "`pip install jupyterthemes`\n", + "\n", + "Apply a theme:\n", + "\n", + "`jt -t chesterish`\n", + "\n", + "Restore the default theme:\n", + "\n", + "`jt -r`\n", + "\n", + "The same commands can be run inside a Jupyter notebook cell by prefixing them with `!`:\n", + "\n", + "`!jt -r`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!jt -r" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!jt -t chesterish" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: some useful Jupyter Notebook commands" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!jupyter kernelspec list" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy\n", + "print (numpy.__path__)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python 3.6.7\r\n" + ] + } + ], + "source": [ + "!python -V" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!which python" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "appdirs==1.4.3\r\n", + "asn1crypto==0.24.0\r\n", + "atomicwrites==1.2.1\r\n", + "attrs==18.2.0\r\n", + "backcall==0.1.0\r\n", + "black==18.9b0\r\n", + "bleach==3.0.2\r\n", + "boto==2.49.0\r\n", + "boto3==1.9.67\r\n", + "botocore==1.12.67\r\n", + "camelot-py==0.7.1\r\n", + "certifi==2018.8.24\r\n", + "cffi==1.11.5\r\n", + "chardet==3.0.4\r\n", + "Click==7.0\r\n", + "cryptography==2.3.1\r\n",
+ "cycler==0.10.0\r\n", + "decorator==4.3.0\r\n", + "defusedxml==0.5.0\r\n", + "distro==1.3.0\r\n", + "docutils==0.14\r\n", + "entrypoints==0.2.3\r\n", + "et-xmlfile==1.0.1\r\n", + "filelock==3.0.10\r\n", + "idna==2.7\r\n", + "ipykernel==5.1.0\r\n", + "ipython==7.2.0\r\n", + "ipython-genutils==0.2.0\r\n", + "ipywidgets==7.4.2\r\n", + "jdcal==1.4\r\n", + "jedi==0.13.1\r\n", + "Jinja2==2.10\r\n", + "jira==2.0.0\r\n", + "jmespath==0.9.3\r\n", + "jsonref==0.2\r\n", + "jsonschema==2.6.0\r\n", + "jupyter==1.0.0\r\n", + "jupyter-client==5.2.3\r\n", + "jupyter-console==6.0.0\r\n", + "jupyter-core==4.4.0\r\n", + "jupyterthemes==0.20.0\r\n", + "kiwisolver==1.0.1\r\n", + "lesscpy==0.13.0\r\n", + "lxml==4.3.0\r\n", + "MarkupSafe==1.1.0\r\n", + "matplotlib==3.0.0\r\n", + "mistune==0.8.4\r\n", + "more-itertools==5.0.0\r\n", + "nbconvert==5.4.0\r\n", + "nbformat==4.4.0\r\n", + "notebook==5.7.2\r\n", + "numpy==1.15.1\r\n", + "oauthlib==2.1.0\r\n", + "opencv-python==4.0.0.21\r\n", + "openpyxl==2.5.14\r\n", + "packaging==16.8\r\n", + "pandas==0.23.4\r\n", + "pandocfilters==1.4.2\r\n", + "parso==0.3.1\r\n", + "pbr==4.2.0\r\n", + "pdfminer.six==20181108\r\n", + "pexpect==4.6.0\r\n", + "pickleshare==0.7.5\r\n", + "Pillow==5.2.0\r\n", + "pkg-resources==0.0.0\r\n", + "pluggy==0.8.1\r\n", + "ply==3.11\r\n", + "prometheus-client==0.4.2\r\n", + "prompt-toolkit==2.0.7\r\n", + "ptyprocess==0.6.0\r\n", + "py==1.7.0\r\n", + "py-spy==0.1.8\r\n", + "pycodestyle==2.3.1\r\n", + "pycparser==2.18\r\n", + "pycryptodome==3.7.3\r\n", + "Pygments==2.3.0\r\n", + "PyJWT==1.6.4\r\n", + "PyMySQL==0.9.2\r\n", + "pyparsing==2.2.0\r\n", + "PyPDF2==1.26.0\r\n", + "pytesseract==0.2.4\r\n", + "pytest==4.1.1\r\n", + "python-dateutil==2.7.3\r\n", + "pytz==2018.5\r\n", + "pyzmq==17.1.2\r\n", + "qtconsole==4.4.3\r\n", + "requests==2.19.1\r\n", + "requests-oauthlib==1.0.0\r\n", + "requests-toolbelt==0.8.0\r\n", + "retrying==1.3.3\r\n", + "s3transfer==0.1.13\r\n", + "scrapinghub==2.0.3\r\n", + "selenium==3.14.0\r\n", + 
"Send2Trash==1.5.0\r\n", + "simplejson==3.10.0\r\n", + "six==1.10.0\r\n", + "sortedcontainers==2.1.0\r\n", + "style==1.1.0\r\n", + "tabula-py==1.3.1\r\n", + "tabulate==0.8.2\r\n", + "terminado==0.8.1\r\n", + "testpath==0.4.2\r\n", + "toml==0.10.0\r\n", + "tornado==5.1.1\r\n", + "tox==3.7.0\r\n", + "traitlets==4.3.2\r\n", + "update==0.0.1\r\n", + "urllib3==1.23\r\n", + "virtualenv==16.3.0\r\n", + "Wand==0.4.4\r\n", + "wcwidth==0.1.7\r\n", + "webencodings==0.5.1\r\n", + "widgetsnbextension==3.4.2\r\n" + ] + } + ], + "source": [ + "!pip freeze" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!echo $PATH " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus 2: Top 10 most useful ipython key shortcuts" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* Shift + Enter - \trun cell\n", + "* Alt + Enter - \trun cell, insert below\n", + "* Ctrl + m, c - \tcopy cell\n", + "* Ctrl + m, v - \tpaste cell\n", + "* Ctrl + m, l - \ttoggle line numbers\n", + "* Ctrl + m, j -\tmove cell\n", + "* Ctrl + m, y -\tcode cell\n", + "* Ctrl + m, m -\tmarkdown cell\n", + "* Ctrl + m, . 
-\trestart kernel\n", + "* Ctrl + m, h -\tshow keyboard shortcuts" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "1+1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "1+1\n", + "## markdown" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From b76df831a626da510e38d800a42230c166ae9950 Mon Sep 17 00:00:00 2001 From: softhints Date: Sun, 3 Feb 2019 20:48:14 +0200 Subject: [PATCH 03/76] Pandas search in column, every column and regex --- ...ch in column, every 
column and regex.ipynb | 1234 +++++++++++++++++ 1 file changed, 1234 insertions(+) create mode 100644 notebooks/Pandas search in column, every column and regex.ipynb diff --git a/notebooks/Pandas search in column, every column and regex.ipynb b/notebooks/Pandas search in column, every column and regex.ipynb new file mode 100644 index 0000000..699ac60 --- /dev/null +++ b/notebooks/Pandas search in column, every column and regex.ipynb @@ -0,0 +1,1234 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas search in column, every column and regex of a dataframe" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Example PDFs\n", + "\n", + "* Food Calories List\n", + "\n", + "http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## With tabula-py\n", + "\n", + "#### Installation\n", + "\n", + "https://pypi.org/project/tabula-py/\n", + "\n", + "`pip install pandas`\n", + "`pip install tabula-py`\n", + "\n", + "#### tabula-py docs\n", + "\n", + "https://www.pydoc.io/pypi/tabula-py-0.9.0/autoapi/wrapper/index.html" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read tabular data from PDF" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from tabula import read_pdf\n", + "from tabulate import tabulate" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
FruitCalories per pieceCarbs (grams)Water Content
0Apple (1 average)44 calories10.585 %
1Apple cooking35 calories988 %
2Apricot30 calories6.785 %
3Avocado150 calories260 %
4Banana107 calories2675 %
5Blackberries each1 calorie0.285 %
6Blackcurrant each1.1 calorie0.2577 %
7Blueberries (new) 100g49 Cals ( 100g )15 g81 %
8Cherry each2.4 calories0.683 %
9Clementine24 cals566 %
10Currants5 calories1.416 %
11Damson28 calories7.270 %
12One average date 5g5 cals1.214 %
13Dates with inverted sugar 100g250 calories6312 %
14Figs10 calories2.424 %
15Gooseberries2.6 calories0.6580 %
16Grapes 100g Seedless50 cals1582 %
17one average Grape 6g3 calories0.982 %
18Grapefruit whole100 calories2365 %
19Guava24 calories4.485 %
20Kiwi34 calories875 %
21Lemon20 calories3.485 %
22Lychees3 calories0.780 %
23Mango40 calories9.580 %
24Melon Honeydew (130g)36 calories990 %
25Melon Canteloupe (130g)25 cals693 %
26Nectarines42 calories980 %
27Olives6.8 caloriestrace63 %
\n", + "
" + ], + "text/plain": [ + " Fruit Calories per piece Carbs (grams) \\\n", + "0 Apple (1 average) 44 calories 10.5 \n", + "1 Apple cooking 35 calories 9 \n", + "2 Apricot 30 calories 6.7 \n", + "3 Avocado 150 calories 2 \n", + "4 Banana 107 calories 26 \n", + "5 Blackberries each 1 calorie 0.2 \n", + "6 Blackcurrant each 1.1 calorie 0.25 \n", + "7 Blueberries (new) 100g 49 Cals ( 100g ) 15 g \n", + "8 Cherry each 2.4 calories 0.6 \n", + "9 Clementine 24 cals 5 \n", + "10 Currants 5 calories 1.4 \n", + "11 Damson 28 calories 7.2 \n", + "12 One average date 5g 5 cals 1.2 \n", + "13 Dates with inverted sugar 100g 250 calories 63 \n", + "14 Figs 10 calories 2.4 \n", + "15 Gooseberries 2.6 calories 0.65 \n", + "16 Grapes 100g Seedless 50 cals 15 \n", + "17 one average Grape 6g 3 calories 0.9 \n", + "18 Grapefruit whole 100 calories 23 \n", + "19 Guava 24 calories 4.4 \n", + "20 Kiwi 34 calories 8 \n", + "21 Lemon 20 calories 3.4 \n", + "22 Lychees 3 calories 0.7 \n", + "23 Mango 40 calories 9.5 \n", + "24 Melon Honeydew (130g) 36 calories 9 \n", + "25 Melon Canteloupe (130g) 25 cals 6 \n", + "26 Nectarines 42 calories 9 \n", + "27 Olives 6.8 calories trace \n", + "\n", + " Water Content \n", + "0 85 % \n", + "1 88 % \n", + "2 85 % \n", + "3 60 % \n", + "4 75 % \n", + "5 85 % \n", + "6 77 % \n", + "7 81 % \n", + "8 83 % \n", + "9 66 % \n", + "10 16 % \n", + "11 70 % \n", + "12 14 % \n", + "13 12 % \n", + "14 24 % \n", + "15 80 % \n", + "16 82 % \n", + "17 82 % \n", + "18 65 % \n", + "19 85 % \n", + "20 75 % \n", + "21 85 % \n", + "22 80 % \n", + "23 80 % \n", + "24 90 % \n", + "25 93 % \n", + "26 80 % \n", + "27 63 % " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=8)\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataframe Search in a single column for a string" + ] + 
}, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
FruitCalories per pieceCarbs (grams)Water Content
0Apple (1 average)44 calories10.585 %
7Blueberries (new) 100g49 Cals ( 100g )15 g81 %
13Dates with inverted sugar 100g250 calories6312 %
16Grapes 100g Seedless50 cals1582 %
24Melon Honeydew (130g)36 calories990 %
25Melon Canteloupe (130g)25 cals693 %
\n", + "
" + ], + "text/plain": [ + " Fruit Calories per piece Carbs (grams) \\\n", + "0 Apple (1 average) 44 calories 10.5 \n", + "7 Blueberries (new) 100g 49 Cals ( 100g ) 15 g \n", + "13 Dates with inverted sugar 100g 250 calories 63 \n", + "16 Grapes 100g Seedless 50 cals 15 \n", + "24 Melon Honeydew (130g) 36 calories 9 \n", + "25 Melon Canteloupe (130g) 25 cals 6 \n", + "\n", + " Water Content \n", + "0 85 % \n", + "7 81 % \n", + "13 12 % \n", + "16 82 % \n", + "24 90 % \n", + "25 93 % " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print('1')\n", + "df[df['Fruit'].str.contains(\"1\")]" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
FruitWater Content
24Melon Honeydew (130g)90 %
25Melon Canteloupe (130g)93 %
\n", + "
" + ], + "text/plain": [ + " Fruit Water Content\n", + "24 Melon Honeydew (130g) 90 %\n", + "25 Melon Canteloupe (130g) 93 %" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['Fruit'].str.contains(\"Melon\")][['Fruit', 'Water Content']]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataframe Search in every column for a string" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
FruitCalories per pieceCarbs (grams)Water Content
0Apple (1 average)44 calories10.585 %
3Avocado150 calories260 %
4Banana107 calories2675 %
5Blackberries each1 calorie0.285 %
6Blackcurrant each1.1 calorie0.2577 %
7Blueberries (new) 100g49 Cals ( 100g )15 g81 %
10Currants5 calories1.416 %
12One average date 5g5 cals1.214 %
13Dates with inverted sugar 100g250 calories6312 %
14Figs10 calories2.424 %
16Grapes 100g Seedless50 cals1582 %
18Grapefruit whole100 calories2365 %
24Melon Honeydew (130g)36 calories990 %
25Melon Canteloupe (130g)25 cals693 %
\n", + "
" + ], + "text/plain": [ + " Fruit Calories per piece Carbs (grams) \\\n", + "0 Apple (1 average) 44 calories 10.5 \n", + "3 Avocado 150 calories 2 \n", + "4 Banana 107 calories 26 \n", + "5 Blackberries each 1 calorie 0.2 \n", + "6 Blackcurrant each 1.1 calorie 0.25 \n", + "7 Blueberries (new) 100g 49 Cals ( 100g ) 15 g \n", + "10 Currants 5 calories 1.4 \n", + "12 One average date 5g 5 cals 1.2 \n", + "13 Dates with inverted sugar 100g 250 calories 63 \n", + "14 Figs 10 calories 2.4 \n", + "16 Grapes 100g Seedless 50 cals 15 \n", + "18 Grapefruit whole 100 calories 23 \n", + "24 Melon Honeydew (130g) 36 calories 9 \n", + "25 Melon Canteloupe (130g) 25 cals 6 \n", + "\n", + " Water Content \n", + "0 85 % \n", + "3 60 % \n", + "4 75 % \n", + "5 85 % \n", + "6 77 % \n", + "7 81 % \n", + "10 16 % \n", + "12 14 % \n", + "13 12 % \n", + "14 24 % \n", + "16 82 % \n", + "18 65 % \n", + "24 90 % \n", + "25 93 % " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print('1')\n", + "df2= df[df.apply(lambda row: row.astype(str).str.contains('1').any(), axis=1)]\n", + "df2" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Melon\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
FruitWater Content
24Melon Honeydew (130g)90 %
25Melon Canteloupe (130g)93 %
\n", + "
" + ], + "text/plain": [ + " Fruit Water Content\n", + "24 Melon Honeydew (130g) 90 %\n", + "25 Melon Canteloupe (130g) 93 %" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print('Melon')\n", + "df2 = df[df.apply(lambda row: row.astype(str).str.contains('Melon').any(), axis=1)][['Fruit', 'Water Content']]\n", + "df2" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "cals\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
FruitCalories per pieceCarbs (grams)Water Content
9Clementine24 cals566 %
12One average date 5g5 cals1.214 %
16Grapes 100g Seedless50 cals1582 %
25Melon Canteloupe (130g)25 cals693 %
\n", + "
" + ], + "text/plain": [ + " Fruit Calories per piece Carbs (grams) Water Content\n", + "9 Clementine 24 cals 5 66 %\n", + "12 One average date 5g 5 cals 1.2 14 %\n", + "16 Grapes 100g Seedless 50 cals 15 82 %\n", + "25 Melon Canteloupe (130g) 25 cals 6 93 %" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print('cals')\n", + "df[df.apply(lambda row: row.astype(str).str.contains('cals').any(), axis=1)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataframe search with regular expression" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Apple (1 average),44 calories,10.5,85 %',\n", + " 'Apple cooking,35 calories,9,88 %',\n", + " 'Apricot,30 calories,6.7,85 %',\n", + " 'Avocado,150 calories,2,60 %',\n", + " 'Banana,107 calories,26,75 %',\n", + " 'Blackberries each,1 calorie,0.2,85 %',\n", + " 'Blackcurrant each,1.1 calorie,0.25,77 %',\n", + " 'Blueberries (new) 100g,49 Cals ( 100g ),15 g,81 %',\n", + " 'Cherry each,2.4 calories,0.6,83 %',\n", + " 'Clementine,24 cals,5,66 %',\n", + " 'Currants,5 calories,1.4,16 %',\n", + " 'Damson,28 calories,7.2,70 %',\n", + " 'One average date 5g,5 cals,1.2,14 %',\n", + " 'Dates with inverted sugar 100g,250 calories,63,12 %',\n", + " 'Figs,10 calories,2.4,24 %',\n", + " 'Gooseberries,2.6 calories,0.65,80 %',\n", + " 'Grapes 100g Seedless,50 cals,15,82 %',\n", + " 'one average Grape 6g,3 calories,0.9,82 %',\n", + " 'Grapefruit whole,100 calories,23,65 %',\n", + " 'Guava,24 calories,4.4,85 %',\n", + " 'Kiwi,34 calories,8,75 %',\n", + " 'Lemon,20 calories,3.4,85 %',\n", + " 'Lychees,3 calories,0.7,80 %',\n", + " 'Mango,40 calories,9.5,80 %',\n", + " 'Melon Honeydew (130g),36 calories,9,90 %',\n", + " 'Melon Canteloupe (130g),25 cals,6,93 %',\n", + " 'Nectarines,42 calories,9,80 %',\n", + " 'Olives,6.8 calories,trace,63 %']" + ] + }, + "execution_count": 11, 
+ "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "vals = df.to_csv(header=None, index=False).strip('\\n').split('\\n')\n", + "vals" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['10.5']\n", + "['9,88']\n", + "['6.7']\n", + "['150', '2,60']\n", + "['107', '26,75']\n", + "['0.2']\n", + "['1.1', '0.25']\n", + "['100', '100']\n", + "['2.4', '0.6']\n", + "['5,66']\n", + "['1.4']\n", + "['7.2']\n", + "['1.2']\n", + "['100', '250', '63,12']\n", + "['2.4']\n", + "['2.6', '0.65']\n", + "['100', '15,82']\n", + "['0.9']\n", + "['100', '23,65']\n", + "['4.4']\n", + "['8,75']\n", + "['3.4']\n", + "['0.7']\n", + "['9.5']\n", + "['130', '9,90']\n", + "['130', '6,93']\n", + "['9,80']\n", + "['6.8']\n" + ] + } + ], + "source": [ + "import re\n", + "for val in vals:\n", + " #print(val)\n", + " found = re.findall(\"\\d+.\\d+\",val)\n", + " if found:\n", + " print(found)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0123456789...18192021222324252627
FruitApple (1 average)Apple cookingApricotAvocadoBananaBlackberries eachBlackcurrant eachBlueberries (new) 100gCherry eachClementine...Grapefruit wholeGuavaKiwiLemonLycheesMangoMelon Honeydew (130g)Melon Canteloupe (130g)NectarinesOlives
Calories per piece44 calories35 calories30 calories150 calories107 calories1 calorie1.1 calorie49 Cals ( 100g )2.4 calories24 cals...100 calories24 calories34 calories20 calories3 calories40 calories36 calories25 cals42 calories6.8 calories
Carbs (grams)10.596.72260.20.2515 g0.65...234.483.40.79.5969trace
Water Content85 %88 %85 %60 %75 %85 %77 %81 %83 %66 %...65 %85 %75 %85 %80 %80 %90 %93 %80 %63 %
\n", + "

4 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " 0 1 2 \\\n", + "Fruit Apple (1 average) Apple cooking Apricot \n", + "Calories per piece 44 calories 35 calories 30 calories \n", + "Carbs (grams) 10.5 9 6.7 \n", + "Water Content 85 % 88 % 85 % \n", + "\n", + " 3 4 5 \\\n", + "Fruit Avocado Banana Blackberries each \n", + "Calories per piece 150 calories 107 calories 1 calorie \n", + "Carbs (grams) 2 26 0.2 \n", + "Water Content 60 % 75 % 85 % \n", + "\n", + " 6 7 8 \\\n", + "Fruit Blackcurrant each Blueberries (new) 100g Cherry each \n", + "Calories per piece 1.1 calorie 49 Cals ( 100g ) 2.4 calories \n", + "Carbs (grams) 0.25 15 g 0.6 \n", + "Water Content 77 % 81 % 83 % \n", + "\n", + " 9 ... 18 19 \\\n", + "Fruit Clementine ... Grapefruit whole Guava \n", + "Calories per piece 24 cals ... 100 calories 24 calories \n", + "Carbs (grams) 5 ... 23 4.4 \n", + "Water Content 66 % ... 65 % 85 % \n", + "\n", + " 20 21 22 23 \\\n", + "Fruit Kiwi Lemon Lychees Mango \n", + "Calories per piece 34 calories 20 calories 3 calories 40 calories \n", + "Carbs (grams) 8 3.4 0.7 9.5 \n", + "Water Content 75 % 85 % 80 % 80 % \n", + "\n", + " 24 25 \\\n", + "Fruit Melon Honeydew (130g) Melon Canteloupe (130g) \n", + "Calories per piece 36 calories 25 cals \n", + "Carbs (grams) 9 6 \n", + "Water Content 90 % 93 % \n", + "\n", + " 26 27 \n", + "Fruit Nectarines Olives \n", + "Calories per piece 42 calories 6.8 calories \n", + "Carbs (grams) 9 trace \n", + "Water Content 80 % 63 % \n", + "\n", + "[4 rows x 28 columns]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.T" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", 
+ "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 99e16abf611ccaddfc26791a3539d89e238f79d2 Mon Sep 17 00:00:00 2001 From: softhints Date: Mon, 4 Feb 2019 13:07:53 +0200 Subject: [PATCH 04/76] Pandas is column is contained in another column in the same row --- ...ed in another column in the same row.ipynb | 3029 +++++++++++++++++ 1 file changed, 3029 insertions(+) create mode 100644 notebooks/Pandas is column is contained in another column in the same row.ipynb diff --git a/notebooks/Pandas is column is contained in another column in the same row.ipynb b/notebooks/Pandas is column is contained in another column in the same row.ipynb new file mode 100644 index 0000000..0ca1dff --- /dev/null +++ b/notebooks/Pandas is column is contained in another column in the same row.ipynb @@ -0,0 +1,3029 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas: check if a column is contained in another column in the same row\n", + "\n", + "dataset:\n", + "\n", + "https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset#movie_metadata.csv" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namenum_critic_for_reviewsduration
0James Cameron723.0178.0
1Gore Verbinski302.0169.0
2Sam Mendes602.0148.0
3Christopher Nolan813.0164.0
4Doug WalkerNaNNaN
\n", + "
" + ], + "text/plain": [ + " director_name num_critic_for_reviews duration\n", + "0 James Cameron 723.0 178.0\n", + "1 Gore Verbinski 302.0 169.0\n", + "2 Sam Mendes 602.0 148.0\n", + "3 Christopher Nolan 813.0 164.0\n", + "4 Doug Walker NaN NaN" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# read the movies dataset\n", + "import pandas as pd\n", + "movies = pd.read_csv('./csv/movie_metadata.csv', usecols=[1,2,3])\n", + "movies.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_nameactor_2_namegrossgenresmovie_titleplot_keywordscontent_rating
5038Scott SmithDaphne ZunigaNaNComedy|DramaSigned Sealed Deliveredfraud|postal worker|prison|theft|trialNaN
5039NaNValorie CurryNaNCrime|Drama|Mystery|ThrillerThe Followingcult|fbi|hideout|prison escape|serial killerTV-14
5040Benjamin RoberdsMaxwell MoodyNaNDrama|Horror|ThrillerA Plague So PleasantNaNNaN
5041Daniel HsiaDaniel Henney10443.0Comedy|Drama|RomanceShanghai CallingNaNPG-13
5042Jon GunnBrian Herzlinger85222.0DocumentaryMy Date with Drewactress name in title|crush|date|four word tit...PG
\n", + "
" + ], + "text/plain": [ + " director_name actor_2_name gross \\\n", + "5038 Scott Smith Daphne Zuniga NaN \n", + "5039 NaN Valorie Curry NaN \n", + "5040 Benjamin Roberds Maxwell Moody NaN \n", + "5041 Daniel Hsia Daniel Henney 10443.0 \n", + "5042 Jon Gunn Brian Herzlinger 85222.0 \n", + "\n", + " genres movie_title \\\n", + "5038 Comedy|Drama Signed Sealed Delivered  \n", + "5039 Crime|Drama|Mystery|Thriller The Following  \n", + "5040 Drama|Horror|Thriller A Plague So Pleasant  \n", + "5041 Comedy|Drama|Romance Shanghai Calling  \n", + "5042 Documentary My Date with Drew  \n", + "\n", + " plot_keywords content_rating \n", + "5038 fraud|postal worker|prison|theft|trial NaN \n", + "5039 cult|fbi|hideout|prison escape|serial killer TV-14 \n", + "5040 NaN NaN \n", + "5041 NaN PG-13 \n", + "5042 actress name in title|crush|date|four word tit... PG " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# read a dataset movies\n", + "import pandas as pd\n", + "movies = pd.read_csv('./csv/movie_metadata.csv', usecols=['movie_title', 'director_name', 'actor_2_name', 'content_rating','plot_keywords','gross', 'genres'])\n", + "movies.tail()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Compare if two columns match" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
movie_titledirector_nameactor_2_name
437The ExpendablesSylvester StalloneSylvester Stallone
504OceansJacques PerrinJacques Perrin
600Star Trek: InsurrectionJonathan FrakesJonathan Frakes
931TedSeth MacFarlaneSeth MacFarlane
1057Dick TracyWarren BeattyWarren Beatty
\n", + "
" + ], + "text/plain": [ + " movie_title director_name actor_2_name\n", + "437 The Expendables  Sylvester Stallone Sylvester Stallone\n", + "504 Oceans  Jacques Perrin Jacques Perrin\n", + "600 Star Trek: Insurrection  Jonathan Frakes Jonathan Frakes\n", + "931 Ted  Seth MacFarlane Seth MacFarlane\n", + "1057 Dick Tracy  Warren Beatty Warren Beatty" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# check if two columns in a single row are identical\n", + "df2 = movies.loc[(movies.director_name == movies.actor_2_name), ['movie_title', 'director_name', 'actor_2_name']]\n", + "df2.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Filter on two conditions" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_nameactor_2_namegrossgenresmovie_titleplot_keywordscontent_rating
1Gore VerbinskiOrlando Bloom309404152.0Action|Adventure|FantasyPirates of the Caribbean: At World's Endgoddess|marriage ceremony|marriage proposal|pi...PG-13
13Gore VerbinskiOrlando Bloom423032628.0Action|Adventure|FantasyPirates of the Caribbean: Dead Man's Chestbox office hit|giant squid|heart|liar's dice|m...PG-13
18Rob MarshallSam Claflin241063875.0Action|Adventure|FantasyPirates of the Caribbean: On Stranger Tidesblackbeard|captain|pirate|revenge|soldierPG-13
21Marc WebbAndrew Garfield262030663.0Action|Adventure|FantasyThe Amazing Spider-Manlizard|outcast|spider|spider man|teenagerPG-13
54Steven SpielbergRay Winstone317011114.0Action|Adventure|FantasyIndiana Jones and the Kingdom of the Crystal S...cult figure|femme fatale|indiana jones|unsubti...PG-13
\n", + "
" + ], + "text/plain": [ + " director_name actor_2_name gross genres \\\n", + "1 Gore Verbinski Orlando Bloom 309404152.0 Action|Adventure|Fantasy \n", + "13 Gore Verbinski Orlando Bloom 423032628.0 Action|Adventure|Fantasy \n", + "18 Rob Marshall Sam Claflin 241063875.0 Action|Adventure|Fantasy \n", + "21 Marc Webb Andrew Garfield 262030663.0 Action|Adventure|Fantasy \n", + "54 Steven Spielberg Ray Winstone 317011114.0 Action|Adventure|Fantasy \n", + "\n", + " movie_title \\\n", + "1 Pirates of the Caribbean: At World's End  \n", + "13 Pirates of the Caribbean: Dead Man's Chest  \n", + "18 Pirates of the Caribbean: On Stranger Tides  \n", + "21 The Amazing Spider-Man  \n", + "54 Indiana Jones and the Kingdom of the Crystal S... \n", + "\n", + " plot_keywords content_rating \n", + "1 goddess|marriage ceremony|marriage proposal|pi... PG-13 \n", + "13 box office hit|giant squid|heart|liar's dice|m... PG-13 \n", + "18 blackbeard|captain|pirate|revenge|soldier PG-13 \n", + "21 lizard|outcast|spider|spider man|teenager PG-13 \n", + "54 cult figure|femme fatale|indiana jones|unsubti... PG-13 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# several conditions can be applied with loc\n", + "df2 = movies.loc[(movies.gross > 100000000.0) & (movies.genres == 'Action|Adventure|Fantasy'), :]\n", + "df2.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Search if a column is part of another column on the same row" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "movies['rat'] = movies.content_rating.astype(str)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_nameactor_2_namegrossgenresmovie_titleplot_keywordscontent_ratingrat
0James CameronJoel David Moore760505847.0Action|Adventure|Fantasy|Sci-FiAvataravatar|future|marine|native|paraplegicPG-13PG-13
1Gore VerbinskiOrlando Bloom309404152.0Action|Adventure|FantasyPirates of the Caribbean: At World's Endgoddess|marriage ceremony|marriage proposal|pi...PG-13PG-13
2Sam MendesRory Kinnear200074175.0Action|Adventure|ThrillerSpectrebomb|espionage|sequel|spy|terroristPG-13PG-13
3Christopher NolanChristian Bale448130642.0Action|ThrillerThe Dark Knight Risesdeception|imprisonment|lawlessness|police offi...PG-13PG-13
4Doug WalkerRob WalkerNaNDocumentaryStar Wars: Episode VII - The Force Awakens  ...NaNNaNnan
\n", + "
" + ], + "text/plain": [ + " director_name actor_2_name gross \\\n", + "0 James Cameron Joel David Moore 760505847.0 \n", + "1 Gore Verbinski Orlando Bloom 309404152.0 \n", + "2 Sam Mendes Rory Kinnear 200074175.0 \n", + "3 Christopher Nolan Christian Bale 448130642.0 \n", + "4 Doug Walker Rob Walker NaN \n", + "\n", + " genres \\\n", + "0 Action|Adventure|Fantasy|Sci-Fi \n", + "1 Action|Adventure|Fantasy \n", + "2 Action|Adventure|Thriller \n", + "3 Action|Thriller \n", + "4 Documentary \n", + "\n", + " movie_title \\\n", + "0 Avatar  \n", + "1 Pirates of the Caribbean: At World's End  \n", + "2 Spectre  \n", + "3 The Dark Knight Rises  \n", + "4 Star Wars: Episode VII - The Force Awakens  ... \n", + "\n", + " plot_keywords content_rating rat \n", + "0 avatar|future|marine|native|paraplegic PG-13 PG-13 \n", + "1 goddess|marriage ceremony|marriage proposal|pi... PG-13 PG-13 \n", + "2 bomb|espionage|sequel|spy|terrorist PG-13 PG-13 \n", + "3 deception|imprisonment|lawlessness|police offi... PG-13 PG-13 \n", + "4 NaN NaN nan " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Using lambda expression" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* **Axis 0** iterate on all the ROWS in each COLUMN\n", + "* **Axis 1** iterate on all the COLUMNS in each ROW" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
content_ratingmovie_title
94RTerminator 3: Rise of the Machines
124RThe Matrix Revolutions
126RThe Matrix Reloaded
128RMad Max: Fury Road
179RThe Revenant
\n", + "
" + ], + "text/plain": [ + " content_rating movie_title\n", + "94 R Terminator 3: Rise of the Machines \n", + "124 R The Matrix Revolutions \n", + "126 R The Matrix Reloaded \n", + "128 R Mad Max: Fury Road \n", + "179 R The Revenant " + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_filter = movies[movies.apply(lambda row: row.rat in row.movie_title, axis=1)]\n", + "df_filter[['content_rating','movie_title']].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
content_ratingmovie_title
94RTerminator 3: Rise of the Machines
124RThe Matrix Revolutions
126RThe Matrix Reloaded
128RMad Max: Fury Road
179RThe Revenant
\n", + "
" + ], + "text/plain": [ + " content_rating movie_title\n", + "94 R Terminator 3: Rise of the Machines \n", + "124 R The Matrix Revolutions \n", + "126 R The Matrix Reloaded \n", + "128 R Mad Max: Fury Road \n", + "179 R The Revenant " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_filter = movies[movies.apply(lambda row: row.movie_title.find(row.rat) != -1, axis=1)]\n", + "df_filter[['content_rating','movie_title']].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "14" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'Terminator 3: Rise of the Machines'.find('R')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "-1" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'Terminator 3: Rise of the Machines'.find('A')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'Terminator 3: Rise of the Machines'.find('A') != -1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Using for loop and series to search" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "rating_list = movies['content_rating']\n", + "title_list = movies['movie_title']" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[False, False, False, False, False]" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_part = []\n", + "for i, rate in enumerate(rating_list):\n", + " if not pd.isna(title_list[i]) and (str(rate) in title_list[i]):\n", + " 
is_part.append(True)\n", + " else:\n", + " is_part.append(False)\n", + "is_part[:5] " + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
content_ratingmovie_title
94RTerminator 3: Rise of the Machines
124RThe Matrix Revolutions
126RThe Matrix Reloaded
128RMad Max: Fury Road
179RThe Revenant
\n", + "
" + ], + "text/plain": [ + " content_rating movie_title\n", + "94 R Terminator 3: Rise of the Machines \n", + "124 R The Matrix Revolutions \n", + "126 R The Matrix Reloaded \n", + "128 R Mad Max: Fury Road \n", + "179 R The Revenant " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[is_part][['content_rating','movie_title']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Is part of one column contained in another column?" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 avatar\n", + "1 goddess\n", + "2 bomb\n", + "3 deception\n", + "4 NaN\n", + "Name: plot_keywords, dtype: object" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "keywords_list = movies['plot_keywords']\n", + "title_list = movies['movie_title']\n", + "keys_list = movies['plot_keywords'].str.split('|').str.get(0)\n", + "keys_list[:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Avatar \n", + "1 Pirates of the Caribbean: At World's End \n", + "2 Spectre \n", + "3 The Dark Knight Rises \n", + "4 Star Wars: Episode VII - The Force Awakens  ...\n", + "Name: movie_title, dtype: object" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "title_list[:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[False, False, False, False, False]" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_part = []\n", + "for i, key in enumerate(keys_list):\n", + " if not pd.isna(title_list[i]) and (str(key) in title_list[i]):\n", + " is_part.append(True)\n", + " else:\n", + 
" is_part.append(False)\n", + "is_part[:5] " + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
plot_keywordsmovie_title
620ball|blood|skating|song|year 2005Rollerball
989boy|bully|dream|dream sequence|planetThe Adventures of Sharkboy and Lavagirl 3-D
1767cop|corrupt politician|future|senator|time travelTimecop
4185ape|fear|future|spacecraft|spaceshipEscape from the Planet of the Apes
\n", + "
" + ], + "text/plain": [ + " plot_keywords \\\n", + "620 ball|blood|skating|song|year 2005 \n", + "989 boy|bully|dream|dream sequence|planet \n", + "1767 cop|corrupt politician|future|senator|time travel \n", + "4185 ape|fear|future|spacecraft|spaceship \n", + "\n", + " movie_title \n", + "620 Rollerball  \n", + "989 The Adventures of Sharkboy and Lavagirl 3-D  \n", + "1767 Timecop  \n", + "4185 Escape from the Planet of the Apes  " + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[is_part][['plot_keywords','movie_title']]" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[True,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " True,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", 
+ " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " True,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " 
False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " 
False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " 
False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " 
False,\n", + " False,\n", + " False,\n", + " True,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " True,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " 
False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " True,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " True,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " 
False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " True,\n", + " False,\n", + " True,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " True,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " False,\n", + " ...]" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_part = []\n", + "for i, key in enumerate(keys_list):\n", + " if not pd.isna(title_list[i]) and (str(key) in title_list[i] or 
str(key).lower() == title_list[i].strip().lower()):\n", + " is_part.append(True)\n", + " else:\n", + " is_part.append(False)\n", + "is_part" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
plot_keywordsmovie_title
0avatar|future|marine|native|paraplegicAvatar
33alice in wonderland|mistaking reality for drea...Alice in Wonderland
196australia|cattle|darwin|drover|japaneseAustralia
620ball|blood|skating|song|year 2005Rollerball
742contagion|cure|infection|panic|virusContagion
833anger management|argument|irony|sarcasm|therapistAnger Management
847casper|friendly ghost|ghost|maine|mansionCasper
888burlesque|dancer|iowa|small town girl|stageBurlesque
890lolita|nymphet|older man young girl relationsh...Lolita
989boy|bully|dream|dream sequence|planetThe Adventures of Sharkboy and Lavagirl 3-D
1160cleopatra|egypt|epic|queen|roman empireCleopatra
1580arachnophobia|death|doctor|small town|spiderArachnophobia
1767cop|corrupt politician|future|senator|time travelTimecop
1997blindness|epidemic|hospital|pubic hair|quarantineBlindness
2175drumline|drummer|fish out of water|marching ba...Drumline
2186boogeyman|childhood|fear|hometown|uncleBoogeyman
2318flash of genius|genius|intellectual property|p...Flash of Genius
2349ramanujanRamanujan
2481hitman|impersonation|see through dress|topless...Hitman
2492halloween|masked killer|michael myers|slasher|...Halloween
2619halloween|masked killer|michael myers|slasher|...Halloween
2911machete|mexican|mexico|priest|texasMachete
2981celebrity|journalist|lesbian kiss|strong femal...Celebrity
3036phone booth|publicist|single set production|sn...Phone Booth
3113devil|elevator|hit and run|throat slitting|tra...Devil
3277alien|creature|future|outer space|spaceshipAlien
3617college|face slap|high school|loss of virginit...College
3745april fool's day|island|mansion|psycho|secretApril Fool's Day
4109freeway|nightmare|police|school|trailer parkFreeway
4128alice in wonderland|mistaking reality for drea...Alice in Wonderland
4185ape|fear|future|spacecraft|spaceshipEscape from the Planet of the Apes
4256lolita|nymphet|older man young girl relationsh...Lolita
4393caramel|friendship|police|secret|suitorCaramel
4821halloween|masked killer|michael myers|slasher|...Halloween
4939aroused|photography|pornography documentary|po...Aroused
\n", + "
" + ], + "text/plain": [ + " plot_keywords \\\n", + "0 avatar|future|marine|native|paraplegic \n", + "33 alice in wonderland|mistaking reality for drea... \n", + "196 australia|cattle|darwin|drover|japanese \n", + "620 ball|blood|skating|song|year 2005 \n", + "742 contagion|cure|infection|panic|virus \n", + "833 anger management|argument|irony|sarcasm|therapist \n", + "847 casper|friendly ghost|ghost|maine|mansion \n", + "888 burlesque|dancer|iowa|small town girl|stage \n", + "890 lolita|nymphet|older man young girl relationsh... \n", + "989 boy|bully|dream|dream sequence|planet \n", + "1160 cleopatra|egypt|epic|queen|roman empire \n", + "1580 arachnophobia|death|doctor|small town|spider \n", + "1767 cop|corrupt politician|future|senator|time travel \n", + "1997 blindness|epidemic|hospital|pubic hair|quarantine \n", + "2175 drumline|drummer|fish out of water|marching ba... \n", + "2186 boogeyman|childhood|fear|hometown|uncle \n", + "2318 flash of genius|genius|intellectual property|p... \n", + "2349 ramanujan \n", + "2481 hitman|impersonation|see through dress|topless... \n", + "2492 halloween|masked killer|michael myers|slasher|... \n", + "2619 halloween|masked killer|michael myers|slasher|... \n", + "2911 machete|mexican|mexico|priest|texas \n", + "2981 celebrity|journalist|lesbian kiss|strong femal... \n", + "3036 phone booth|publicist|single set production|sn... \n", + "3113 devil|elevator|hit and run|throat slitting|tra... \n", + "3277 alien|creature|future|outer space|spaceship \n", + "3617 college|face slap|high school|loss of virginit... \n", + "3745 april fool's day|island|mansion|psycho|secret \n", + "4109 freeway|nightmare|police|school|trailer park \n", + "4128 alice in wonderland|mistaking reality for drea... \n", + "4185 ape|fear|future|spacecraft|spaceship \n", + "4256 lolita|nymphet|older man young girl relationsh... \n", + "4393 caramel|friendship|police|secret|suitor \n", + "4821 halloween|masked killer|michael myers|slasher|... 
\n", + "4939 aroused|photography|pornography documentary|po... \n", + "\n", + " movie_title \n", + "0 Avatar  \n", + "33 Alice in Wonderland  \n", + "196 Australia  \n", + "620 Rollerball  \n", + "742 Contagion  \n", + "833 Anger Management  \n", + "847 Casper  \n", + "888 Burlesque  \n", + "890 Lolita  \n", + "989 The Adventures of Sharkboy and Lavagirl 3-D  \n", + "1160 Cleopatra  \n", + "1580 Arachnophobia  \n", + "1767 Timecop  \n", + "1997 Blindness  \n", + "2175 Drumline  \n", + "2186 Boogeyman  \n", + "2318 Flash of Genius  \n", + "2349 Ramanujan  \n", + "2481 Hitman  \n", + "2492 Halloween  \n", + "2619 Halloween  \n", + "2911 Machete  \n", + "2981 Celebrity  \n", + "3036 Phone Booth  \n", + "3113 Devil  \n", + "3277 Alien  \n", + "3617 College  \n", + "3745 April Fool's Day  \n", + "4109 Freeway  \n", + "4128 Alice in Wonderland  \n", + "4185 Escape from the Planet of the Apes  \n", + "4256 Lolita  \n", + "4393 Caramel  \n", + "4821 Halloween  \n", + "4939 Aroused  " + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[is_part][['plot_keywords','movie_title']]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Performance tests\n", + "\n", + "* for loop and comparison - 8.304 seconds\n", + "* lambda expression - 25.893 seconds" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 18545451 function calls (18544652 primitive calls) in 8.718 seconds\n", + "\n", + " Ordered by: standard name\n", + "\n", + " ncalls tottime percall cumtime percall filename:lineno(function)\n", + " 198 0.000 0.000 0.000 0.000 :416(parent)\n", + " 1290 0.001 0.000 0.002 0.000 :997(_handle_fromlist)\n", + " 1 0.498 0.498 8.718 8.718 :7(before)\n", + " 1 0.000 0.000 8.718 8.718 :1()\n", + " 100 0.000 0.000 0.000 0.000 __init__.py:205(iteritems)\n", + " 497 0.000 0.000 0.003 0.000 
_methods.py:42(_any)\n", + " 297 0.000 0.000 0.001 0.000 algorithms.py:1421(_get_take_nd_function)\n", + " 297 0.003 0.000 0.031 0.000 algorithms.py:1548(take_nd)\n", + " 1 0.000 0.000 0.000 0.000 base.py:1569(is_unique)\n", + " 499257 0.441 0.000 1.580 0.000 base.py:1647(_convert_scalar_indexer)\n", + " 2 0.000 0.000 0.000 0.000 base.py:1935(_engine)\n", + " 2 0.000 0.000 0.000 0.000 base.py:1938()\n", + " 297 0.001 0.000 0.001 0.000 base.py:2033(__contains__)\n", + " 4 0.000 0.000 0.000 0.000 base.py:2067(__getitem__)\n", + " 99 0.000 0.000 0.010 0.000 base.py:2179(take)\n", + " 99 0.000 0.000 0.002 0.000 base.py:2445(equals)\n", + " 99 0.002 0.000 0.007 0.000 base.py:255(__new__)\n", + " 8 0.000 0.000 0.000 0.000 base.py:3071(get_loc)\n", + " 499257 1.162 0.000 7.143 0.000 base.py:3090(get_value)\n", + " 499257 0.120 0.000 0.183 0.000 base.py:4117(_maybe_cast_indexer)\n", + " 100 0.000 0.000 0.001 0.000 base.py:473(_simple_new)\n", + " 204 0.000 0.000 0.000 0.000 base.py:4914(_ensure_index)\n", + " 99 0.001 0.000 0.008 0.000 base.py:520(_shallow_copy_with_infer)\n", + " 1685 0.001 0.000 0.003 0.000 base.py:61(is_dtype)\n", + " 99 0.000 0.000 0.000 0.000 base.py:615(is_)\n", + " 100 0.000 0.000 0.000 0.000 base.py:635(_reset_identity)\n", + " 799 0.000 0.000 0.000 0.000 base.py:641(__len__)\n", + " 99 0.000 0.000 0.000 0.000 base.py:662(dtype)\n", + " 301 0.000 0.000 0.000 0.000 base.py:672(values)\n", + " 2 0.000 0.000 0.000 0.000 base.py:677(_values)\n", + " 198 0.000 0.000 0.000 0.000 base.py:711(get_values)\n", + " 2 0.000 0.000 0.000 0.000 base.py:789(_ndarray_values)\n", + " 99 0.000 0.000 0.005 0.000 base.py:893(tolist)\n", + " 99 0.000 0.000 0.000 0.000 base.py:904(_coerce_to_ndarray)\n", + " 99 0.000 0.000 0.005 0.000 base.py:912(__iter__)\n", + " 99 0.000 0.000 0.000 0.000 base.py:920(_get_attributes_dict)\n", + " 99 0.000 0.000 0.000 0.000 base.py:922()\n", + " 297 0.001 0.000 0.008 0.000 cast.py:257(maybe_promote)\n", + " 99 0.001 0.000 0.011 0.000 
common.py:100(is_bool_indexer)\n", + " 99 0.000 0.000 0.001 0.000 common.py:1043(is_datetime64_any_dtype)\n", + " 401 0.000 0.000 0.001 0.000 common.py:122(is_sparse)\n", + " 993 0.002 0.000 0.006 0.000 common.py:1688(is_extension_array_dtype)\n", + " 792 0.000 0.000 0.001 0.000 common.py:1784(_get_dtype)\n", + " 594 0.001 0.000 0.002 0.000 common.py:1835(_get_dtype_type)\n", + " 1 0.000 0.000 0.000 0.000 common.py:195(is_categorical)\n", + " 991 0.000 0.000 0.004 0.000 common.py:227(is_datetimetz)\n", + " 198 0.000 0.000 0.001 0.000 common.py:332(is_datetime64_dtype)\n", + " 1189 0.001 0.000 0.003 0.000 common.py:369(is_datetime64tz_dtype)\n", + " 499554 0.125 0.000 0.169 0.000 common.py:395(_apply_if_callable)\n", + " 198 0.000 0.000 0.001 0.000 common.py:407(is_timedelta64_dtype)\n", + " 495 0.000 0.000 0.001 0.000 common.py:477(is_interval_dtype)\n", + " 199 0.000 0.000 0.000 0.000 common.py:513(is_categorical_dtype)\n", + " 99 0.000 0.000 0.001 0.000 common.py:647(is_datetimelike)\n", + " 396 0.000 0.000 0.001 0.000 common.py:692(is_dtype_equal)\n", + " 99 0.000 0.000 0.000 0.000 common.py:858(is_signed_integer_dtype)\n", + " 99 0.000 0.000 0.001 0.000 common.py:89(is_object_dtype)\n", + " 4 0.000 0.000 0.000 0.000 concat.py:105(_get_sliced_frame_result_type)\n", + " 99 0.000 0.000 0.001 0.000 dtypes.py:401(__new__)\n", + " 99 0.000 0.000 0.001 0.000 dtypes.py:459(construct_from_string)\n", + " 396 0.001 0.000 0.001 0.000 dtypes.py:707(is_dtype)\n", + " 297 0.002 0.000 0.098 0.000 frame.py:2664(__getitem__)\n", + " 198 0.000 0.000 0.001 0.000 frame.py:2690(_getitem_column)\n", + " 99 0.001 0.000 0.094 0.001 frame.py:2707(_getitem_array)\n", + " 4 0.000 0.000 0.000 0.000 frame.py:3093(_box_item_values)\n", + " 4 0.000 0.000 0.000 0.000 frame.py:3100(_box_col_values)\n", + " 99 0.000 0.000 0.000 0.000 frame.py:320(_constructor)\n", + " 99 0.000 0.000 0.001 0.000 frame.py:334(__init__)\n", + " 1 0.000 0.000 0.000 0.000 fromnumeric.py:49(_wrapfunc)\n", + " 1 0.000 
0.000 0.000 0.000 fromnumeric.py:882(argsort)\n", + " 103 0.000 0.000 0.000 0.000 generic.py:124(__init__)\n", + " 99 0.000 0.000 0.000 0.000 generic.py:178(_init_mgr)\n", + " 198 0.000 0.000 0.000 0.000 generic.py:2484(_get_item_cache)\n", + " 4 0.000 0.000 0.000 0.000 generic.py:2498(_set_as_cached)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2577(_clear_item_cache)\n", + " 99 0.000 0.000 0.000 0.000 generic.py:2603(_set_is_copy)\n", + " 99 0.001 0.000 0.071 0.001 generic.py:2783(_take)\n", + " 99 0.000 0.000 0.000 0.000 generic.py:364(_get_axis_number)\n", + " 198 0.000 0.000 0.000 0.000 generic.py:377(_get_axis_name)\n", + " 198 0.000 0.000 0.001 0.000 generic.py:390(_get_axis)\n", + " 99 0.000 0.000 0.000 0.000 generic.py:394(_get_block_manager_axis)\n", + " 99 0.000 0.000 0.000 0.000 generic.py:4345(__finalize__)\n", + " 4 0.000 0.000 0.000 0.000 generic.py:4362(__getattr__)\n", + " 210 0.000 0.000 0.000 0.000 generic.py:4378(__setattr__)\n", + " 99 0.000 0.000 0.002 0.000 generic.py:4423(_protect_consolidate)\n", + " 99 0.000 0.000 0.002 0.000 generic.py:4433(_consolidate_inplace)\n", + " 99 0.000 0.000 0.002 0.000 generic.py:4436(f)\n", + " 504025 0.178 0.000 0.254 0.000 generic.py:7(_check)\n", + " 99 0.000 0.000 0.000 0.000 indexing.py:2321(convert_to_index_sliceable)\n", + " 99 0.000 0.000 0.010 0.000 indexing.py:2345(check_bool_indexer)\n", + " 99 0.002 0.000 0.004 0.000 indexing.py:2441(maybe_convert_indices)\n", + " 4 0.000 0.000 0.000 0.000 inference.py:415(is_hashable)\n", + " 302 0.001 0.000 0.001 0.000 internals.py:116(__init__)\n", + " 297 0.001 0.000 0.036 0.000 internals.py:1237(take_nd)\n", + " 302 0.000 0.000 0.000 0.000 internals.py:127(_check_ndim)\n", + " 8 0.000 0.000 0.000 0.000 internals.py:166(_consolidate_key)\n", + " 499554 0.069 0.000 0.069 0.000 internals.py:203(internal_values)\n", + " 499257 0.142 0.000 0.290 0.000 internals.py:222(to_dense)\n", + " 297 0.000 0.000 0.000 0.000 internals.py:229(fill_value)\n", + " 104 0.000 0.000 
0.001 0.000 internals.py:2298(__init__)\n", + " 1206 0.000 0.000 0.000 0.000 internals.py:233(mgr_locs)\n", + " 302 0.000 0.000 0.000 0.000 internals.py:237(mgr_locs)\n", + " 301 0.000 0.000 0.003 0.000 internals.py:269(make_block_same_class)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3148(get_block_type)\n", + " 302 0.001 0.000 0.002 0.000 internals.py:3191(make_block)\n", + " 100 0.000 0.000 0.008 0.000 internals.py:3265(__init__)\n", + " 100 0.000 0.000 0.000 0.000 internals.py:3266()\n", + " 401 0.001 0.000 0.002 0.000 internals.py:3307(shape)\n", + " 1203 0.000 0.000 0.001 0.000 internals.py:3309()\n", + " 400 0.000 0.000 0.000 0.000 internals.py:3311(ndim)\n", + " 101 0.002 0.000 0.004 0.000 internals.py:3363(_rebuild_blknos_and_blklocs)\n", + " 108 0.000 0.000 0.000 0.000 internals.py:3384(_get_items)\n", + " 301 0.000 0.000 0.000 0.000 internals.py:348(shape)\n", + " 100 0.001 0.000 0.002 0.000 internals.py:3488(_verify_integrity)\n", + " 401 0.000 0.000 0.000 0.000 internals.py:3490()\n", + " 499867 0.101 0.000 0.101 0.000 internals.py:352(dtype)\n", + " 305 0.000 0.000 0.001 0.000 internals.py:356(ftype)\n", + " 4 0.000 0.000 0.000 0.000 internals.py:372(iget)\n", + " 298 0.000 0.000 0.000 0.000 internals.py:3776(is_consolidated)\n", + " 101 0.000 0.000 0.001 0.000 internals.py:3784(_consolidate_check)\n", + " 101 0.000 0.000 0.001 0.000 internals.py:3785()\n", + " 99 0.000 0.000 0.001 0.000 internals.py:4085(consolidate)\n", + " 199 0.000 0.000 0.001 0.000 internals.py:4101(_consolidate_inplace)\n", + " 4 0.000 0.000 0.000 0.000 internals.py:4108(get)\n", + " 4 0.000 0.000 0.000 0.000 internals.py:4137(iget)\n", + " 99 0.000 0.000 0.046 0.000 internals.py:4388(reindex_indexer)\n", + " 99 0.000 0.000 0.037 0.000 internals.py:4423()\n", + " 99 0.001 0.000 0.063 0.001 internals.py:4518(take)\n", + " 4 0.000 0.000 0.000 0.000 internals.py:4639(__init__)\n", + " 1498068 0.278 0.000 0.278 0.000 internals.py:4684(_block)\n", + " 499257 0.254 0.000 0.453 
0.000 internals.py:4718(dtype)\n", + " 499554 0.266 0.000 0.429 0.000 internals.py:4745(internal_values)\n", + " 499257 0.357 0.000 0.856 0.000 internals.py:4752(get_values)\n", + " 1 0.000 0.000 0.001 0.001 internals.py:5057(_consolidate)\n", + " 8 0.000 0.000 0.000 0.000 internals.py:5063()\n", + " 3 0.000 0.000 0.001 0.000 internals.py:5074(_merge_blocks)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:5088()\n", + " 1 0.000 0.000 0.000 0.000 internals.py:5089()\n", + " 3 0.000 0.000 0.000 0.000 internals.py:5101(_extend_blocks)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:5127(_vstack)\n", + " 4 0.000 0.000 0.000 0.000 missing.py:112(_isna_new)\n", + " 4 0.000 0.000 0.000 0.000 missing.py:32(isna)\n", + " 99 0.000 0.000 0.000 0.000 missing.py:376(array_equivalent)\n", + " 4 0.000 0.000 0.000 0.000 numeric.py:110(is_all_dates)\n", + " 499257 0.530 0.000 2.293 0.000 numeric.py:179(_convert_scalar_indexer)\n", + " 100 0.000 0.000 0.002 0.000 numeric.py:35(__new__)\n", + " 693 0.000 0.000 0.020 0.000 numeric.py:433(asarray)\n", + " 101 0.000 0.000 0.001 0.000 numeric.py:504(asanyarray)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:630(require)\n", + " 99 0.000 0.000 0.008 0.000 numeric.py:64(_shallow_copy)\n", + " 2 0.000 0.000 0.000 0.000 numeric.py:701()\n", + " 1 0.000 0.000 0.000 0.000 range.py:169(_data)\n", + " 1 0.000 0.000 0.000 0.000 range.py:173(_int64index)\n", + " 99 0.000 0.000 0.009 0.000 range.py:260(_shallow_copy)\n", + " 499461 0.280 0.000 0.406 0.000 range.py:481(__len__)\n", + " 4 0.000 0.000 0.000 0.000 series.py:166(__init__)\n", + " 4 0.000 0.000 0.000 0.000 series.py:365(_set_axis)\n", + " 4 0.000 0.000 0.000 0.000 series.py:391(_set_subtyp)\n", + " 4 0.000 0.000 0.000 0.000 series.py:401(name)\n", + " 4 0.000 0.000 0.000 0.000 series.py:405(name)\n", + " 499257 0.218 0.000 0.671 0.000 series.py:412(dtype)\n", + " 499554 0.181 0.000 0.610 0.000 series.py:465(_values)\n", + " 499257 0.175 0.000 1.031 0.000 series.py:476(get_values)\n", + " 
499257 0.703 0.000 8.076 0.000 series.py:764(__getitem__)\n", + " 1 0.000 0.000 0.000 0.000 shape_base.py:182(vstack)\n", + " 1 0.000 0.000 0.000 0.000 shape_base.py:234()\n", + " 2 0.000 0.000 0.000 0.000 shape_base.py:63(atleast_2d)\n", + " 100 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x9cff80}\n", + " 499554 0.044 0.000 0.044 0.000 {built-in method builtins.callable}\n", + " 1 0.000 0.000 8.718 8.718 {built-in method builtins.exec}\n", + " 1504621 0.482 0.000 1.091 0.000 {built-in method builtins.getattr}\n", + " 2581 0.001 0.000 0.001 0.000 {built-in method builtins.hasattr}\n", + " 301 0.000 0.000 0.000 0.000 {built-in method builtins.hash}\n", + " 1013118 0.337 0.000 0.590 0.000 {built-in method builtins.isinstance}\n", + " 2089 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}\n", + " 199 0.000 0.000 0.000 0.000 {built-in method builtins.iter}\n", + "503778/502979 0.200 0.000 0.606 0.000 {built-in method builtins.len}\n", + " 499461 0.126 0.000 0.126 0.000 {built-in method builtins.max}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method builtins.sorted}\n", + " 100 0.000 0.000 0.000 0.000 {built-in method builtins.sum}\n", + " 305 0.001 0.000 0.001 0.000 {built-in method numpy.core.multiarray.arange}\n", + " 500052 0.145 0.000 0.145 0.000 {built-in method numpy.core.multiarray.array}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.concatenate}\n", + " 499 0.001 0.000 0.001 0.000 {built-in method numpy.core.multiarray.empty}\n", + " 297 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_int64}\n", + " 99 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_platform_int}\n", + " 998811 0.122 0.000 0.122 0.000 {built-in method pandas._libs.lib.is_float}\n", + " 99 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_integer}\n", + " 499265 0.061 0.000 0.061 0.000 {built-in method pandas._libs.lib.is_scalar}\n", + " 4 0.000 0.000 0.000 0.000 {built-in method 
pandas._libs.missing.checknull}\n", + " 497 0.000 0.000 0.003 0.000 {method 'any' of 'numpy.ndarray' objects}\n", + " 499262 0.041 0.000 0.041 0.000 {method 'append' of 'list' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'argsort' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'clear' of 'dict' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}\n", + " 202 0.000 0.000 0.000 0.000 {method 'fill' of 'numpy.ndarray' objects}\n", + " 305 0.001 0.000 0.001 0.000 {method 'format' of 'str' objects}\n", + " 792 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}\n", + " 8 0.000 0.000 0.000 0.000 {method 'get_loc' of 'pandas._libs.index.IndexEngine' objects}\n", + " 499257 0.452 0.000 0.452 0.000 {method 'get_value' of 'pandas._libs.index.IndexEngine' objects}\n", + " 199 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}\n", + " 99 0.001 0.000 0.001 0.000 {method 'nonzero' of 'numpy.ndarray' objects}\n", + " 497 0.003 0.000 0.003 0.000 {method 'reduce' of 'numpy.ufunc' objects}\n", + " 198 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'search' of '_sre.SRE_Pattern' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'take' of 'numpy.ndarray' objects}\n", + " 99 0.002 0.000 0.002 0.000 {method 'tolist' of 'numpy.ndarray' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'upper' of 'str' objects}\n", + " 499558 0.148 0.000 0.148 0.000 {method 'view' of 'numpy.ndarray' objects}\n", + " 99 0.003 0.000 0.003 0.000 {pandas._libs.algos.take_2d_axis1_float64_float64}\n", + " 99 0.001 0.000 0.001 0.000 {pandas._libs.algos.take_2d_axis1_int64_int64}\n", + " 99 0.008 0.000 0.008 0.000 {pandas._libs.algos.take_2d_axis1_object_object}\n", + " 998712 0.411 0.000 1.443 0.000 
{pandas._libs.lib.values_from_object}\n", + "\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 53174395 function calls (51174298 primitive calls) in 26.802 seconds\n", + "\n", + " Ordered by: standard name\n", + "\n", + " ncalls tottime percall cumtime percall filename:lineno(function)\n", + " 198 0.000 0.000 0.000 0.000 :416(parent)\n", + " 2574 0.002 0.000 0.004 0.000 :997(_handle_fromlist)\n", + " 1 0.065 0.065 26.802 26.802 :20(after)\n", + " 499257 0.860 0.000 21.422 0.000 :22()\n", + " 1 0.000 0.000 26.802 26.802 :1()\n", + " 99 0.000 0.000 0.000 0.000 __init__.py:205(iteritems)\n", + " 99 0.000 0.000 0.009 0.000 _decorators.py:136(wrapper)\n", + " 594 0.000 0.000 0.004 0.000 _methods.py:42(_any)\n", + " 99 0.000 0.000 0.001 0.000 _methods.py:45(_all)\n", + " 990 0.001 0.000 0.001 0.000 _weakrefset.py:70(__contains__)\n", + " 891 0.002 0.000 0.003 0.000 abc.py:180(__instancecheck__)\n", + " 396 0.001 0.000 0.001 0.000 algorithms.py:1421(_get_take_nd_function)\n", + " 396 0.004 0.000 0.033 0.000 algorithms.py:1548(take_nd)\n", + " 99 0.000 0.000 0.000 0.000 apply.py:101(agg_axis)\n", + " 99 0.001 0.000 26.656 0.269 apply.py:105(get_result)\n", + " 99 0.000 0.000 0.001 0.000 apply.py:14(frame_apply)\n", + " 99 0.003 0.000 26.655 0.269 apply.py:219(apply_standard)\n", + " 99 0.000 0.000 0.000 0.000 apply.py:34(__init__)\n", + " 99 0.000 0.000 0.000 0.000 apply.py:85(columns)\n", + " 99 0.000 0.000 0.192 0.002 apply.py:93(values)\n", + " 99 0.000 0.000 0.019 0.000 apply.py:97(dtypes)\n", + " 99 0.000 0.000 0.000 0.000 base.py:1442(_has_complex_internals)\n", + " 998514 0.450 0.000 1.705 0.000 base.py:1590(is_object)\n", + " 998514 0.935 0.000 2.817 0.000 base.py:1647(_convert_scalar_indexer)\n", + " 1 0.000 0.000 0.000 0.000 base.py:1976(is_all_dates)\n", + " 998613 0.649 0.000 0.784 0.000 base.py:2033(__contains__)\n", + " 998514 0.670 0.000 3.158 0.000 base.py:2101(_can_hold_identifiers_and_holds_name)\n", + " 99 0.000 
0.000 0.010 0.000 base.py:2179(take)\n", + " 99 0.000 0.000 0.002 0.000 base.py:2445(equals)\n", + " 99 0.002 0.000 0.007 0.000 base.py:255(__new__)\n", + " 998514 2.477 0.000 13.285 0.000 base.py:3090(get_value)\n", + " 99 0.000 0.000 0.001 0.000 base.py:473(_simple_new)\n", + " 999306 0.296 0.000 0.421 0.000 base.py:4914(_ensure_index)\n", + " 99 0.000 0.000 0.008 0.000 base.py:520(_shallow_copy_with_infer)\n", + " 9702 0.005 0.000 0.011 0.000 base.py:61(is_dtype)\n", + " 99 0.000 0.000 0.000 0.000 base.py:615(is_)\n", + " 99 0.000 0.000 0.000 0.000 base.py:635(_reset_identity)\n", + " 1999602 0.545 0.000 0.763 0.000 base.py:641(__len__)\n", + " 100 0.000 0.000 0.000 0.000 base.py:662(dtype)\n", + " 496 0.000 0.000 0.001 0.000 base.py:672(values)\n", + " 198 0.000 0.000 0.000 0.000 base.py:711(get_values)\n", + " 99 0.000 0.000 0.000 0.000 base.py:904(_coerce_to_ndarray)\n", + " 99 0.000 0.000 0.000 0.000 base.py:920(_get_attributes_dict)\n", + " 99 0.000 0.000 0.000 0.000 base.py:922()\n", + " 99 0.001 0.000 0.006 0.000 cast.py:1093(find_common_type)\n", + " 198 0.000 0.000 0.000 0.000 cast.py:1118()\n", + " 396 0.000 0.000 0.000 0.000 cast.py:1121()\n", + " 198 0.000 0.000 0.000 0.000 cast.py:1126()\n", + " 198 0.000 0.000 0.000 0.000 cast.py:1128()\n", + " 396 0.000 0.000 0.001 0.000 cast.py:1133()\n", + " 297 0.000 0.000 0.001 0.000 cast.py:1232(construct_1d_ndarray_preserving_na)\n", + " 297 0.001 0.000 0.008 0.000 cast.py:257(maybe_promote)\n", + " 495 0.001 0.000 0.001 0.000 cast.py:853(maybe_castable)\n", + " 99 0.001 0.000 0.002 0.000 cast.py:867(maybe_infer_to_datetimelike)\n", + " 297 0.002 0.000 0.008 0.000 cast.py:971(maybe_cast_to_datetime)\n", + " 99 0.000 0.000 0.001 0.000 common.py:100(is_bool_indexer)\n", + " 99 0.000 0.000 0.001 0.000 common.py:1043(is_datetime64_any_dtype)\n", + " 198 0.000 0.000 0.001 0.000 common.py:1170(is_datetime_or_timedelta_dtype)\n", + " 4257 0.001 0.000 0.010 0.000 common.py:122(is_sparse)\n", + " 99 0.000 0.000 0.001 
0.000 common.py:1405(needs_i8_conversion)\n", + " 198 0.000 0.000 0.000 0.000 common.py:1527(is_float_dtype)\n", + " 396 0.000 0.000 0.001 0.000 common.py:1578(is_bool_dtype)\n", + " 3267 0.003 0.000 0.028 0.000 common.py:1629(is_extension_type)\n", + " 1386 0.002 0.000 0.008 0.000 common.py:1688(is_extension_array_dtype)\n", + " 1089 0.001 0.000 0.001 0.000 common.py:1784(_get_dtype)\n", + " 1001286 0.431 0.000 0.562 0.000 common.py:1835(_get_dtype_type)\n", + " 3564 0.002 0.000 0.011 0.000 common.py:195(is_categorical)\n", + " 396 0.002 0.000 0.002 0.000 common.py:1965(pandas_dtype)\n", + " 4752 0.002 0.000 0.015 0.000 common.py:227(is_datetimetz)\n", + " 594 0.001 0.000 0.002 0.000 common.py:332(is_datetime64_dtype)\n", + " 5148 0.002 0.000 0.008 0.000 common.py:369(is_datetime64tz_dtype)\n", + " 998613 0.237 0.000 0.333 0.000 common.py:395(_apply_if_callable)\n", + " 396 0.000 0.000 0.001 0.000 common.py:407(is_timedelta64_dtype)\n", + " 99 0.000 0.000 0.000 0.000 common.py:444(is_period_dtype)\n", + " 693 0.000 0.000 0.002 0.000 common.py:477(is_interval_dtype)\n", + " 3960 0.002 0.000 0.006 0.000 common.py:513(is_categorical_dtype)\n", + " 99 0.000 0.000 0.000 0.000 common.py:546(is_string_dtype)\n", + " 495 0.000 0.000 0.001 0.000 common.py:692(is_dtype_equal)\n", + " 99 0.000 0.000 0.000 0.000 common.py:811(is_integer_dtype)\n", + " 99 0.000 0.000 0.000 0.000 common.py:858(is_signed_integer_dtype)\n", + " 999306 0.563 0.000 1.258 0.000 common.py:89(is_object_dtype)\n", + " 99 0.000 0.000 0.000 0.000 common.py:995(is_int_or_datetime_dtype)\n", + " 198 0.000 0.000 0.000 0.000 dtypes.py:266(construct_from_string)\n", + " 99 0.000 0.000 0.001 0.000 dtypes.py:401(__new__)\n", + " 99 0.000 0.000 0.001 0.000 dtypes.py:459(construct_from_string)\n", + " 99 0.000 0.000 0.000 0.000 dtypes.py:584(is_dtype)\n", + " 594 0.001 0.000 0.002 0.000 dtypes.py:707(is_dtype)\n", + " 99 0.001 0.000 0.079 0.001 frame.py:2664(__getitem__)\n", + " 99 0.001 0.000 0.077 0.001 
frame.py:2707(_getitem_array)\n", + " 99 0.000 0.000 0.000 0.000 frame.py:320(_constructor)\n", + " 99 0.000 0.000 0.001 0.000 frame.py:334(__init__)\n", + " 99 0.000 0.000 0.000 0.000 frame.py:555(shape)\n", + " 99 0.001 0.000 26.658 0.269 frame.py:5837(apply)\n", + " 99 0.000 0.000 0.000 0.000 frame.py:7047(_get_agg_axis)\n", + " 99 0.000 0.000 0.000 0.000 function.py:38(__call__)\n", + " 693 0.001 0.000 0.001 0.000 generic.py:124(__init__)\n", + " 99 0.000 0.000 0.001 0.000 generic.py:1490(__hash__)\n", + " 198 0.000 0.000 0.002 0.000 generic.py:164(_validate_dtype)\n", + " 99 0.000 0.000 0.000 0.000 generic.py:178(_init_mgr)\n", + " 99 0.000 0.000 0.000 0.000 generic.py:2603(_set_is_copy)\n", + " 99 0.001 0.000 0.068 0.001 generic.py:2783(_take)\n", + " 297 0.001 0.000 0.001 0.000 generic.py:364(_get_axis_number)\n", + " 297 0.001 0.000 0.001 0.000 generic.py:377(_get_axis_name)\n", + " 297 0.000 0.000 0.001 0.000 generic.py:390(_get_axis)\n", + " 99 0.000 0.000 0.001 0.000 generic.py:394(_get_block_manager_axis)\n", + " 297 0.000 0.000 0.001 0.000 generic.py:4345(__finalize__)\n", + " 999306 1.556 0.000 20.563 0.000 generic.py:4362(__getattr__)\n", + " 891 0.003 0.000 0.005 0.000 generic.py:4378(__setattr__)\n", + " 998613 0.367 0.000 0.633 0.000 generic.py:438(_info_axis)\n", + " 198 0.000 0.000 0.001 0.000 generic.py:4423(_protect_consolidate)\n", + " 198 0.000 0.000 0.001 0.000 generic.py:4433(_consolidate_inplace)\n", + " 198 0.000 0.000 0.001 0.000 generic.py:4436(f)\n", + " 99 0.000 0.000 0.192 0.002 generic.py:4563(values)\n", + " 99 0.001 0.000 0.019 0.000 generic.py:4765(dtypes)\n", + " 99 0.001 0.000 0.009 0.000 generic.py:4890(astype)\n", + " 1020096 0.331 0.000 0.509 0.000 generic.py:7(_check)\n", + " 99 0.000 0.000 0.010 0.000 generic.py:9675(logical_func)\n", + " 99 0.000 0.000 0.000 0.000 indexing.py:2321(convert_to_index_sliceable)\n", + " 99 0.000 0.000 0.003 0.000 indexing.py:2345(check_bool_indexer)\n", + " 99 0.002 0.000 0.004 0.000 
indexing.py:2441(maybe_convert_indices)\n", + " 198 0.000 0.000 0.000 0.000 inference.py:119(is_iterator)\n", + " 891 0.001 0.000 0.005 0.000 inference.py:251(is_list_like)\n", + " 99 0.000 0.000 0.000 0.000 inference.py:364(is_dict_like)\n", + " 499356 0.156 0.000 0.266 0.000 inference.py:415(is_hashable)\n", + " 99 0.000 0.000 0.000 0.000 inspect.py:73(isclass)\n", + " 891 0.002 0.000 0.005 0.000 internals.py:116(__init__)\n", + " 297 0.001 0.000 0.034 0.000 internals.py:1237(take_nd)\n", + " 891 0.001 0.000 0.001 0.000 internals.py:127(_check_ndim)\n", + " 99 0.000 0.000 0.000 0.000 internals.py:184(is_categorical_astype)\n", + " 297 0.000 0.000 0.000 0.000 internals.py:199(external_values)\n", + " 998613 0.147 0.000 0.147 0.000 internals.py:203(internal_values)\n", + " 297 0.000 0.000 0.109 0.000 internals.py:213(get_values)\n", + " 998613 0.327 0.000 0.663 0.000 internals.py:222(to_dense)\n", + " 297 0.000 0.000 0.000 0.000 internals.py:229(fill_value)\n", + " 495 0.001 0.000 0.004 0.000 internals.py:2298(__init__)\n", + " 2178 0.001 0.000 0.001 0.000 internals.py:233(mgr_locs)\n", + " 891 0.001 0.000 0.001 0.000 internals.py:237(mgr_locs)\n", + " 396 0.000 0.000 0.003 0.000 internals.py:269(make_block_same_class)\n", + " 495 0.002 0.000 0.009 0.000 internals.py:3148(get_block_type)\n", + " 891 0.002 0.000 0.017 0.000 internals.py:3191(make_block)\n", + " 99 0.001 0.000 0.009 0.000 internals.py:3265(__init__)\n", + " 99 0.000 0.000 0.000 0.000 internals.py:3266()\n", + " 594 0.001 0.000 0.003 0.000 internals.py:3307(shape)\n", + " 1782 0.001 0.000 0.002 0.000 internals.py:3309()\n", + " 495 0.000 0.000 0.000 0.000 internals.py:3311(ndim)\n", + " 499257 0.527 0.000 1.341 0.000 internals.py:3315(set_axis)\n", + " 99 0.000 0.000 0.000 0.000 internals.py:3351(_is_single_block)\n", + " 99 0.002 0.000 0.005 0.000 internals.py:3363(_rebuild_blknos_and_blklocs)\n", + " 297 0.000 0.000 0.000 0.000 internals.py:3384(_get_items)\n", + " 99 0.000 0.000 0.005 0.000 
internals.py:3404(get_dtypes)\n", + " 99 0.000 0.000 0.000 0.000 internals.py:3405()\n", + " 198 0.000 0.000 0.000 0.000 internals.py:3473(__len__)\n", + " 297 0.000 0.000 0.000 0.000 internals.py:348(shape)\n", + " 99 0.001 0.000 0.002 0.000 internals.py:3488(_verify_integrity)\n", + " 396 0.000 0.000 0.000 0.000 internals.py:3490()\n", + " 99 0.001 0.000 0.005 0.000 internals.py:3500(apply)\n", + " 1000098 0.218 0.000 0.218 0.000 internals.py:352(dtype)\n", + " 297 0.000 0.000 0.001 0.000 internals.py:356(ftype)\n", + " 99 0.000 0.000 0.000 0.000 internals.py:3561()\n", + " 99 0.000 0.000 0.005 0.000 internals.py:3713(astype)\n", + " 495 0.000 0.000 0.000 0.000 internals.py:3776(is_consolidated)\n", + " 99 0.000 0.000 0.001 0.000 internals.py:3784(_consolidate_check)\n", + " 99 0.000 0.000 0.001 0.000 internals.py:3785()\n", + " 99 0.000 0.000 0.000 0.000 internals.py:3789(is_mixed_type)\n", + " 99 0.000 0.000 0.191 0.002 internals.py:3922(as_array)\n", + " 99 0.056 0.001 0.190 0.002 internals.py:3953(_interleave)\n", + " 198 0.000 0.000 0.000 0.000 internals.py:4085(consolidate)\n", + " 297 0.000 0.000 0.000 0.000 internals.py:4101(_consolidate_inplace)\n", + " 99 0.001 0.000 0.044 0.000 internals.py:4388(reindex_indexer)\n", + " 99 0.000 0.000 0.034 0.000 internals.py:4423()\n", + " 99 0.001 0.000 0.062 0.001 internals.py:4518(take)\n", + " 594 0.002 0.000 0.017 0.000 internals.py:4639(__init__)\n", + " 3495591 0.682 0.000 0.682 0.000 internals.py:4684(_block)\n", + " 99 0.000 0.000 0.000 0.000 internals.py:4709(index)\n", + " 998811 0.529 0.000 0.956 0.000 internals.py:4718(dtype)\n", + " 297 0.000 0.000 0.001 0.000 internals.py:4742(external_values)\n", + " 998613 0.566 0.000 0.914 0.000 internals.py:4745(internal_values)\n", + " 998613 0.782 0.000 1.901 0.000 internals.py:4752(get_values)\n", + " 198 0.000 0.000 0.000 0.000 internals.py:4774(_consolidate_inplace)\n", + " 99 0.000 0.000 0.007 0.000 internals.py:5044(_interleaved_dtype)\n", + " 99 0.000 0.000 
0.000 0.000 internals.py:5048()\n", + " 99 0.000 0.000 0.000 0.000 internals.py:5101(_extend_blocks)\n", + " 99 0.000 0.000 0.003 0.000 internals.py:573(astype)\n", + " 99 0.001 0.000 0.003 0.000 internals.py:577(_astype)\n", + " 99 0.000 0.000 0.001 0.000 internals.py:774(copy)\n", + " 99 0.000 0.000 0.004 0.000 missing.py:112(_isna_new)\n", + " 99 0.001 0.000 0.003 0.000 missing.py:189(_isna_ndarraylike)\n", + " 99 0.000 0.000 0.004 0.000 missing.py:32(isna)\n", + " 99 0.000 0.000 0.000 0.000 missing.py:376(array_equivalent)\n", + " 99 0.000 0.000 0.000 0.000 nanops.py:179(_get_fill_value)\n", + " 99 0.001 0.000 0.006 0.000 nanops.py:202(_get_values)\n", + " 99 0.000 0.000 0.000 0.000 nanops.py:256(_na_ok_dtype)\n", + " 99 0.000 0.000 0.000 0.000 nanops.py:260(_view_if_needed)\n", + " 99 0.000 0.000 0.007 0.000 nanops.py:318(nanany)\n", + " 99 0.000 0.000 0.000 0.000 numeric.py:110(is_all_dates)\n", + " 396 0.001 0.000 0.003 0.000 numeric.py:2491(seterr)\n", + " 396 0.001 0.000 0.001 0.000 numeric.py:2592(geterr)\n", + " 198 0.000 0.000 0.000 0.000 numeric.py:2887(__init__)\n", + " 198 0.000 0.000 0.002 0.000 numeric.py:2891(__enter__)\n", + " 198 0.000 0.000 0.001 0.000 numeric.py:2896(__exit__)\n", + " 99 0.000 0.000 0.002 0.000 numeric.py:35(__new__)\n", + " 693 0.000 0.000 0.003 0.000 numeric.py:433(asarray)\n", + " 99 0.000 0.000 0.001 0.000 numeric.py:504(asanyarray)\n", + " 99 0.000 0.000 0.008 0.000 numeric.py:64(_shallow_copy)\n", + " 99 0.000 0.000 0.000 0.000 numerictypes.py:1001()\n", + " 99 0.000 0.000 0.000 0.000 numerictypes.py:1002()\n", + " 198 0.002 0.000 0.003 0.000 numerictypes.py:927(_can_coerce_all)\n", + " 1881 0.001 0.000 0.001 0.000 numerictypes.py:936()\n", + " 99 0.000 0.000 0.003 0.000 numerictypes.py:950(find_common_type)\n", + " 99 0.000 0.000 0.009 0.000 range.py:260(_shallow_copy)\n", + " 198 0.000 0.000 0.001 0.000 range.py:315(equals)\n", + " 1287 0.001 0.000 0.002 0.000 range.py:481(__len__)\n", + " 594 0.006 0.000 0.061 0.000 
series.py:166(__init__)\n", + " 99 0.002 0.000 0.048 0.000 series.py:3069(apply)\n", + " 99 0.001 0.000 0.010 0.000 series.py:3203(_reduce)\n", + " 198 0.000 0.000 0.000 0.000 series.py:349(_constructor)\n", + " 499851 0.796 0.000 2.627 0.000 series.py:365(_set_axis)\n", + " 499851 0.255 0.000 0.255 0.000 series.py:391(_set_subtyp)\n", + " 792 0.001 0.000 0.002 0.000 series.py:401(name)\n", + " 495 0.003 0.000 0.022 0.000 series.py:4019(_sanitize_array)\n", + " 495 0.002 0.000 0.016 0.000 series.py:4036(_try_cast)\n", + " 500049 0.378 0.000 0.645 0.000 series.py:405(name)\n", + " 998811 0.448 0.000 1.404 0.000 series.py:412(dtype)\n", + " 297 0.000 0.000 0.001 0.000 series.py:432(values)\n", + " 998613 0.412 0.000 1.326 0.000 series.py:465(_values)\n", + " 998613 0.374 0.000 2.275 0.000 series.py:476(get_values)\n", + " 198 0.000 0.000 0.001 0.000 series.py:562(__len__)\n", + " 99 0.000 0.000 0.001 0.000 series.py:637(__array__)\n", + " 998514 1.469 0.000 15.215 0.000 series.py:764(__getitem__)\n", + " 99 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x9cff80}\n", + " 396 0.000 0.000 0.001 0.000 {built-in method builtins.all}\n", + " 198 0.000 0.000 0.001 0.000 {built-in method builtins.any}\n", + " 998613 0.096 0.000 0.096 0.000 {built-in method builtins.callable}\n", + " 1 0.000 0.000 26.802 26.802 {built-in method builtins.exec}\n", + " 4026825 1.271 0.000 2.597 0.000 {built-in method builtins.getattr}\n", + " 4950 0.002 0.000 0.002 0.000 {built-in method builtins.hasattr}\n", + " 1497969 0.245 0.000 0.246 0.000 {built-in method builtins.hash}\n", + " 4047912 0.987 0.000 1.499 0.000 {built-in method builtins.isinstance}\n", + " 1006434 0.136 0.000 0.136 0.000 {built-in method builtins.issubclass}\n", + " 99 0.000 0.000 0.000 0.000 {built-in method builtins.iter}\n", + "4009005/2009007 0.842 0.000 1.390 0.000 {built-in method builtins.len}\n", + " 1287 0.001 0.000 0.001 0.000 {built-in method builtins.max}\n", + " 99 0.000 0.000 0.000 0.000 
{built-in method builtins.sum}\n", + " 297 0.001 0.000 0.001 0.000 {built-in method numpy.core.multiarray.arange}\n", + "1000098/999999 0.284 0.000 0.285 0.000 {built-in method numpy.core.multiarray.array}\n", + " 792 0.017 0.000 0.017 0.000 {built-in method numpy.core.multiarray.empty}\n", + " 99 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.putmask}\n", + " 99 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.zeros}\n", + " 792 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.geterrobj}\n", + " 396 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.seterrobj}\n", + " 396 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_int64}\n", + " 100 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_object}\n", + " 99 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_platform_int}\n", + " 99 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.infer_datetimelike_array}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_datetime_array}\n", + " 998811 0.145 0.000 0.145 0.000 {built-in method pandas._libs.lib.is_float}\n", + " 297 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_integer}\n", + " 998613 0.128 0.000 0.128 0.000 {built-in method pandas._libs.lib.is_scalar}\n", + " 99 0.000 0.000 0.002 0.000 {method 'all' of 'numpy.ndarray' objects}\n", + " 594 0.000 0.000 0.004 0.000 {method 'any' of 'numpy.ndarray' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}\n", + " 297 0.108 0.000 0.108 0.000 {method 'astype' of 'numpy.ndarray' objects}\n", + " 198 0.000 0.000 0.000 0.000 {method 'copy' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}\n", + " 198 0.000 0.000 0.000 0.000 {method 'fill' of 'numpy.ndarray' objects}\n", + " 396 0.001 0.000 0.001 0.000 {method 'format' of 'str' objects}\n", + " 990 0.001 0.000 0.001 0.000 {method 'get' of 'dict' 
objects}\n", + " 998514 1.023 0.000 1.023 0.000 {method 'get_value' of 'pandas._libs.index.IndexEngine' objects}\n", + " 198 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}\n", + " 99 0.002 0.000 0.002 0.000 {method 'nonzero' of 'numpy.ndarray' objects}\n", + " 297 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects}\n", + " 693 0.005 0.000 0.005 0.000 {method 'reduce' of 'numpy.ufunc' objects}\n", + " 198 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'search' of '_sre.SRE_Pattern' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'take' of 'numpy.ndarray' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'transpose' of 'numpy.ndarray' objects}\n", + " 99 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}\n", + " 999109 0.336 0.000 0.336 0.000 {method 'view' of 'numpy.ndarray' objects}\n", + " 99 0.001 0.000 0.001 0.000 {pandas._libs.algos.take_1d_object_object}\n", + " 99 0.003 0.000 0.003 0.000 {pandas._libs.algos.take_2d_axis1_float64_float64}\n", + " 99 0.001 0.000 0.001 0.000 {pandas._libs.algos.take_2d_axis1_int64_int64}\n", + " 99 0.006 0.000 0.006 0.000 {pandas._libs.algos.take_2d_axis1_object_object}\n", + " 99 0.004 0.000 0.028 0.000 {pandas._libs.lib.map_infer}\n", + " 1997325 0.915 0.000 3.190 0.000 {pandas._libs.lib.values_from_object}\n", + " 99 1.553 0.016 26.354 0.266 {pandas._libs.reduction.reduce}\n", + "\n", + "\n" + ] + } + ], + "source": [ + "import cProfile\n", + "import pandas as pd\n", + "movies = pd.read_csv('./csv/movie_metadata.csv')\n", + "movies['rat'] = movies.content_rating.astype(str)\n", + "\n", + "\n", + "def before(movies):\n", + " for i in range(1, 100):\n", + " is_part = []\n", + " title_list = movies['movie_title']\n", + " genre_list = movies['content_rating']\n", + " for i, genre in enumerate(genre_list):\n", + " if str(genre) in title_list[i]:\n", + " 
is_part.append(True)\n", + " else:\n", + " is_part.append(False)\n", + " movies[is_part]\n", + "\n", + "\n", + "def after(movies):\n", + " for i in range(1, 100):\n", + " movies[movies.apply(lambda x: x.rat in x.movie_title, axis=1)]\n", + "\n", + "\n", + "cProfile.run(\"before(movies)\")\n", + "cProfile.run(\"after(movies)\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} From 18c5f6c1c7bc1d90bc486894300778a2af9ddc76 Mon Sep 17 00:00:00 2001 From: softhints Date: Tue, 5 Feb 2019 16:58:05 +0200 Subject: [PATCH 05/76] Scrape wiki tables with pandas and python --- ...e wiki tables with pandas and python.ipynb | 1506 +++++++++++++++++ 1 file changed, 1506 insertions(+) create mode 100644 notebooks/Scrape wiki tables with pandas and python.ipynb diff --git a/notebooks/Scrape wiki tables with pandas and python.ipynb b/notebooks/Scrape wiki tables with pandas and python.ipynb new file mode 100644 index 0000000..47577ae --- /dev/null +++ b/notebooks/Scrape wiki tables with pandas and python.ipynb @@ -0,0 +1,1506 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use only pandas\n", + "\n", + "source: https://gist.github.com/aculich/b34868c098d94d614515\n", + "\n", + "\n", + "Installation requirements\n", + "\n", + "```\n", + "pip install pandas\n", + "pip install lxml\n", + "pip install html5lib\n", + "pip install BeautifulSoup4```\n", + "\n", + "or \n", + "\n", + "```conda install html5lib ```\n", + "\n", + "or \n", + "\n", + "```easy_install 
html5lib```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read Wiki Tables" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracted 1 wikitables\n" + ] + } + ], + "source": [ + "# Extract tables from Wikipedia\n", + "from pandas import read_html\n", + "page = 'https://en.wikipedia.org/wiki/List_of_Asian_countries_by_area'\n", + "\n", + "wikitables = read_html(page, attrs={\"class\": \"wikitable\"})\n", + "\n", + "print(\"Extracted {num} wikitables\".format(num=len(wikitables)))" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n",
+ " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>0</th>\n", + " <th>1</th>\n", + " <th>2</th>\n", + " <th>3</th>\n", + " <th>4</th>\n", + " </tr>\n",
+ " </thead>\n", + " <tbody>\n",
+ " <tr>\n", + " <th>0</th>\n", + " <td>Rank</td>\n", + " <td>Country</td>\n", + " <td>Area (km²)</td>\n", + " <td>Notes</td>\n", + " <td>NaN</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>1</th>\n", + " <td>1</td>\n", + " <td>Russia*</td>\n", + " <td>13100000</td>\n", + " <td>17,125,200 including European part</td>\n", + " <td>NaN</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>2</th>\n", + " <td>2</td>\n", + " <td>China</td>\n", + " <td>9596961</td>\n", + " <td>excludes Hong Kong, Macau, Taiwan and disputed...</td>\n", + " <td>NaN</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>3</th>\n", + " <td>3</td>\n", + " <td>India</td>\n", + " <td>3287263</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>4</th>\n", + " <td>4</td>\n", + " <td>Kazakhstan*</td>\n", + " <td>2455034</td>\n", + " <td>2,724,902 km² including European part</td>\n", + " <td>NaN</td>\n", + " </tr>\n",
+ " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " 0 1 2 \\\n", + "0 Rank Country Area (km²) \n", + "1 1 Russia* 13100000 \n", + "2 2 China 9596961 \n", + "3 3 India 3287263 \n", + "4 4 Kazakhstan* 2455034 \n", + "\n", + " 3 4 \n", + "0 Notes NaN \n", + "1 17,125,200 including European part NaN \n", + "2 excludes Hong Kong, Macau, Taiwan and disputed... NaN \n", + "3 NaN NaN \n", + "4 2,724,902 km² including European part NaN " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wikitables[0].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracted 2 wikitables\n" + ] + } + ], + "source": [ + "# Extract several tables from a single Wikipedia page\n", + "from pandas import read_html\n", + "page = 'https://en.wikipedia.org/wiki/List_of_UFC_events'\n", + "\n", + "wikitables = read_html(page, index_col=0, attrs={\"class\": \"wikitable\"})\n", + "\n", + "print(\"Extracted {num} wikitables\".format(num=len(wikitables)))" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 1 2 3 4\n", + "0 \n", + "Event Date Venue Location Ref.\n", + "UFC on ESPN 4 Jun 29, 2019 TBA TBA [9]\n", + "UFC on ESPN+ 11 Jun 22, 2019 TBA TBA [9]\n", + "UFC 238 Jun 8, 2019 TBA TBA [9]\n", + "UFC on ESPN+ 10 Jun 1, 2019 TBA TBA [9]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wikitables[0].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(470, 6)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wikitables[1].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 1 2 3 \\\n", + "0 \n", + "005 UFC 5: The Return of the Beast Apr 7, 1995 Independence Arena \n", + "004 UFC 4: Revenge of the Warriors Dec 16, 1994 Expo Square Pavilion \n", + "003 UFC 3: The American Dream Sep 9, 1994 Grady Cole Center \n", + "002 UFC 2: No Way Out Mar 11, 1994 Mammoth Gardens \n", + "001 UFC 1: The Beginning Nov 12, 1993 McNichols Sports Arena \n", + "\n", + " 4 5 6 \n", + "0 \n", + "005 Charlotte, North Carolina, U.S. 6000 [423] \n", + "004 Tulsa, Oklahoma, U.S. 5857 [424] \n", + "003 Charlotte, North Carolina, U.S. NaN NaN \n", + "002 Denver, Colorado, U.S. 2000 [425] \n", + "001 Denver, Colorado, U.S. 7800 [426] " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wikitables[1].tail()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 1 2 3 4\n", + "0 \n", + "Event Date Venue Location Ref.\n", + "UFC on ESPN 4 Jun 29, 2019 TBA TBA [9]\n", + "UFC on ESPN+ 11 Jun 22, 2019 TBA TBA [9]\n", + "UFC 238 Jun 8, 2019 TBA TBA [9]\n", + "UFC on ESPN+ 10 Jun 1, 2019 TBA TBA [9]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wikitables[0].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracted 1 wikitables\n" + ] + } + ], + "source": [ + "# change the index table\n", + "from pandas.io.html import read_html\n", + "page = 'https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)'\n", + "\n", + "wikitables = read_html(page, index_col=1, attrs={\"class\":\"wikitable\"})\n", + "\n", + "print (\"Extracted {num} wikitables\".format(num=len(wikitables)))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 2 3 \\\n", + "1 \n", + "Country or area Rank UN continentalregion[2] UN statisticalregion[2] \n", + "World — — — \n", + "China[a] 1 Asia Eastern Asia \n", + "India 2 Asia Southern Asia \n", + "United States 3 Americas Northern America \n", + "\n", + " 4 5 \\\n", + "1 \n", + "Country or area Population(1 July 2016)[3] Population(1 July 2017)[3] \n", + "World 7466964280 7550262101 \n", + "China[a] 1403500365 1409517397 \n", + "India 1324171354 1339180127 \n", + "United States 322179605 324459463 \n", + "\n", + " 6 \n", + "1 \n", + "Country or area Change \n", + "World +1.1% \n", + "China[a] +0.4% \n", + "India +1.1% \n", + "United States +0.7% " + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wikitables[0].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracted 1 wikitables\n" + ] + } + ], + "source": [ + "# works with different languages ( option encoding is available if needed)\n", + "from pandas.io.html import read_html\n", + "page = 'https://zh.wikipedia.org/wiki/%E4%B8%96%E7%95%8C%E5%9B%BD%E5%AE%B6%E5%92%8C%E5%9C%B0%E5%8C%BA%E4%BA%BA%E5%8F%A3%E6%8E%92%E5%90%8D%E5%88%97%E8%A1%A8'\n", + "\n", + "wikitables = read_html(page, index_col=0, attrs={\"class\":\"wikitable\"})\n", + "\n", + "print (\"Extracted {num} wikitables\".format(num=len(wikitables)))" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 1 2 3 4 5 6\n", + "0 \n", + "排名 国家或者地区 大洲[2] 統計地區[2] 人口(2016年7月1日)[3] 人口(2017年7月1日)[3] 变化率\n", + "— 世界 — — 7466964280 7550262101 +1.1%\n", + "1 中华人民共和国[a] 亚洲 东亚 1403500365 1409517397 +0.4%\n", + "2 印度 亚洲 南亚 1324171354 1339180127 +1.1%\n", + "3 美國 美洲 北美 322179605 324459463 +0.7%" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wikitables[0].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read wiki Infoboxes" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracted 1 infoboxes\n", + "Extracted 4 wikitables\n" + ] + } + ], + "source": [ + "from pandas.io.html import read_html\n", + "page = 'https://en.wikipedia.org/wiki/University_of_California,_Berkeley'\n", + "infoboxes = read_html(page, index_col=0, attrs={\"class\":\"infobox\"})\n", + "wikitables = read_html(page, index_col=0, attrs={\"class\":\"wikitable\"})\n", + "\n", + "print (\"Extracted {num} infoboxes\".format(num=len(infoboxes)))\n", + "print (\"Extracted {num} wikitables\".format(num=len(wikitables)))" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 1\n", + "0 \n", + "University rankings NaN\n", + "National NaN\n", + "ARWU[106] 4.0\n", + "Forbes[107] 14.0\n", + "U.S. News & World Report[108] 22.0\n", + "Washington Monthly[109] 7.0\n", + "Global NaN\n", + "ARWU[110] 5.0\n", + "QS[111] 27.0\n", + "Times[112] 15.0\n", + "U.S. News & World Report[113] 4.0" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "infoboxes[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracted 1 infoboxes\n", + "Extracted 1 wikitables\n" + ] + } + ], + "source": [ + "from pandas.io.html import read_html\n", + "page = 'https://en.wikipedia.org/wiki/Lisbon'\n", + "infoboxes = read_html(page, index_col=0, attrs={\"class\":\"infobox geography vcard\"})\n", + "wikitables = read_html(page, index_col=0, attrs={\"class\":\"wikitable\"})\n", + "\n", + "print (\"Extracted {num} infoboxes\".format(num=len(infoboxes)))\n", + "print (\"Extracted {num} wikitables\".format(num=len(wikitables)))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 1 2\n", + "0 \n", + "Country Portugal NaN\n", + "NUTS II Region Lisbon metropolitan area NaN\n", + "NUTS III Subregion Lisbon metropolitan area NaN\n", + "District Lisbon NaN\n", + "Municipality Lisbon NaN\n", + "Settlement Prior to Roman rule NaN\n", + "City c. 1256 NaN\n", + "Civil parishes (see text) NaN\n", + "Government NaN NaN\n", + "• Type LAU NaN" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "infoboxes[0][10:20]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Scrape non wiki tables" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 1 2 3 4 \\\n", + "0 \n", + "NaN Player ID Player Name Total (Overall) NaN \n", + " 1.0 KuroKy Kuro Takhasomi $4,136,926.95 | \n", + " 2.0 N0tail Johan Sundstein $3,742,055.59 | \n", + " 3.0 Miracle- Amer Al-Barkawi $3,701,337.28 | \n", + " 4.0 MinD_ContRoL Ivan Ivanov $3,492,411.76 | \n", + " 5.0 Matumbaman Lasse Urpalainen $3,476,116.04 | \n", + " 6.0 JerAx Jesse Vainikka $3,313,463.82 | \n", + " 7.0 SumaiL Sumail Hassan $3,305,914.94 | \n", + " 8.0 GH Maroun Merhej $3,095,344.84 | \n", + " 9.0 UNiVeRsE Saahil Arora $3,035,737.67 | \n", + "\n", + " 5 6 7 \n", + "0 \n", + "NaN Highest Paying Game Total (Game) % of Total \n", + " 1.0 Dota 2 $4,135,203.61 99.96% \n", + " 2.0 Dota 2 $3,733,903.98 99.78% \n", + " 3.0 Dota 2 $3,701,337.28 100.00% \n", + " 4.0 Dota 2 $3,492,411.76 100.00% \n", + " 5.0 Dota 2 $3,476,116.04 100.00% \n", + " 6.0 Dota 2 $3,313,463.82 100.00% \n", + " 7.0 Dota 2 $3,305,914.94 100.00% \n", + " 8.0 Dota 2 $3,095,344.84 100.00% \n", + " 9.0 Dota 2 $3,035,737.67 100.00% " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from pandas.io.html import read_html\n", + "page = 'https://www.esportsearnings.com/players'\n", + "infoboxes = read_html(page, index_col=0, attrs={\"class\":\"detail_list_table\"})\n", + "\n", + "infoboxes[0].head(10)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Convert html tables to csv/excel" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "from pandas.io.html import read_html\n", + "page = 'https://www.esportsearnings.com/players'\n", + "infoboxes = read_html(page, index_col=0, attrs={\"class\":\"detail_list_table\"})\n", + "\n", + "file_name = './my_file.csv'\n", + "infoboxes[0].to_csv(file_name, sep='\\t')\n" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "./my_file.csv\r\n", + "./csv/movie_metadata.csv\r\n" + ] + } + ], + "source": [ + "!find . -type f -name \"*.csv\" " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Web Scraping Wikipedia Tables using BeautifulSoup and Python\n", + "\n", + "source: https://github.com/stewync/Web-Scraping-Wiki-tables-using-BeautifulSoup-and-Python/blob/master/Scraping%2BWiki%2Btable%2Busing%2BPython%2Band%2BBeautifulSoup.ipynb" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "website_url = requests.get('https://en.wikipedia.org/wiki/List_of_UFC_events').text" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "soup = BeautifulSoup(website_url,'lxml')" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "My_table = soup.find('table',{'class':'wikitable'})" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "links = My_table.findAll('a')" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "events = []\n", + "for link in links:\n", + " events.append(link.get('title')) " + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " events\n", + "0 UFC on ESPN 4\n", + "1 TBA\n", + "2 None\n", + "3 UFC on ESPN+ 11\n", + "4 TBA" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "df = pd.DataFrame()\n", + "df['events'] = events\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Other" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### wiki-table-scrape\n", + "https://github.com/rocheio/wiki-table-scrape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Scraping Wikipedia Tables with Python\n", + "https://roche.io/2016/05/scrape-wikipedia-with-python" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 139859a4f2dfddb82f40667d6000c5c883fb961a Mon Sep 17 00:00:00 2001 From: softhints Date: Wed, 6 Feb 2019 13:55:02 +0200 Subject: [PATCH 06/76] Pandas is column is contained in another column in the same row --- ...e wiki tables with pandas and python.ipynb | 1223 ++++++++++++++++- 1 file changed, 1193 insertions(+), 30 deletions(-) diff --git a/notebooks/Scrape wiki tables with pandas and python.ipynb b/notebooks/Scrape wiki tables with pandas and python.ipynb index 47577ae..9c5c5b0 100644 --- a/notebooks/Scrape wiki tables with pandas and python.ipynb +++ b/notebooks/Scrape wiki tables with pandas and python.ipynb @@ -130,24 +130,504 @@ " 2,724,902 km² including European part\n", " NaN\n", " \n", 
+ " \n", + " 5\n", + " 5\n", + " Saudi Arabia\n", + " 2149690\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 6\n", + " 6\n", + " Iran\n", + " 1648195\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 7\n", + " 7\n", + " Mongolia\n", + " 1564110\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 8\n", + " 8\n", + " Indonesia*\n", + " 1472639\n", + " 1,904,569 km² including Oceanian part\n", + " NaN\n", + " \n", + " \n", + " 9\n", + " 9\n", + " Pakistan\n", + " 796095\n", + " 882,363 km² including Gilgit-Baltistan and AJK\n", + " NaN\n", + " \n", + " \n", + " 10\n", + " 10\n", + " Turkey*\n", + " 747272\n", + " 783,562 km² including European part\n", + " NaN\n", + " \n", + " \n", + " 11\n", + " 11\n", + " Myanmar\n", + " 676578\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 12\n", + " 12\n", + " Afghanistan\n", + " 652230\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 13\n", + " 13\n", + " Yemen\n", + " 527968\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 14\n", + " 14\n", + " Thailand\n", + " 513120\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 15\n", + " 15\n", + " Turkmenistan\n", + " 488100\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 16\n", + " 16\n", + " Uzbekistan\n", + " 447400\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 17\n", + " 17\n", + " Iraq\n", + " 438317\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 18\n", + " 18\n", + " Japan\n", + " 377930\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 19\n", + " 19\n", + " Vietnam\n", + " 331212\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 20\n", + " 20\n", + " Malaysia\n", + " 330803\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 21\n", + " 21\n", + " Oman\n", + " 309500\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 22\n", + " 22\n", + " Philippines\n", + " 300000\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 23\n", + " 23\n", + " Laos\n", + " 236800\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 24\n", + " 24\n", + " Kyrgyzstan\n", + " 199951\n", + " NaN\n", + 
" NaN\n", + " \n", + " \n", + " 25\n", + " 25\n", + " Syria\n", + " 185180\n", + " Includes the parts of the Golan Heights\n", + " NaN\n", + " \n", + " \n", + " 26\n", + " 26\n", + " Cambodia\n", + " 181035\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 27\n", + " 27\n", + " Bangladesh\n", + " 147570\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 28\n", + " 28\n", + " Nepal\n", + " 147181\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 29\n", + " 29\n", + " Tajikistan\n", + " 143100\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 30\n", + " 30\n", + " North Korea\n", + " 120538\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 31\n", + " 31\n", + " South Korea\n", + " 100210\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 32\n", + " 32\n", + " Jordan\n", + " 89342\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 33\n", + " 33\n", + " Azerbaijan*\n", + " 86600\n", + " Sometimes considered part of Europe\n", + " NaN\n", + " \n", + " \n", + " 34\n", + " 34\n", + " United Arab Emirates\n", + " 83600\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 35\n", + " 35\n", + " Georgia*\n", + " 69000\n", + " Sometimes considered part of Europe\n", + " NaN\n", + " \n", + " \n", + " 36\n", + " 36\n", + " Sri Lanka\n", + " 65610\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 37\n", + " 37\n", + " Egypt*\n", + " 60000\n", + " 1,002,450 km² including African part\n", + " NaN\n", + " \n", + " \n", + " 38\n", + " 38\n", + " Bhutan\n", + " 38394\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 39\n", + " 39\n", + " Taiwan\n", + " 36193\n", + " excludes Hong Kong, Macau, Mainland China and ...\n", + " NaN\n", + " \n", + " \n", + " 40\n", + " 40\n", + " Armenia*\n", + " 29843\n", + " Sometimes considered part of Europe\n", + " NaN\n", + " \n", + " \n", + " 41\n", + " 41\n", + " Israel\n", + " 20273\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 42\n", + " 42\n", + " Kuwait\n", + " 17818\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 43\n", + " 43\n", + " 
Timor-Leste\n", + " 14874\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 44\n", + " 44\n", + " Qatar\n", + " 11586\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 45\n", + " 45\n", + " Lebanon\n", + " 10452\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 46\n", + " 46\n", + " Cyprus\n", + " 9251\n", + " 5,896 km² excluding Northern Cyprus. Political...\n", + " NaN\n", + " \n", + " \n", + " 47\n", + " 47\n", + " Palestine\n", + " 6220\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 48\n", + " 48\n", + " Brunei\n", + " 5765\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 49\n", + " 49\n", + " Bahrain\n", + " 765\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 50\n", + " 50\n", + " Singapore\n", + " 716\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 51\n", + " 51\n", + " Maldives\n", + " 300\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 52\n", + " NaN\n", + " Total\n", + " 44579000\n", + " NaN\n", + " NaN\n", + " \n", " \n", "\n", "" ], "text/plain": [ - " 0 1 2 \\\n", - "0 Rank Country Area (km²) \n", - "1 1 Russia* 13100000 \n", - "2 2 China 9596961 \n", - "3 3 India 3287263 \n", - "4 4 Kazakhstan* 2455034 \n", + " 0 1 2 \\\n", + "0 Rank Country Area (km²) \n", + "1 1 Russia* 13100000 \n", + "2 2 China 9596961 \n", + "3 3 India 3287263 \n", + "4 4 Kazakhstan* 2455034 \n", + "5 5 Saudi Arabia 2149690 \n", + "6 6 Iran 1648195 \n", + "7 7 Mongolia 1564110 \n", + "8 8 Indonesia* 1472639 \n", + "9 9 Pakistan 796095 \n", + "10 10 Turkey* 747272 \n", + "11 11 Myanmar 676578 \n", + "12 12 Afghanistan 652230 \n", + "13 13 Yemen 527968 \n", + "14 14 Thailand 513120 \n", + "15 15 Turkmenistan 488100 \n", + "16 16 Uzbekistan 447400 \n", + "17 17 Iraq 438317 \n", + "18 18 Japan 377930 \n", + "19 19 Vietnam 331212 \n", + "20 20 Malaysia 330803 \n", + "21 21 Oman 309500 \n", + "22 22 Philippines 300000 \n", + "23 23 Laos 236800 \n", + "24 24 Kyrgyzstan 199951 \n", + "25 25 Syria 185180 \n", + "26 26 Cambodia 181035 \n", + "27 27 Bangladesh 147570 \n", + "28 28 
Nepal 147181 \n", + "29 29 Tajikistan 143100 \n", + "30 30 North Korea 120538 \n", + "31 31 South Korea 100210 \n", + "32 32 Jordan 89342 \n", + "33 33 Azerbaijan* 86600 \n", + "34 34 United Arab Emirates 83600 \n", + "35 35 Georgia* 69000 \n", + "36 36 Sri Lanka 65610 \n", + "37 37 Egypt* 60000 \n", + "38 38 Bhutan 38394 \n", + "39 39 Taiwan 36193 \n", + "40 40 Armenia* 29843 \n", + "41 41 Israel 20273 \n", + "42 42 Kuwait 17818 \n", + "43 43 Timor-Leste 14874 \n", + "44 44 Qatar 11586 \n", + "45 45 Lebanon 10452 \n", + "46 46 Cyprus 9251 \n", + "47 47 Palestine 6220 \n", + "48 48 Brunei 5765 \n", + "49 49 Bahrain 765 \n", + "50 50 Singapore 716 \n", + "51 51 Maldives 300 \n", + "52 NaN Total 44579000 \n", "\n", - " 3 4 \n", - "0 Notes NaN \n", - "1 17,125,200 including European part NaN \n", - "2 excludes Hong Kong, Macau, Taiwan and disputed... NaN \n", - "3 NaN NaN \n", - "4 2,724,902 km² including European part NaN " + " 3 4 \n", + "0 Notes NaN \n", + "1 17,125,200 including European part NaN \n", + "2 excludes Hong Kong, Macau, Taiwan and disputed... 
NaN \n", + "3 NaN NaN \n", + "4 2,724,902 km² including European part NaN \n", + "5 NaN NaN \n", + "6 NaN NaN \n", + "7 NaN NaN \n", + "8 1,904,569 km² including Oceanian part NaN \n", + "9 882,363 km² including Gilgit-Baltistan and AJK NaN \n", + "10 783,562 km² including European part NaN \n", + "11 NaN NaN \n", + "12 NaN NaN \n", + "13 NaN NaN \n", + "14 NaN NaN \n", + "15 NaN NaN \n", + "16 NaN NaN \n", + "17 NaN NaN \n", + "18 NaN NaN \n", + "19 NaN NaN \n", + "20 NaN NaN \n", + "21 NaN NaN \n", + "22 NaN NaN \n", + "23 NaN NaN \n", + "24 NaN NaN \n", + "25 Includes the parts of the Golan Heights NaN \n", + "26 NaN NaN \n", + "27 NaN NaN \n", + "28 NaN NaN \n", + "29 NaN NaN \n", + "30 NaN NaN \n", + "31 NaN NaN \n", + "32 NaN NaN \n", + "33 Sometimes considered part of Europe NaN \n", + "34 NaN NaN \n", + "35 Sometimes considered part of Europe NaN \n", + "36 NaN NaN \n", + "37 1,002,450 km² including African part NaN \n", + "38 NaN NaN \n", + "39 excludes Hong Kong, Macau, Mainland China and ... NaN \n", + "40 Sometimes considered part of Europe NaN \n", + "41 NaN NaN \n", + "42 NaN NaN \n", + "43 NaN NaN \n", + "44 NaN NaN \n", + "45 NaN NaN \n", + "46 5,896 km² excluding Northern Cyprus. Political... NaN \n", + "47 NaN NaN \n", + "48 NaN NaN \n", + "49 NaN NaN \n", + "50 NaN NaN \n", + "51 NaN NaN \n", + "52 NaN NaN " ] }, "execution_count": 2, @@ -156,7 +636,7 @@ } ], "source": [ - "wikitables[0].head()" + "wikitables[0]" ] }, { @@ -302,7 +782,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -345,6 +825,510 @@ " \n", " \n", " \n", + " #\n", + " Event\n", + " Date\n", + " Venue\n", + " Location\n", + " Attendance\n", + " Ref.\n", + " \n", + " \n", + " 465\n", + " UFC Fight Night: Assunção vs. 
Moraes 2\n", + " Feb 2, 2019\n", + " Centro de Formação Olímpica do Nordeste\n", + " Fortaleza, Brazil\n", + " 10040\n", + " [21]\n", + " \n", + " \n", + " –\n", + " UFC 233\n", + " Jan 26, 2019\n", + " Honda Center\n", + " Anaheim, California, U.S.\n", + " Cancelled\n", + " [22]\n", + " \n", + " \n", + " 464\n", + " UFC Fight Night: Cejudo vs. Dillashaw\n", + " Jan 19, 2019\n", + " Barclays Center\n", + " Brooklyn, New York, U.S.\n", + " 12152\n", + " [23]\n", + " \n", + " \n", + " 463\n", + " UFC 232: Jones vs. Gustafsson 2\n", + " Dec 29, 2018\n", + " The Forum\n", + " Inglewood, California, U.S.\n", + " 15862\n", + " [24]\n", + " \n", + " \n", + " 462\n", + " UFC on Fox: Lee vs. Iaquinta 2\n", + " Dec 15, 2018\n", + " Fiserv Forum\n", + " Milwaukee, Wisconsin, U.S.\n", + " 9010\n", + " [25]\n", + " \n", + " \n", + " 461\n", + " UFC 231: Holloway vs. Ortega\n", + " Dec 8, 2018\n", + " Scotiabank Arena\n", + " Toronto, Ontario, Canada\n", + " 19039\n", + " [26]\n", + " \n", + " \n", + " 460\n", + " UFC Fight Night: dos Santos vs. Tuivasa\n", + " Dec 2, 2018\n", + " Adelaide Entertainment Centre\n", + " Adelaide, Australia\n", + " 8652\n", + " [27]\n", + " \n", + " \n", + " 459\n", + " The Ultimate Fighter: Heavy Hitters Finale\n", + " Nov 30, 2018\n", + " Pearl Theatre\n", + " Las Vegas, Nevada, U.S.\n", + " 2020\n", + " [28]\n", + " \n", + " \n", + " 458\n", + " UFC Fight Night: Blaydes vs. Ngannou 2\n", + " Nov 24, 2018\n", + " Cadillac Arena\n", + " Beijing, China\n", + " 10302\n", + " [29]\n", + " \n", + " \n", + " 457\n", + " UFC Fight Night: Magny vs. Ponzinibbio\n", + " Nov 17, 2018\n", + " Estadio Mary Terán de Weiss\n", + " Buenos Aires, Argentina\n", + " 10245\n", + " [30]\n", + " \n", + " \n", + " 456\n", + " UFC Fight Night: Korean Zombie vs. Rodríguez\n", + " Nov 10, 2018\n", + " Pepsi Center\n", + " Denver, Colorado, U.S.\n", + " 11426\n", + " [31]\n", + " \n", + " \n", + " 455\n", + " UFC 230: Cormier vs. 
Lewis\n", + " Nov 3, 2018\n", + " Madison Square Garden\n", + " New York City, New York, U.S.\n", + " 17011\n", + " [32]\n", + " \n", + " \n", + " 454\n", + " UFC Fight Night: Volkan vs. Smith\n", + " Oct 27, 2018\n", + " Avenir Centre\n", + " Moncton, New Brunswick, Canada\n", + " 6282\n", + " [33]\n", + " \n", + " \n", + " 453\n", + " UFC 229: Khabib vs. McGregor\n", + " Oct 6, 2018\n", + " T-Mobile Arena\n", + " Las Vegas, Nevada, U.S.\n", + " 20034\n", + " [34]\n", + " \n", + " \n", + " 452\n", + " UFC Fight Night: Santos vs. Anders\n", + " Sep 22, 2018\n", + " Ginásio do Ibirapuera\n", + " São Paulo, Brazil\n", + " 9485\n", + " [35]\n", + " \n", + " \n", + " 451\n", + " UFC Fight Night: Hunt vs. Oleinik\n", + " Sep 15, 2018\n", + " Olimpiyskiy Stadium\n", + " Moscow, Russia\n", + " 22603\n", + " [36]\n", + " \n", + " \n", + " 450\n", + " UFC 228: Woodley vs. Till\n", + " Sep 8, 2018\n", + " American Airlines Center\n", + " Dallas, Texas, U.S.\n", + " 14073\n", + " [37]\n", + " \n", + " \n", + " 449\n", + " UFC Fight Night: Gaethje vs. Vick\n", + " Aug 25, 2018\n", + " Pinnacle Bank Arena\n", + " Lincoln, Nebraska, U.S.\n", + " 6409\n", + " [38]\n", + " \n", + " \n", + " 448\n", + " UFC 227: Dillashaw vs. Garbrandt 2\n", + " Aug 4, 2018\n", + " Staples Center\n", + " Los Angeles, California, U.S.\n", + " 17794\n", + " [39]\n", + " \n", + " \n", + " 447\n", + " UFC on Fox: Alvarez vs. Poirier 2\n", + " Jul 28, 2018\n", + " Scotiabank Saddledome\n", + " Calgary, Alberta, Canada\n", + " 10603\n", + " [40]\n", + " \n", + " \n", + " 446\n", + " UFC Fight Night: Shogun vs. Smith\n", + " Jul 22, 2018\n", + " Barclaycard Arena\n", + " Hamburg, Germany\n", + " 7798\n", + " [41]\n", + " \n", + " \n", + " 445\n", + " UFC Fight Night: dos Santos vs. Ivanov\n", + " Jul 14, 2018\n", + " CenturyLink Arena\n", + " Boise, Idaho, U.S.\n", + " 5648\n", + " [42]\n", + " \n", + " \n", + " 444\n", + " UFC 226: Miocic vs. 
Cormier\n", + " Jul 7, 2018\n", + " T-Mobile Arena\n", + " Las Vegas, Nevada, U.S.\n", + " 17464\n", + " [43]\n", + " \n", + " \n", + " 443\n", + " The Ultimate Fighter: Undefeated Finale\n", + " Jul 6, 2018\n", + " Palms Casino Resort\n", + " Las Vegas, Nevada, U.S.\n", + " 2123\n", + " [44]\n", + " \n", + " \n", + " 442\n", + " UFC Fight Night: Cowboy vs. Edwards\n", + " Jun 23, 2018\n", + " Singapore Indoor Stadium\n", + " Kallang, Singapore\n", + " 6419\n", + " [45]\n", + " \n", + " \n", + " 441\n", + " UFC 225: Whittaker vs. Romero 2\n", + " Jun 9, 2018\n", + " United Center\n", + " Chicago, Illinois, U.S.\n", + " 18117\n", + " [46]\n", + " \n", + " \n", + " 440\n", + " UFC Fight Night: Rivera vs. Moraes\n", + " Jun 1, 2018\n", + " Adirondack Bank Center\n", + " Utica, New York, U.S.\n", + " 5063\n", + " [47]\n", + " \n", + " \n", + " 439\n", + " UFC Fight Night: Thompson vs. Till\n", + " May 27, 2018\n", + " Echo Arena\n", + " Liverpool, England, U.K.\n", + " 8520\n", + " [48]\n", + " \n", + " \n", + " 438\n", + " UFC Fight Night: Maia vs. 
Usman\n", + " May 19, 2018\n", + " Movistar Arena\n", + " Santiago, Chile\n", + " 11082\n", + " [49]\n", + " \n", + " \n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " \n", + " \n", + " 030\n", + " UFC 26: Ultimate Field of Dreams\n", + " Jun 9, 2000\n", + " Five Seasons Events Center\n", + " Cedar Rapids, Iowa, U.S.\n", + " 1100\n", + " [409]\n", + " \n", + " \n", + " 029\n", + " UFC 25: Ultimate Japan 3\n", + " Apr 14, 2000\n", + " Yoyogi National Gymnasium\n", + " Tokyo, Japan\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 028\n", + " UFC 24: First Defense\n", + " Mar 10, 2000\n", + " Lake Charles Civic Center\n", + " Lake Charles, Louisiana, U.S.\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 027\n", + " UFC 23: Ultimate Japan 2\n", + " Nov 19, 1999\n", + " Tokyo Bay NK Hall\n", + " Chiba, Japan\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 026\n", + " UFC 22: Only One Can be Champion\n", + " Sep 24, 1999\n", + " Lake Charles Civic Center\n", + " Lake Charles, Louisiana, U.S.\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 025\n", + " UFC 21: Return of the Champions\n", + " Jul 16, 1999\n", + " Five Seasons Events Center\n", + " Cedar Rapids, Iowa, U.S.\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 024\n", + " UFC 20: Battle for the Gold\n", + " May 7, 1999\n", + " Boutwell Memorial Auditorium\n", + " Birmingham, Alabama, U.S.\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 023\n", + " UFC 19: Ultimate Young Guns\n", + " Mar 5, 1999\n", + " Casino Magic Bay St. Louis\n", + " Bay St. 
Louis, Mississippi, U.S.\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 022\n", + " UFC 18: The Road to the Heavyweight Title\n", + " Jan 8, 1999\n", + " Pontchartrain Center\n", + " New Orleans, Louisiana, U.S.\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 021\n", + " UFC Brazil: Ultimate Brazil\n", + " Oct 16, 1998\n", + " Ginásio da Portuguesa\n", + " São Paulo, Brazil\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 020\n", + " UFC 17: Redemption\n", + " May 15, 1998\n", + " Mobile Civic Center\n", + " Mobile, Alabama, U.S.\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 019\n", + " UFC 16: Battle in the Bayou\n", + " Mar 13, 1998\n", + " Pontchartrain Center\n", + " New Orleans, Louisiana, U.S.\n", + " 4600\n", + " [410]\n", + " \n", + " \n", + " 018\n", + " UFC Japan: Ultimate Japan\n", + " Dec 21, 1997\n", + " Yokohama Arena\n", + " Yokohama, Japan\n", + " 5000\n", + " [411]\n", + " \n", + " \n", + " 017\n", + " UFC 15: Collision Course\n", + " Oct 17, 1997\n", + " Casino Magic Bay St. Louis\n", + " Bay St. 
Louis, Mississippi, U.S.\n", + " NaN\n", + " NaN\n", + " \n", + " \n", + " 016\n", + " UFC 14: Showdown\n", + " Jul 27, 1997\n", + " Boutwell Memorial Auditorium\n", + " Birmingham, Alabama, U.S.\n", + " 5000\n", + " [412]\n", + " \n", + " \n", + " 015\n", + " UFC 13: The Ultimate Force\n", + " May 30, 1997\n", + " Augusta Civic Center\n", + " Augusta, Georgia, U.S.\n", + " 5100\n", + " [413]\n", + " \n", + " \n", + " 014\n", + " UFC 12: Judgement Day\n", + " Feb 7, 1997\n", + " Dothan Civic Center\n", + " Dothan, Alabama, U.S.\n", + " 3100\n", + " [414]\n", + " \n", + " \n", + " 013\n", + " UFC: The Ultimate Ultimate 2\n", + " Dec 7, 1996\n", + " Fair Park Arena\n", + " Birmingham, Alabama, U.S.\n", + " 6000\n", + " [415]\n", + " \n", + " \n", + " 012\n", + " UFC 11: The Proving Ground\n", + " Sep 20, 1996\n", + " Augusta Civic Center\n", + " Augusta, Georgia, U.S.\n", + " 4500\n", + " [416]\n", + " \n", + " \n", + " 011\n", + " UFC 10: The Tournament\n", + " Jul 12, 1996\n", + " Fair Park Arena\n", + " Birmingham, Alabama, U.S.\n", + " 4300\n", + " [417]\n", + " \n", + " \n", + " 010\n", + " UFC 9: Motor City Madness\n", + " May 17, 1996\n", + " Cobo Arena\n", + " Detroit, Michigan, U.S.\n", + " 10000\n", + " [418]\n", + " \n", + " \n", + " 009\n", + " UFC 8: David vs. 
Goliath\n", + " Feb 16, 1996\n", + " Coliseo Rubén Rodríguez\n", + " Bayamón, Puerto Rico\n", + " 13000\n", + " [419]\n", + " \n", + " \n", + " 008\n", + " UFC: The Ultimate Ultimate\n", + " Dec 16, 1995\n", + " Mammoth Gardens\n", + " Denver, Colorado, U.S.\n", + " 2800\n", + " [420]\n", + " \n", + " \n", + " 007\n", + " UFC 7: The Brawl in Buffalo\n", + " Sep 8, 1995\n", + " Buffalo Memorial Auditorium\n", + " Buffalo, New York, U.S.\n", + " 9000\n", + " [421]\n", + " \n", + " \n", + " 006\n", + " UFC 6: Clash of the Titans\n", + " Jul 14, 1995\n", + " Casper Events Center\n", + " Casper, Wyoming, U.S.\n", + " 2700\n", + " [422]\n", + " \n", + " \n", " 005\n", " UFC 5: The Return of the Beast\n", " Apr 7, 1995\n", @@ -391,33 +1375,212 @@ " \n", " \n", "\n", + "

470 rows × 6 columns

\n", "" ], "text/plain": [ - " 1 2 3 \\\n", - "0 \n", - "005 UFC 5: The Return of the Beast Apr 7, 1995 Independence Arena \n", - "004 UFC 4: Revenge of the Warriors Dec 16, 1994 Expo Square Pavilion \n", - "003 UFC 3: The American Dream Sep 9, 1994 Grady Cole Center \n", - "002 UFC 2: No Way Out Mar 11, 1994 Mammoth Gardens \n", - "001 UFC 1: The Beginning Nov 12, 1993 McNichols Sports Arena \n", + " 1 2 \\\n", + "0 \n", + "# Event Date \n", + "465 UFC Fight Night: Assunção vs. Moraes 2 Feb 2, 2019 \n", + "– UFC 233 Jan 26, 2019 \n", + "464 UFC Fight Night: Cejudo vs. Dillashaw Jan 19, 2019 \n", + "463 UFC 232: Jones vs. Gustafsson 2 Dec 29, 2018 \n", + "462 UFC on Fox: Lee vs. Iaquinta 2 Dec 15, 2018 \n", + "461 UFC 231: Holloway vs. Ortega Dec 8, 2018 \n", + "460 UFC Fight Night: dos Santos vs. Tuivasa Dec 2, 2018 \n", + "459 The Ultimate Fighter: Heavy Hitters Finale Nov 30, 2018 \n", + "458 UFC Fight Night: Blaydes vs. Ngannou 2 Nov 24, 2018 \n", + "457 UFC Fight Night: Magny vs. Ponzinibbio Nov 17, 2018 \n", + "456 UFC Fight Night: Korean Zombie vs. Rodríguez Nov 10, 2018 \n", + "455 UFC 230: Cormier vs. Lewis Nov 3, 2018 \n", + "454 UFC Fight Night: Volkan vs. Smith Oct 27, 2018 \n", + "453 UFC 229: Khabib vs. McGregor Oct 6, 2018 \n", + "452 UFC Fight Night: Santos vs. Anders Sep 22, 2018 \n", + "451 UFC Fight Night: Hunt vs. Oleinik Sep 15, 2018 \n", + "450 UFC 228: Woodley vs. Till Sep 8, 2018 \n", + "449 UFC Fight Night: Gaethje vs. Vick Aug 25, 2018 \n", + "448 UFC 227: Dillashaw vs. Garbrandt 2 Aug 4, 2018 \n", + "447 UFC on Fox: Alvarez vs. Poirier 2 Jul 28, 2018 \n", + "446 UFC Fight Night: Shogun vs. Smith Jul 22, 2018 \n", + "445 UFC Fight Night: dos Santos vs. Ivanov Jul 14, 2018 \n", + "444 UFC 226: Miocic vs. Cormier Jul 7, 2018 \n", + "443 The Ultimate Fighter: Undefeated Finale Jul 6, 2018 \n", + "442 UFC Fight Night: Cowboy vs. Edwards Jun 23, 2018 \n", + "441 UFC 225: Whittaker vs. Romero 2 Jun 9, 2018 \n", + "440 UFC Fight Night: Rivera vs. 
Moraes Jun 1, 2018 \n", + "439 UFC Fight Night: Thompson vs. Till May 27, 2018 \n", + "438 UFC Fight Night: Maia vs. Usman May 19, 2018 \n", + ".. ... ... \n", + "030 UFC 26: Ultimate Field of Dreams Jun 9, 2000 \n", + "029 UFC 25: Ultimate Japan 3 Apr 14, 2000 \n", + "028 UFC 24: First Defense Mar 10, 2000 \n", + "027 UFC 23: Ultimate Japan 2 Nov 19, 1999 \n", + "026 UFC 22: Only One Can be Champion Sep 24, 1999 \n", + "025 UFC 21: Return of the Champions Jul 16, 1999 \n", + "024 UFC 20: Battle for the Gold May 7, 1999 \n", + "023 UFC 19: Ultimate Young Guns Mar 5, 1999 \n", + "022 UFC 18: The Road to the Heavyweight Title Jan 8, 1999 \n", + "021 UFC Brazil: Ultimate Brazil Oct 16, 1998 \n", + "020 UFC 17: Redemption May 15, 1998 \n", + "019 UFC 16: Battle in the Bayou Mar 13, 1998 \n", + "018 UFC Japan: Ultimate Japan Dec 21, 1997 \n", + "017 UFC 15: Collision Course Oct 17, 1997 \n", + "016 UFC 14: Showdown Jul 27, 1997 \n", + "015 UFC 13: The Ultimate Force May 30, 1997 \n", + "014 UFC 12: Judgement Day Feb 7, 1997 \n", + "013 UFC: The Ultimate Ultimate 2 Dec 7, 1996 \n", + "012 UFC 11: The Proving Ground Sep 20, 1996 \n", + "011 UFC 10: The Tournament Jul 12, 1996 \n", + "010 UFC 9: Motor City Madness May 17, 1996 \n", + "009 UFC 8: David vs. 
Goliath Feb 16, 1996 \n", + "008 UFC: The Ultimate Ultimate Dec 16, 1995 \n", + "007 UFC 7: The Brawl in Buffalo Sep 8, 1995 \n", + "006 UFC 6: Clash of the Titans Jul 14, 1995 \n", + "005 UFC 5: The Return of the Beast Apr 7, 1995 \n", + "004 UFC 4: Revenge of the Warriors Dec 16, 1994 \n", + "003 UFC 3: The American Dream Sep 9, 1994 \n", + "002 UFC 2: No Way Out Mar 11, 1994 \n", + "001 UFC 1: The Beginning Nov 12, 1993 \n", + "\n", + " 3 \\\n", + "0 \n", + "# Venue \n", + "465 Centro de Formação Olímpica do Nordeste \n", + "– Honda Center \n", + "464 Barclays Center \n", + "463 The Forum \n", + "462 Fiserv Forum \n", + "461 Scotiabank Arena \n", + "460 Adelaide Entertainment Centre \n", + "459 Pearl Theatre \n", + "458 Cadillac Arena \n", + "457 Estadio Mary Terán de Weiss \n", + "456 Pepsi Center \n", + "455 Madison Square Garden \n", + "454 Avenir Centre \n", + "453 T-Mobile Arena \n", + "452 Ginásio do Ibirapuera \n", + "451 Olimpiyskiy Stadium \n", + "450 American Airlines Center \n", + "449 Pinnacle Bank Arena \n", + "448 Staples Center \n", + "447 Scotiabank Saddledome \n", + "446 Barclaycard Arena \n", + "445 CenturyLink Arena \n", + "444 T-Mobile Arena \n", + "443 Palms Casino Resort \n", + "442 Singapore Indoor Stadium \n", + "441 United Center \n", + "440 Adirondack Bank Center \n", + "439 Echo Arena \n", + "438 Movistar Arena \n", + ".. ... \n", + "030 Five Seasons Events Center \n", + "029 Yoyogi National Gymnasium \n", + "028 Lake Charles Civic Center \n", + "027 Tokyo Bay NK Hall \n", + "026 Lake Charles Civic Center \n", + "025 Five Seasons Events Center \n", + "024 Boutwell Memorial Auditorium \n", + "023 Casino Magic Bay St. Louis \n", + "022 Pontchartrain Center \n", + "021 Ginásio da Portuguesa \n", + "020 Mobile Civic Center \n", + "019 Pontchartrain Center \n", + "018 Yokohama Arena \n", + "017 Casino Magic Bay St. 
Louis \n", + "016 Boutwell Memorial Auditorium \n", + "015 Augusta Civic Center \n", + "014 Dothan Civic Center \n", + "013 Fair Park Arena \n", + "012 Augusta Civic Center \n", + "011 Fair Park Arena \n", + "010 Cobo Arena \n", + "009 Coliseo Rubén Rodríguez \n", + "008 Mammoth Gardens \n", + "007 Buffalo Memorial Auditorium \n", + "006 Casper Events Center \n", + "005 Independence Arena \n", + "004 Expo Square Pavilion \n", + "003 Grady Cole Center \n", + "002 Mammoth Gardens \n", + "001 McNichols Sports Arena \n", "\n", - " 4 5 6 \n", - "0 \n", - "005 Charlotte, North Carolina, U.S. 6000 [423] \n", - "004 Tulsa, Oklahoma, U.S. 5857 [424] \n", - "003 Charlotte, North Carolina, U.S. NaN NaN \n", - "002 Denver, Colorado, U.S. 2000 [425] \n", - "001 Denver, Colorado, U.S. 7800 [426] " + " 4 5 6 \n", + "0 \n", + "# Location Attendance Ref. \n", + "465 Fortaleza, Brazil 10040 [21] \n", + "– Anaheim, California, U.S. Cancelled [22] \n", + "464 Brooklyn, New York, U.S. 12152 [23] \n", + "463 Inglewood, California, U.S. 15862 [24] \n", + "462 Milwaukee, Wisconsin, U.S. 9010 [25] \n", + "461 Toronto, Ontario, Canada 19039 [26] \n", + "460 Adelaide, Australia 8652 [27] \n", + "459 Las Vegas, Nevada, U.S. 2020 [28] \n", + "458 Beijing, China 10302 [29] \n", + "457 Buenos Aires, Argentina 10245 [30] \n", + "456 Denver, Colorado, U.S. 11426 [31] \n", + "455 New York City, New York, U.S. 17011 [32] \n", + "454 Moncton, New Brunswick, Canada 6282 [33] \n", + "453 Las Vegas, Nevada, U.S. 20034 [34] \n", + "452 São Paulo, Brazil 9485 [35] \n", + "451 Moscow, Russia 22603 [36] \n", + "450 Dallas, Texas, U.S. 14073 [37] \n", + "449 Lincoln, Nebraska, U.S. 6409 [38] \n", + "448 Los Angeles, California, U.S. 17794 [39] \n", + "447 Calgary, Alberta, Canada 10603 [40] \n", + "446 Hamburg, Germany 7798 [41] \n", + "445 Boise, Idaho, U.S. 5648 [42] \n", + "444 Las Vegas, Nevada, U.S. 17464 [43] \n", + "443 Las Vegas, Nevada, U.S. 
2123 [44] \n", + "442 Kallang, Singapore 6419 [45] \n", + "441 Chicago, Illinois, U.S. 18117 [46] \n", + "440 Utica, New York, U.S. 5063 [47] \n", + "439 Liverpool, England, U.K. 8520 [48] \n", + "438 Santiago, Chile 11082 [49] \n", + ".. ... ... ... \n", + "030 Cedar Rapids, Iowa, U.S. 1100 [409] \n", + "029 Tokyo, Japan NaN NaN \n", + "028 Lake Charles, Louisiana, U.S. NaN NaN \n", + "027 Chiba, Japan NaN NaN \n", + "026 Lake Charles, Louisiana, U.S. NaN NaN \n", + "025 Cedar Rapids, Iowa, U.S. NaN NaN \n", + "024 Birmingham, Alabama, U.S. NaN NaN \n", + "023 Bay St. Louis, Mississippi, U.S. NaN NaN \n", + "022 New Orleans, Louisiana, U.S. NaN NaN \n", + "021 São Paulo, Brazil NaN NaN \n", + "020 Mobile, Alabama, U.S. NaN NaN \n", + "019 New Orleans, Louisiana, U.S. 4600 [410] \n", + "018 Yokohama, Japan 5000 [411] \n", + "017 Bay St. Louis, Mississippi, U.S. NaN NaN \n", + "016 Birmingham, Alabama, U.S. 5000 [412] \n", + "015 Augusta, Georgia, U.S. 5100 [413] \n", + "014 Dothan, Alabama, U.S. 3100 [414] \n", + "013 Birmingham, Alabama, U.S. 6000 [415] \n", + "012 Augusta, Georgia, U.S. 4500 [416] \n", + "011 Birmingham, Alabama, U.S. 4300 [417] \n", + "010 Detroit, Michigan, U.S. 10000 [418] \n", + "009 Bayamón, Puerto Rico 13000 [419] \n", + "008 Denver, Colorado, U.S. 2800 [420] \n", + "007 Buffalo, New York, U.S. 9000 [421] \n", + "006 Casper, Wyoming, U.S. 2700 [422] \n", + "005 Charlotte, North Carolina, U.S. 6000 [423] \n", + "004 Tulsa, Oklahoma, U.S. 5857 [424] \n", + "003 Charlotte, North Carolina, U.S. NaN NaN \n", + "002 Denver, Colorado, U.S. 2000 [425] \n", + "001 Denver, Colorado, U.S. 
7800 [426] \n", + "\n", + "[470 rows x 6 columns]" ] }, - "execution_count": 6, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "wikitables[1].tail()" + "wikitables[1]" ] }, { From 9cf3b5db5982f22ae735bd91769f4db3ef82ad2f Mon Sep 17 00:00:00 2001 From: softhints Date: Wed, 6 Feb 2019 13:58:21 +0200 Subject: [PATCH 07/76] How_to_extract_information_from_excel_with_Python_and_Pandas --- ...on_from_excel_with_Python_and_Pandas.ipynb | 702 ++++++++++++++++++ ...e wiki tables with pandas and python.ipynb | 4 +- 2 files changed, 704 insertions(+), 2 deletions(-) create mode 100644 notebooks/How_to_extract_information_from_excel_with_Python_and_Pandas.ipynb diff --git a/notebooks/How_to_extract_information_from_excel_with_Python_and_Pandas.ipynb b/notebooks/How_to_extract_information_from_excel_with_Python_and_Pandas.ipynb new file mode 100644 index 0000000..49500d1 --- /dev/null +++ b/notebooks/How_to_extract_information_from_excel_with_Python_and_Pandas.ipynb @@ -0,0 +1,702 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How to extract information from excel with Python and Pandas\n", + "\n", + "source\n", + "\n", + "http://blog.softhints.com/excel-export-results-read-excel-python-pandas/\n", + "\n", + "requirements:\n", + "\n", + "```\n", + "pip install xlrd\n", + "pip install pandas\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_columns', None) # or 1000\n", + "pd.set_option('display.max_rows', None) # or 1000\n", + "pd.set_option('display.max_colwidth', -1) # or 199" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read excel file with python/pandas" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": 
{}, + "outputs": [], + "source": [ + "# read the file\n", + "xls = pd.ExcelFile('~/Documents/example.xlsx')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['People', 'Events', 'Countries']" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# get all sheet names\n", + "sheet_names = xls.sheet_names\n", + "sheet_names" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
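<table>
<thead>
<tr><th></th><th>#</th><th>Event</th><th>Date</th><th>Venue</th><th>Location</th><th>Attendance</th><th>Ref.</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>465</td><td>UFC Fight Night: Assunção vs. Moraes 2</td><td>Feb 2, 2019</td><td>Centro de Formação Olímpica do Nordeste</td><td>Fortaleza, Brazil</td><td>10040</td><td>[21]</td></tr>
<tr><th>1</th><td>–</td><td>UFC 233</td><td>Jan 26, 2019</td><td>Honda Center</td><td>Anaheim, California, U.S.</td><td>Cancelled</td><td>[22]</td></tr>
<tr><th>2</th><td>464</td><td>UFC Fight Night: Cejudo vs. Dillashaw</td><td>Jan 19, 2019</td><td>Barclays Center</td><td>Brooklyn, New York, U.S.</td><td>12152</td><td>[23]</td></tr>
<tr><th>3</th><td>463</td><td>UFC 232: Jones vs. Gustafsson 2</td><td>Dec 29, 2018</td><td>The Forum</td><td>Inglewood, California, U.S.</td><td>15862</td><td>[24]</td></tr>
<tr><th>4</th><td>462</td><td>UFC on Fox: Lee vs. Iaquinta 2</td><td>Dec 15, 2018</td><td>Fiserv Forum</td><td>Milwaukee, Wisconsin, U.S.</td><td>9010</td><td>[25]</td></tr>
<tr><th>5</th><td>461</td><td>UFC 231: Holloway vs. Ortega</td><td>Dec 8, 2018</td><td>Scotiabank Arena</td><td>Toronto, Ontario, Canada</td><td>19039</td><td>[26]</td></tr>
<tr><th>6</th><td>460</td><td>UFC Fight Night: dos Santos vs. Tuivasa</td><td>Dec 2, 2018</td><td>Adelaide Entertainment Centre</td><td>Adelaide, Australia</td><td>8652</td><td>[27]</td></tr>
<tr><th>7</th><td>459</td><td>The Ultimate Fighter: Heavy Hitters Finale</td><td>Nov 30, 2018</td><td>Pearl Theatre</td><td>Las Vegas, Nevada, U.S.</td><td>2020</td><td>[28]</td></tr>
<tr><th>8</th><td>458</td><td>UFC Fight Night: Blaydes vs. Ngannou 2</td><td>Nov 24, 2018</td><td>Cadillac Arena</td><td>Beijing, China</td><td>10302</td><td>[29]</td></tr>
<tr><th>9</th><td>457</td><td>UFC Fight Night: Magny vs. Ponzinibbio</td><td>Nov 17, 2018</td><td>Estadio Mary Terán de Weiss</td><td>Buenos Aires, Argentina</td><td>10245</td><td>[30]</td></tr>
</tbody>
</table>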
\n", + "
" + ], + "text/plain": [ + " # Event Date \\\n", + "0 465 UFC Fight Night: Assunção vs. Moraes 2 Feb 2, 2019 \n", + "1 – UFC 233 Jan 26, 2019 \n", + "2 464 UFC Fight Night: Cejudo vs. Dillashaw Jan 19, 2019 \n", + "3 463 UFC 232: Jones vs. Gustafsson 2 Dec 29, 2018 \n", + "4 462 UFC on Fox: Lee vs. Iaquinta 2 Dec 15, 2018 \n", + "5 461 UFC 231: Holloway vs. Ortega Dec 8, 2018 \n", + "6 460 UFC Fight Night: dos Santos vs. Tuivasa Dec 2, 2018 \n", + "7 459 The Ultimate Fighter: Heavy Hitters Finale Nov 30, 2018 \n", + "8 458 UFC Fight Night: Blaydes vs. Ngannou 2 Nov 24, 2018 \n", + "9 457 UFC Fight Night: Magny vs. Ponzinibbio Nov 17, 2018 \n", + "\n", + " Venue Location \\\n", + "0 Centro de Formação Olímpica do Nordeste Fortaleza, Brazil \n", + "1 Honda Center Anaheim, California, U.S. \n", + "2 Barclays Center Brooklyn, New York, U.S. \n", + "3 The Forum Inglewood, California, U.S. \n", + "4 Fiserv Forum Milwaukee, Wisconsin, U.S. \n", + "5 Scotiabank Arena Toronto, Ontario, Canada \n", + "6 Adelaide Entertainment Centre Adelaide, Australia \n", + "7 Pearl Theatre Las Vegas, Nevada, U.S. \n", + "8 Cadillac Arena Beijing, China \n", + "9 Estadio Mary Terán de Weiss Buenos Aires, Argentina \n", + "\n", + " Attendance Ref. 
\n", + "0 10040 [21] \n", + "1 Cancelled [22] \n", + "2 12152 [23] \n", + "3 15862 [24] \n", + "4 9010 [25] \n", + "5 19039 [26] \n", + "6 8652 [27] \n", + "7 2020 [28] \n", + "8 10302 [29] \n", + "9 10245 [30] " + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# get information only for one sheet\n", + "df = pd.read_excel(xls, \"Events\")\n", + "df.head(10) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Working with many sheets" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "################################## People ##################################\n", + " OrderDate Country Region\n", + "0 1/6/2018 US East \n", + "1 1/23/2018 Brazil Central\n", + "2 2/9/2018 Congo Central\n", + "3 2/26/2018 Japan Central\n", + "4 3/15/2018 Germany West \n", + "################################## Events ##################################\n", + " # Event Date\n", + "0 465 UFC Fight Night: Assunção vs. Moraes 2 Feb 2, 2019 \n", + "1 – UFC 233 Jan 26, 2019\n", + "2 464 UFC Fight Night: Cejudo vs. Dillashaw Jan 19, 2019\n", + "3 463 UFC 232: Jones vs. Gustafsson 2 Dec 29, 2018\n", + "4 462 UFC on Fox: Lee vs. 
Iaquinta 2 Dec 15, 2018\n", + "################################## Countries ##################################\n", + " 0 Rank Country\n", + "0 1 1.0 Russia* \n", + "1 2 2.0 China* \n", + "2 3 3.0 India \n", + "3 4 4.0 Kazakhstan* \n", + "4 5 5.0 Saudi Arabia\n" + ] + } + ], + "source": [ + "# read all sheets and extract first 5 rows, 3 columns\n", + "for tab in sheet_names:\n", + " print('################################## ' + tab + ' ##################################')\n", + " df = pd.read_excel(xls, tab)\n", + " print(df.iloc[:5, :3])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Search one sheet, one column for a string" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
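<table>
<thead>
<tr><th></th><th>0</th><th>Rank</th><th>Country</th><th>Area (km²)</th><th>Notes</th><th>NaN</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>1</td><td>1.0</td><td>Russia*</td><td>13100000</td><td>17,125,200 including European part</td><td>NaN</td></tr>
<tr><th>1</th><td>2</td><td>2.0</td><td>China*</td><td>9596961</td><td>excludes Hong Kong, Macau, Taiwan and disputed...</td><td>NaN</td></tr>
<tr><th>2</th><td>3</td><td>3.0</td><td>India</td><td>3287263</td><td>NaN</td><td>NaN</td></tr>
<tr><th>3</th><td>4</td><td>4.0</td><td>Kazakhstan*</td><td>2455034</td><td>2,724,902 km² including European part</td><td>NaN</td></tr>
<tr><th>4</th><td>5</td><td>5.0</td><td>Saudi Arabia</td><td>2149690</td><td>NaN</td><td>NaN</td></tr>
<tr><th>5</th><td>6</td><td>6.0</td><td>Iran</td><td>1648195</td><td>NaN</td><td>NaN</td></tr>
<tr><th>6</th><td>7</td><td>7.0</td><td>Mongolia</td><td>1564110</td><td>NaN</td><td>NaN</td></tr>
<tr><th>7</th><td>8</td><td>8.0</td><td>Indonesia*</td><td>1472639</td><td>1,904,569 km² including Oceanian part</td><td>NaN</td></tr>
<tr><th>8</th><td>9</td><td>9.0</td><td>Pakistan</td><td>796095</td><td>882,363 km² including Gilgit-Baltistan and AJK</td><td>NaN</td></tr>
<tr><th>9</th><td>10</td><td>10.0</td><td>Turkey*</td><td>747272</td><td>783,562 km² including European part</td><td>NaN</td></tr>
</tbody>
</table>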
\n", + "
" + ], + "text/plain": [ + " 0 Rank Country Area (km²) \\\n", + "0 1 1.0 Russia* 13100000 \n", + "1 2 2.0 China* 9596961 \n", + "2 3 3.0 India 3287263 \n", + "3 4 4.0 Kazakhstan* 2455034 \n", + "4 5 5.0 Saudi Arabia 2149690 \n", + "5 6 6.0 Iran 1648195 \n", + "6 7 7.0 Mongolia 1564110 \n", + "7 8 8.0 Indonesia* 1472639 \n", + "8 9 9.0 Pakistan 796095 \n", + "9 10 10.0 Turkey* 747272 \n", + "\n", + " Notes NaN \n", + "0 17,125,200 including European part NaN \n", + "1 excludes Hong Kong, Macau, Taiwan and disputed... NaN \n", + "2 NaN NaN \n", + "3 2,724,902 km² including European part NaN \n", + "4 NaN NaN \n", + "5 NaN NaN \n", + "6 NaN NaN \n", + "7 1,904,569 km² including Oceanian part NaN \n", + "8 882,363 km² including Gilgit-Baltistan and AJK NaN \n", + "9 783,562 km² including European part NaN " + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = pd.read_excel(xls, \"Countries\")\n", + "df.head(10) \n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
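<table>
<thead>
<tr><th></th><th>0</th><th>Rank</th><th>Country</th><th>Area (km²)</th><th>Notes</th><th>NaN</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>2</td><td>2.0</td><td>China*</td><td>9596961</td><td>excludes Hong Kong, Macau, Taiwan and disputed...</td><td>NaN</td></tr>
</tbody>
</table>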
\n", + "
" + ], + "text/plain": [ + " 0 Rank Country Area (km²) \\\n", + "1 2 2.0 China* 9596961 \n", + "\n", + " Notes NaN \n", + "1 excludes Hong Kong, Macau, Taiwan and disputed... NaN " + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "agg = df[df['Country'].str.contains('China', na=False)]\n", + "agg" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Search in all sheets for a string" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "################################## People ##################################\n", + " Country\n", + "3 Japan \n", + "6 Japan \n", + "18 Japan \n", + "26 Japan \n", + "30 Japan \n", + "################################## Events ##################################\n", + " no column Country \n", + "################################## Countries ##################################\n", + " Country\n", + "17 Japan \n" + ] + } + ], + "source": [ + "# search in every sheet in column Country for word 'Japan'\n", + "# print out message if the column is missing\n", + "for tab in sheet_names:\n", + " print('################################## ' + tab + ' ##################################')\n", + " df = pd.read_excel(xls, tab)\n", + " try:\n", + " agg = df[df['Country'].str.contains('Japan', na=False)]\n", + " print(agg[['Country']])\n", + " except KeyError:\n", + " print(' no column Country ')" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "################################## People ##################################\n", + " Country\n", + "14 China \n", + "16 China \n", + "22 China \n", + "32 China \n", + "33 China \n", + "################################## Events ##################################\n", + " no tab Country \n", + 
"################################## Countries ##################################\n", + " Country\n", + "1 China*\n" + ] + } + ], + "source": [ + "# search for a partial match\n", + "for tab in sheet_names:\n", + " print('################################## ' + tab + ' ##################################')\n", + " df = pd.read_excel(xls, tab)\n", + " try:\n", + " agg = df[df['Country'].str.contains('China', na=False)]\n", + " print(agg[['Country']])\n", + " except KeyError:\n", + " print(' no tab Country ')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "################################## People ##################################\n", + " Country\n", + "14 China \n", + "16 China \n", + "22 China \n", + "32 China \n", + "33 China \n", + "################################## Events ##################################\n", + " no tab Country \n", + "################################## Countries ##################################\n", + "Empty DataFrame\n", + "Columns: [Country]\n", + "Index: []\n" + ] + } + ], + "source": [ + "# search for an exact match\n", + "for tab in sheet_names:\n", + " print('################################## ' + tab + ' ##################################')\n", + " df = pd.read_excel(xls, tab)\n", + " try:\n", + " agg = df[df['Country'] == 'China']\n", + " print(agg[['Country']])\n", + " except KeyError:\n", + " print(' no tab Country ')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + 
"nbformat_minor": 2 +} diff --git a/notebooks/Scrape wiki tables with pandas and python.ipynb b/notebooks/Scrape wiki tables with pandas and python.ipynb index 9c5c5b0..3264206 100644 --- a/notebooks/Scrape wiki tables with pandas and python.ipynb +++ b/notebooks/Scrape wiki tables with pandas and python.ipynb @@ -782,7 +782,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -1574,7 +1574,7 @@ "[470 rows x 6 columns]" ] }, - "execution_count": 7, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } From e3c390a399bf4285dbbef1e561cf48c7fc8849db Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 7 Feb 2019 22:32:47 +0200 Subject: [PATCH 08/76] Python problems for beginners --- .../Python_problems_for_beginners_1.ipynb | 243 ++++++++++++++++++ 1 file changed, 243 insertions(+) create mode 100644 notebooks/python_problems/Python_problems_for_beginners_1.ipynb diff --git a/notebooks/python_problems/Python_problems_for_beginners_1.ipynb b/notebooks/python_problems/Python_problems_for_beginners_1.ipynb new file mode 100644 index 0000000..354d2eb --- /dev/null +++ b/notebooks/python_problems/Python_problems_for_beginners_1.ipynb @@ -0,0 +1,243 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Python problems for beginners" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Problem 1 Triangle\n", + "\n", + "Write a simple program that prints a star pattern in Python 3.x for any n:\n", + "\n", + "Example n=5\n", + "\n", + " * \n", + " * * \n", + " * * * \n", + " * * * * \n", + " * * * * * " + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "* \n", + "* * \n", + "* * * \n", + "* * * * \n", + "* * * * * \n" + ] + } + ], + "source": [ + "n = 5\n", + "\n", + "for i in range(0, n+1):\n", + " print('* ' * i)" + ] + 
}, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7\n" + ] + } + ], + "source": [ + "n = n + 2\n", + "print(n)" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ddd" + ] + } + ], + "source": [ + "for x in ['a', 's', 'd']:\n", + " print('d', end='')" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[3, 6, 9]" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(range(3,10,3))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Problem 2 Triangle with numbers\n", + "\n", + "Write a simple program that demonstrates a triangle pattern (with numbers 1..i on line i) in Python 3.x for any n:\n", + "\n", + "Example n=4\n", + "\n", + " 1 \n", + " 1 2 \n", + " 1 2 3 \n", + " 1 2 3 4" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x\n", + "\n", + "x\n", + "y\n", + "1 \n", + "x\n", + "y\n", + "1 y\n", + "2 \n", + "x\n", + "y\n", + "1 y\n", + "2 y\n", + "3 \n", + "x\n", + "y\n", + "1 y\n", + "2 y\n", + "3 y\n", + "4 \n" + ] + } + ], + "source": [ + "n = 4\n", + "\n", + "for i in range(0, n+1):\n", + " for j in range(1, i + 1):\n", + " print(j, end=' ')\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Homework 1 Triangle with letters \n", + "\n", + "Write a simple program that demonstrates a triangle pattern (with consecutive letters) in Python 3.x for any n:\n", + "\n", + "Example n=4\n", + "\n", + " A \n", + " B C \n", + " D E F \n", + " G H I J \n", + " K L M N O " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Homework 2 Diagonal
of numbers\n", + "\n", + "Write a simple program that demonstrates a diagonal pattern in Python 3.x for any n:\n", + "\n", + "Example n=4\n", + "\n", + "0\n", + " 1\n", + " 2\n", + " 3\n", + " 4" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Homework 3 Pyramid\n", + "\n", + "Write a simple program that demonstrates a pyramid pattern in Python 3.x for any n:\n", + "\n", + "Example n=3\n", + "\n", + " * \n", + " * * * \n", + " * * * * * " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 9e83effd8a56c5adfb1534ed201c444390782cfb Mon Sep 17 00:00:00 2001 From: softhints Date: Fri, 8 Feb 2019 19:35:45 +0200 Subject: [PATCH 09/76] Load multiple CSV files into a single Dataframe.ipynb --- ...e_CSV_files_into_a_single _Dataframe.ipynb | 456 ++++++++++++++++++ 1 file changed, 456 insertions(+) create mode 100644 notebooks/Load_multiple_CSV_files_into_a_single _Dataframe.ipynb diff --git a/notebooks/Load_multiple_CSV_files_into_a_single _Dataframe.ipynb b/notebooks/Load_multiple_CSV_files_into_a_single _Dataframe.ipynb new file mode 100644 index 0000000..5b9eccf --- /dev/null +++ b/notebooks/Load_multiple_CSV_files_into_a_single _Dataframe.ipynb @@ -0,0 +1,456 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "pd.set_option('display.max_colwidth', -1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Rename multiple CSV files in a folder with
Python" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import glob, os\n", + "\n", + "def rename(dir, pathAndFilename, pattern, titlePattern):\n", + " os.rename(pathAndFilename, os.path.join(dir, titlePattern))\n", + "\n", + "# search for CSV files in the working folder \n", + "path = os.path.expanduser(\"~/Projects/MYP/Datasets/test/*.csv\")\n", + "\n", + "# iterate and rename them one by one with the number of the iteration\n", + "for i, fname in enumerate(glob.glob(path)):\n", + " rename(os.path.expanduser('~/Projects/MYP/Datasets/test/'), fname, r'*.csv', r'test{}.csv'.format(i))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load several files into a DataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(541, 7)\n", + "(550, 7)\n", + "(1641, 7)\n" + ] + } + ], + "source": [ + "# the CSV files use '@' as the separator\n", + "df1 = pd.read_csv('~/Projects/MYP/Datasets/test/test0.csv', sep=\"@\")\n", + "df2 = pd.read_csv('~/Projects/MYP/Datasets/test/test1.csv', sep=\"@\")\n", + "df3 = pd.read_csv('~/Projects/MYP/Datasets/test/test1.csv', sep=\"@\")\n", + "\n", + "frames = [df1, df2, df3]\n", + "\n", + "# concatenate the data from the CSV files\n", + "combined = pd.concat(frames)\n", + "\n", + "print(df1.shape)\n", + "print(df2.shape)\n", + "print(combined.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dynamically load multiple CSV files into a DataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
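<tr><th></th><th>title</th><th>Views</th><th>Like</th><th>Dislike</th><th>Comment</th><th>channel</th></tr>
<tr><th>215</th><td>Turning Google Earth into SimCity 2000</td><td>168175.0</td><td>3251.0</td><td>1125.0</td><td>215.0</td><td>test0.csv</td></tr>
<tr><th>301</th><td>Microservices + Events + Docker = A Perfect Trio</td><td>161110.0</td><td>3213.0</td><td>50.0</td><td>83.0</td><td>test0.csv</td></tr>
<tr><th>265</th><td>PHP in 2018 by the Creator of PHP</td><td>164577.0</td><td>3557.0</td><td>69.0</td><td>384.0</td><td>test0.csv</td></tr>
<tr><th>468</th><td>Developing Blockchain Software</td><td>169484.0</td><td>2512.0</td><td>116.0</td><td>133.0</td><td>test0.csv</td></tr>
<tr><th>398</th><td>VS Code: The Last Editor You'll Ever Need</td><td>172738.0</td><td>1930.0</td><td>194.0</td><td>340.0</td><td>test0.csv</td></tr>
<tr><th>175</th><td>Coding Challenge #74: Clock with p5.js</td><td>232227.0</td><td>4609.0</td><td>68.0</td><td>289.0</td><td>test1.csv</td></tr>
<tr><th>373</th><td>Coding Challenge #12: The Lorenz Attractor in Processing</td><td>217172.0</td><td>3680.0</td><td>43.0</td><td>333.0</td><td>test1.csv</td></tr>
<tr><th>447</th><td>10.4: Loading JSON data from a URL (Asynchronous Callbacks!) - p5.js Tutorial</td><td>218081.0</td><td>2120.0</td><td>79.0</td><td>240.0</td><td>test1.csv</td></tr>
<tr><th>269</th><td>The Coding Train!</td><td>218635.0</td><td>2482.0</td><td>83.0</td><td>324.0</td><td>test1.csv</td></tr>
<tr><th>193</th><td>Coding Challenge #71: Minesweeper</td><td>220816.0</td><td>3334.0</td><td>71.0</td><td>401.0</td><td>test1.csv</td></tr>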
\n", + "
" + ], + "text/plain": [ + " title \\\n", + "215 Turning Google Earth into SimCity 2000 \n", + "301 Microservices + Events + Docker = A Perfect Trio \n", + "265 PHP in 2018 by the Creator of PHP \n", + "468 Developing Blockchain Software \n", + "398 VS Code: The Last Editor You'll Ever Need \n", + "175 Coding Challenge #74: Clock with p5.js \n", + "373 Coding Challenge #12: The Lorenz Attractor in Processing \n", + "447 10.4: Loading JSON data from a URL (Asynchronous Callbacks!) - p5.js Tutorial \n", + "269 The Coding Train! \n", + "193 Coding Challenge #71: Minesweeper \n", + "\n", + " Views Like Dislike Comment channel \n", + "215 168175.0 3251.0 1125.0 215.0 test0.csv \n", + "301 161110.0 3213.0 50.0 83.0 test0.csv \n", + "265 164577.0 3557.0 69.0 384.0 test0.csv \n", + "468 169484.0 2512.0 116.0 133.0 test0.csv \n", + "398 172738.0 1930.0 194.0 340.0 test0.csv \n", + "175 232227.0 4609.0 68.0 289.0 test1.csv \n", + "373 217172.0 3680.0 43.0 333.0 test1.csv \n", + "447 218081.0 2120.0 79.0 240.0 test1.csv \n", + "269 218635.0 2482.0 83.0 324.0 test1.csv \n", + "193 220816.0 3334.0 71.0 401.0 test1.csv " + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import glob\n", + "\n", + "result = pd.DataFrame()\n", + "\n", + "path = os.path.expanduser(\"~/Projects/MYP/Datasets/test/*.csv\")\n", + "\n", + "for fname in glob.glob(path):\n", + " head, tail = os.path.split(fname)\n", + " df = pd.read_csv(fname, sep=\"@\")\n", + " df2 = df.sort_values(by=['Views'], ascending=False).drop(['Favorite', 'videoID'], axis=1).iloc[15:20,:]\n", + " df2['channel'] = tail\n", + " result = pd.concat([result, df2])\n", + "result.sort_values(by=['channel']).iloc[0:10,] " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Generate clickable links with pandas and Jupyter notebook" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": 
[ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleViewsLikeDislikeFavoriteCommentvideoIDnameurl
20How To...80620.0121.013.00.013.0https:...\n", + "
21How To...165533.0432.0143.00.017.0https:...\n", + "
22How To...29636.099.016.00.08.0https:...\n", + "
23How to...409.04.00.00.00.0https:...\n", + "
24How to...31358.059.033.00.02.0https:...\n", + "
25How To...85887.0272.076.00.04.0https:...\n", + "
26How To...61449.095.034.00.00.0https:...\n", + "
27How To...262342.01440.093.00.0447.0https:...\n", + "
28How To...154661.0453.0122.00.011.0https:...\n", + "
29How To...109787.0257.040.00.022.0https:...\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from IPython.display import HTML\n", + "\n", + "# convert url column into href tag and add it as a new column to dataframe\n", + "df['nameurl'] = df['videoID'].apply(lambda x: 'XXXXX'.format(x))\n", + "\n", + "\n", + "\n", + "# otherwise the link will be blank\n", + "pd.set_option('display.max_colwidth', 10)\n", + "\n", + "# in order to display HTML code\n", + "HTML(df.iloc[20:30,] .to_html(escape=False))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} From d6afc00d213443183751c908bf9200f361475a0f Mon Sep 17 00:00:00 2001 From: softhints Date: Sun, 10 Feb 2019 22:26:48 +0200 Subject: [PATCH 10/76] Youtube-PewDiePie --- notebooks/youtube/Youtube-PewDiePie.ipynb | 5707 +++++++++++++++++++++ 1 file changed, 5707 insertions(+) create mode 100644 notebooks/youtube/Youtube-PewDiePie.ipynb diff --git a/notebooks/youtube/Youtube-PewDiePie.ipynb b/notebooks/youtube/Youtube-PewDiePie.ipynb new file mode 100644 index 0000000..bd11a94 --- /dev/null +++ b/notebooks/youtube/Youtube-PewDiePie.ipynb @@ -0,0 +1,5707 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "pd.set_option('display.max_colwidth', -1)\n", + "pd.options.display.float_format = '{:,}'.format" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + 
"name": "stdout", + "output_type": "stream", + "text": [ + "(359, 8)\n" + ] + } + ], + "source": [ + "df = pd.read_csv(\n", + " \"~/Projects/MYP/Datasets/Youtube/PewDiePie20190210.csv\", sep=\"@\")\n", + "print(df.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
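<tr><th></th><th>title</th><th>Views</th><th>Like</th><th>Dislike</th><th>Favorite</th><th>Comment</th><th>videoID</th><th>tags</th></tr>
<tr><th>0</th><td>YOU HAD ONE JOB! - with editor Brad1</td><td>5,293,108.0</td><td>385,429.0</td><td>4,083.0</td><td>0.0</td><td>29,855.0</td><td>https://www.youtube.com/watch?v=B67OBHNCopk</td><td>SATIRE, reddit, you had one job, onejob</td></tr>
<tr><th>1</th><td>Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE</td><td>5,358,466.0</td><td>378,535.0</td><td>3,951.0</td><td>0.0</td><td>38,075.0</td><td>https://www.youtube.com/watch?v=kLM_9gBZIqY</td><td>SATIRE</td></tr>
<tr><th>2</th><td>We broke another WORLD RECORD!</td><td>8,558,673.0</td><td>595,622.0</td><td>7,901.0</td><td>0.0</td><td>53,664.0</td><td>https://www.youtube.com/watch?v=d1tAfXKc7-c</td><td>SATIRE</td></tr>
<tr><th>3</th><td>FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~</td><td>3,609,152.0</td><td>218,530.0</td><td>3,126.0</td><td>0.0</td><td>17,595.0</td><td>https://www.youtube.com/watch?v=bMLdNrB5hAo</td><td>SATIRE</td></tr>
<tr><th>4</th><td>Don't Laugh Challenge, NEW SEASON!!!!!</td><td>5,888,465.0</td><td>569,900.0</td><td>7,824.0</td><td>0.0</td><td>29,373.0</td><td>https://www.youtube.com/watch?v=Zgm_iM3f_ME</td><td>SATIRE</td></tr>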
\n", + "
" + ], + "text/plain": [ + " title Views \\\n", + "0 YOU HAD ONE JOB! - with editor Brad1 5,293,108.0 \n", + "1 Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE 5,358,466.0 \n", + "2 We broke another WORLD RECORD! 8,558,673.0 \n", + "3 FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ 3,609,152.0 \n", + "4 Don't Laugh Challenge, NEW SEASON!!!!! 5,888,465.0 \n", + "\n", + " Like Dislike Favorite Comment \\\n", + "0 385,429.0 4,083.0 0.0 29,855.0 \n", + "1 378,535.0 3,951.0 0.0 38,075.0 \n", + "2 595,622.0 7,901.0 0.0 53,664.0 \n", + "3 218,530.0 3,126.0 0.0 17,595.0 \n", + "4 569,900.0 7,824.0 0.0 29,373.0 \n", + "\n", + " videoID \\\n", + "0 https://www.youtube.com/watch?v=B67OBHNCopk \n", + "1 https://www.youtube.com/watch?v=kLM_9gBZIqY \n", + "2 https://www.youtube.com/watch?v=d1tAfXKc7-c \n", + "3 https://www.youtube.com/watch?v=bMLdNrB5hAo \n", + "4 https://www.youtube.com/watch?v=Zgm_iM3f_ME \n", + "\n", + " tags \n", + "0 SATIRE, reddit, you had one job, onejob \n", + "1 SATIRE \n", + "2 SATIRE \n", + "3 SATIRE \n", + "4 SATIRE " + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dropna()\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(359, 8)" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "df3 = df.tags.str.split(',', expand=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
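<tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>...</th><th>38</th><th>39</th><th>40</th><th>41</th><th>42</th><th>43</th><th>44</th><th>45</th><th>46</th><th>47</th></tr>
<tr><th>0</th><td>True</td><td>True</td><td>True</td><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>
<tr><th>1</th><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>
<tr><th>2</th><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>
<tr><th>3</th><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>
<tr><th>4</th><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>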
\n", + "

5 rows × 48 columns

\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 7 8 9 ... \\\n", + "0 True True True True False False False False False False ... \n", + "1 True False False False False False False False False False ... \n", + "2 True False False False False False False False False False ... \n", + "3 True False False False False False False False False False ... \n", + "4 True False False False False False False False False False ... \n", + "\n", + " 38 39 40 41 42 43 44 45 46 47 \n", + "0 False False False False False False False False False False \n", + "1 False False False False False False False False False False \n", + "2 False False False False False False False False False False \n", + "3 False False False False False False False False False False \n", + "4 False False False False False False False False False False \n", + "\n", + "[5 rows x 48 columns]" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df2 = df.tags.str.split(',', expand=True).notna()\n", + "df2.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "RangeIndex(start=0, stop=48, step=1)" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "columns = df2.columns\n", + "columns" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
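<tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>...</th><th>38</th><th>39</th><th>40</th><th>41</th><th>42</th><th>43</th><th>44</th><th>45</th><th>46</th><th>47</th></tr>
<tr><th>0</th><td>True</td><td>True</td><td>True</td><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>
<tr><th>1</th><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>
<tr><th>2</th><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>
<tr><th>3</th><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>
<tr><th>4</th><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>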
\n", + "

5 rows × 48 columns

\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 7 8 9 ... \\\n", + "0 True True True True False False False False False False ... \n", + "1 True False False False False False False False False False ... \n", + "2 True False False False False False False False False False ... \n", + "3 True False False False False False False False False False ... \n", + "4 True False False False False False False False False False ... \n", + "\n", + " 38 39 40 41 42 43 44 45 46 47 \n", + "0 False False False False False False False False False False \n", + "1 False False False False False False False False False False \n", + "2 False False False False False False False False False False \n", + "3 False False False False False False False False False False \n", + "4 False False False False False False False False False False \n", + "\n", + "[5 rows x 48 columns]" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df2.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
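<tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>...</th><th>38</th><th>39</th><th>40</th><th>41</th><th>42</th><th>43</th><th>44</th><th>45</th><th>46</th><th>47</th></tr>
<tr><th>0</th><td>SATIRE</td><td>reddit</td><td>you had one job</td><td>onejob</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>...</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>
<tr><th>1</th><td>SATIRE</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>...</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>
<tr><th>2</th><td>SATIRE</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>...</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>
<tr><th>3</th><td>SATIRE</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>...</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>
<tr><th>4</th><td>SATIRE</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>...</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>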
\n", + "

5 rows × 48 columns

\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 7 8 \\\n", + "0 SATIRE reddit you had one job onejob None None None None None \n", + "1 SATIRE None None None None None None None None \n", + "2 SATIRE None None None None None None None None \n", + "3 SATIRE None None None None None None None None \n", + "4 SATIRE None None None None None None None None \n", + "\n", + " 9 ... 38 39 40 41 42 43 44 45 46 47 \n", + "0 None ... None None None None None None None None None None \n", + "1 None ... None None None None None None None None None None \n", + "2 None ... None None None None None None None None None None \n", + "3 None ... None None None None None None None None None None \n", + "4 None ... None None None None None None None None None None \n", + "\n", + "[5 rows x 48 columns]" + ] + }, + "execution_count": 66, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df3.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ssssssssssssssssssssssssssssssssss0ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 reddit \n", + "2 you had one job\n", + "3 onejob \n", + "Name: 0, dtype: object\n", + "ssssssssssssssssssssssssssssssssss1ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 1, dtype: object\n", + "ssssssssssssssssssssssssssssssssss2ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 2, dtype: object\n", + "ssssssssssssssssssssssssssssssssss3ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 3, dtype: object\n", + "ssssssssssssssssssssssssssssssssss4ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 4, dtype: object\n", + "ssssssssssssssssssssssssssssssssss5ssssssssssssssssssssssssssssssssss\n", + "0 player \n", + "1 unknown \n", + "2 PUBG \n", + "3 player unknowns \n", + "4 player unknown's\n", + "5 battleground \n", + "6 battle \n", + "7 ground \n", + "Name: 5, dtype: 
object\n", + "ssssssssssssssssssssssssssssssssss6ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 6, dtype: object\n", + "ssssssssssssssssssssssssssssssssss7ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme review\n", + "2 elon musk \n", + "Name: 7, dtype: object\n", + "ssssssssssssssssssssssssssssssssss8ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 The Battle Wizard . ENDING EXPLAINED\n", + "2 the battle wizard \n", + "3 battle wizard \n", + "4 battle wizard 1977 \n", + "5 battle wizard movie \n", + "6 movie review \n", + "7 movie \n", + "8 film review \n", + "9 pewdiepie \n", + "10 pewds \n", + "11 pewdie \n", + "12 pdp \n", + "13 wizard \n", + "Name: 8, dtype: object\n", + "ssssssssssssssssssssssssssssssssss9ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 dr phil\n", + "2 react \n", + "Name: 9, dtype: object\n", + "ssssssssssssssssssssssssssssssssss10ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 Thats right... I'm a GAMER\n", + "2 gamer \n", + "3 gaming \n", + "4 youtube gaming \n", + "5 memes \n", + "6 pewdiepie \n", + "7 pewds \n", + "8 pewdie \n", + "9 pdp \n", + "Name: 10, dtype: object\n", + "ssssssssssssssssssssssssssssssssss11ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 11, dtype: object\n", + "ssssssssssssssssssssssssssssssssss12ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 tiktok \n", + "2 tik tok \n", + "3 tik tok funny \n", + "4 tik tok compilation\n", + "5 funny tik toks \n", + "6 funny tiktok \n", + "7 funny tiktok memes \n", + "8 tiktok songs \n", + "9 tiktok cringe \n", + "10 cringe \n", + "11 cringe compilation \n", + "12 tiktok memes \n", + "13 tik tok memes \n", + "14 pewdiepie tiktok \n", + "15 pewdiepie \n", + "16 pewds \n", + "17 pewdie \n", + "18 pdp \n", + "19 #ad \n", + "20 4K video \n", + "Name: 12, dtype: object\n", + "ssssssssssssssssssssssssssssssssss13ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 13, dtype: 
object\n", + "ssssssssssssssssssssssssssssssssss14ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 MY NEW SHOW / You Laugh You Lose\n", + "2 you laugh you lose \n", + "3 ylyl \n", + "4 you laugh you lose challenge \n", + "5 try not to laugh \n", + "6 try not to laugh challenge \n", + "7 pewdiepie \n", + "8 pewdiepie ylyl \n", + "9 ylyl pewds \n", + "10 pewdie \n", + "11 pdp \n", + "12 pewds \n", + "Name: 14, dtype: object\n", + "ssssssssssssssssssssssssssssssssss15ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 pew \n", + "2 news \n", + "Name: 15, dtype: object\n", + "ssssssssssssssssssssssssssssssssss16ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 Sasuke Memes are NOT OK\n", + "2 sasuke \n", + "3 sasuke naruto \n", + "4 naruto \n", + "5 pewdiepie \n", + "6 meme review \n", + "7 memes \n", + "8 meme \n", + "9 pewds \n", + "10 pewdie \n", + "11 pdp \n", + "12 wave check \n", + "13 waves \n", + "14 wave hair \n", + "15 waves hair \n", + "Name: 16, dtype: object\n", + "ssssssssssssssssssssssssssssssssss17ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 17, dtype: object\n", + "ssssssssssssssssssssssssssssssssss18ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 dr phil \n", + "2 dr phil spoiled teen \n", + "3 dr phil pewdiepie \n", + "4 Dr Phil VS Spoiled teen *destroyed by facts and logic*\n", + "5 dr phil spoiled \n", + "6 dr phil full episodes \n", + "7 pewds \n", + "8 pewdie \n", + "9 pewdiepie \n", + "10 pdp \n", + "11 dr phil 2019 \n", + "Name: 18, dtype: object\n", + "ssssssssssssssssssssssssssssssssss19ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 19, dtype: object\n", + "ssssssssssssssssssssssssssssssssss20ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 20, dtype: object\n", + "ssssssssssssssssssssssssssssssssss21ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 21, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss22ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 Im most handsome 2018 \n", + "2 gamer girls \n", + "3 reaction \n", + "4 react \n", + "5 gamer girls react \n", + "6 most handsome man \n", + "7 pewdiepie \n", + "8 pewds \n", + "9 pewdie \n", + "10 pdp \n", + "11 lwiay \n", + "12 pewdiepie lwiay \n", + "13 pokimane \n", + "14 lords mobile \n", + "15 ads \n", + "16 ad \n", + "17 lords mobile ad \n", + "18 mobile ads \n", + "19 handsome man \n", + "20 most handsome man winner\n", + "21 handsome \n", + "22 gamer girls reaction \n", + "23 gamer \n", + "24 girls \n", + "25 gaming \n", + "26 entertainment \n", + "Name: 22, dtype: object\n", + "ssssssssssssssssssssssssssssssssss23ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 ylyl \n", + "2 comedy \n", + "3 you laugh you lose\n", + "4 compilation \n", + "5 try not to laugh \n", + "6 challenge \n", + "Name: 23, dtype: object\n", + "ssssssssssssssssssssssssssssssssss24ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 24, dtype: object\n", + "ssssssssssssssssssssssssssssssssss25ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 jesus \n", + "2 socalchrist \n", + "3 fake gamers \n", + "4 fake gamer girl \n", + "5 gamer girls \n", + "6 twitch girls \n", + "7 ricegum jakepaul \n", + "8 jake paul \n", + "9 jake paul amazon \n", + "10 amazon \n", + "11 amazon gift card \n", + "12 fake amazon giftcard\n", + "13 fake amazon \n", + "14 pewdiepie \n", + "15 pewds \n", + "16 pewdie \n", + "17 pdp \n", + "18 pew news \n", + "19 #ad \n", + "20 news \n", + "21 current affairs \n", + "22 ricegum jake paul \n", + "Name: 25, dtype: object\n", + "ssssssssssssssssssssssssssssssssss26ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 vr chat\n", + "2 game \n", + "3 gaming \n", + "Name: 26, dtype: object\n", + "ssssssssssssssssssssssssssssssssss27ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 genius \n", + "2 review \n", + "3 lele pons 
\n", + "4 gabbi hannsomething \n", + "5 jacob whatever his name is\n", + "6 other people \n", + "Name: 27, dtype: object\n", + "ssssssssssssssssssssssssssssssssss28ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 reddit \n", + "2 reddit review \n", + "3 pewdiepie \n", + "4 pewds \n", + "5 pewdie \n", + "6 pdp \n", + "7 tseries \n", + "8 t series \n", + "9 pewdiepie vs tseries \n", + "10 pewdiepie vs t series \n", + "11 oopsie \n", + "12 /r/ \n", + "13 /r \n", + "14 reddit try not to laugh \n", + "15 reddit cringe \n", + "16 reddit stories \n", + "17 reddit cringe compilation\n", + "18 vox \n", + "19 vox media \n", + "20 pewdiepie vox media \n", + "21 pewdiepie vox \n", + "22 Unintentional Opsies \n", + "23 opsies \n", + "Name: 28, dtype: object\n", + "ssssssssssssssssssssssssssssssssss29ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 pewdiepie vs thanos\n", + "2 Pewdiepie vs Thanos\n", + "3 WHO would WIN? \n", + "4 pewdiepie \n", + "5 pewds \n", + "6 pewdie \n", + "7 pdp \n", + "8 thanos \n", + "9 thanos meme \n", + "10 thanos memes \n", + "11 tseries \n", + "Name: 29, dtype: object\n", + "ssssssssssssssssssssssssssssssssss30ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme \n", + "2 review\n", + "3 awards\n", + "Name: 30, dtype: object\n", + "ssssssssssssssssssssssssssssssssss31ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 We broke a world record!\n", + "2 world \n", + "3 record \n", + "4 world record \n", + "5 pewdiepie \n", + "6 pewds \n", + "7 pewdie \n", + "8 pdp \n", + "9 world record pewdipie \n", + "10 tseries \n", + "11 t series \n", + "12 youtube rewind \n", + "13 youtube rewind 2018 \n", + "Name: 31, dtype: object\n", + "ssssssssssssssssssssssssssssssssss32ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 32, dtype: object\n", + "ssssssssssssssssssssssssssssssssss33ssssssssssssssssssssssssssssssssss\n", + "0 rewind 2018 \n", + "1 youtube rewind 2018\n", + "Name: 33, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss34ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 dr phil \n", + "2 Dr Phil ANNIHILATES spoiled Teen!!\n", + "3 dr phil spoiled daughter \n", + "4 dr phil full episodes \n", + "5 dr phil im white \n", + "6 dr phil annihilates \n", + "7 spoiled teen \n", + "8 dr phil spoiled \n", + "9 dr phil pewdiepie \n", + "10 dr phil 2018 \n", + "11 dr phil funny \n", + "12 dr phil meme review \n", + "13 dr phil treasure \n", + "14 dr phil video \n", + "15 dr phil tv show \n", + "16 pewdiepie \n", + "17 pewds \n", + "18 pewdie \n", + "19 pdp \n", + "Name: 34, dtype: object\n", + "ssssssssssssssssssssssssssssssssss35ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 25 Dec 2018\n", + "Name: 35, dtype: object\n", + "ssssssssssssssssssssssssssssssssss36ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 lwiay\n", + "Name: 36, dtype: object\n", + "ssssssssssssssssssssssssssssssssss37ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 wsj \n", + "2 hack \n", + "Name: 37, dtype: object\n", + "ssssssssssssssssssssssssssssssssss38ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 ylyl \n", + "2 You SLAV You Lose \n", + "3 you laugh \n", + "4 you lose \n", + "5 try not to laugh \n", + "6 you laugh you lose \n", + "7 you laugh you lose pewdiepie\n", + "8 try not to laugh challenge \n", + "9 pewdiepie \n", + "10 pewds \n", + "11 pewdie \n", + "12 pdp \n", + "Name: 38, dtype: object\n", + "ssssssssssssssssssssssssssssssssss39ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 youtube rewind\n", + "2 rewind \n", + "3 2018 \n", + "4 roast \n", + "5 lwiay \n", + "6 ylyl \n", + "7 meme \n", + "8 review \n", + "Name: 39, dtype: object\n", + "ssssssssssssssssssssssssssssssssss40ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 pewdiepie \n", + "2 pewds \n", + "3 pewdie \n", + "4 pdp \n", + "5 PewDiePie's biggest OOPSIE.\n", + "6 pew news \n", + "7 game awards 2018 \n", + "8 game awards 2018 
cringe \n", + "Name: 40, dtype: object\n", + "ssssssssssssssssssssssssssssssssss41ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 41, dtype: object\n", + "ssssssssssssssssssssssssssssssssss42ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 tiktok \n", + "2 tiktok memes \n", + "3 tiktok songs \n", + "4 tiktok cringe \n", + "5 tiktok tutorial \n", + "6 tiktok hit or miss \n", + "7 tiktok music \n", + "8 tiktok fortnite \n", + "9 tiktok cringe compilation \n", + "10 tiktok epic \n", + "11 best tiktok \n", + "12 best tiktok videos \n", + "13 tiktok funny \n", + "14 tiktok funny videos \n", + "15 tiktok haha \n", + "16 tiktok epic memes \n", + "17 tiktok compilation \n", + "18 tiktok compilation 2018 \n", + "19 tiktok 2018 \n", + "20 Tik Tok Very Funny Haha Epic Compilation Montage BEST TIK TOK 2018 LOL\n", + "21 tiktok montage \n", + "22 pewdiepie tiktok \n", + "23 pewdiepie vs t series \n", + "24 pewdiepie \n", + "25 pewdie \n", + "26 pdp \n", + "Name: 42, dtype: object\n", + "ssssssssssssssssssssssssssssssssss43ssssssssssssssssssssssssssssssssss\n", + "0 player \n", + "1 unknown \n", + "2 PUBG \n", + "3 player unknowns \n", + "4 player unknown's\n", + "5 battleground \n", + "6 battle \n", + "7 ground \n", + "Name: 43, dtype: object\n", + "ssssssssssssssssssssssssssssssssss44ssssssssssssssssssssssssssssssssss\n", + "0 TRY TO LAUGH NOT CHALLENGE \n", + "1 TRY NOT TO LAUGH \n", + "2 try not to laugh challenge \n", + "3 try not to laugh challenge impossible\n", + "4 try not to laugh challenge clean \n", + "5 try not to laugh \n", + "6 try not to laugh tiktok \n", + "7 tltl \n", + "8 pewdiepie \n", + "9 pewds \n", + "10 pewdie \n", + "11 ylyl \n", + "12 you laugh you lose \n", + "13 episode 1 season 1 \n", + "14 ep 1 \n", + "15 pdp \n", + "16 pewdiepie ylyl \n", + "17 video \n", + "18 youtube video \n", + "19 youtube channel \n", + "20 t series \n", + "21 tseries vs pewdiepie \n", + "22 tiktok \n", + "23 fortnite \n", + "24 fortnite funny 
moments \n", + "Name: 44, dtype: object\n", + "ssssssssssssssssssssssssssssssssss45ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 youtube\n", + "2 rewind \n", + "3 meme \n", + "4 yea \n", + "5 review \n", + "Name: 45, dtype: object\n", + "ssssssssssssssssssssssssssssssssss46ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme review\n", + "Name: 46, dtype: object\n", + "ssssssssssssssssssssssssssssssssss47ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 pewdiepie\n", + "2 fortnite \n", + "3 lwiay \n", + "4 ylyl \n", + "5 meme \n", + "6 review \n", + "7 season 7 \n", + "8 new \n", + "9 skins \n", + "Name: 47, dtype: object\n", + "ssssssssssssssssssssssssssssssssss48ssssssssssssssssssssssssssssssssss\n", + "0 player \n", + "1 unknown \n", + "2 PUBG \n", + "3 player unknowns \n", + "4 player unknown's\n", + "5 battleground \n", + "6 battle \n", + "7 ground \n", + "Name: 48, dtype: object\n", + "ssssssssssssssssssssssssssssssssss49ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 2x \n", + "2 slow mo\n", + "3 50% \n", + "4 speed \n", + "Name: 49, dtype: object\n", + "ssssssssssssssssssssssssssssssssss50ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme \n", + "2 review\n", + "Name: 50, dtype: object\n", + "ssssssssssssssssssssssssssssssssss51ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 tekashi69 \n", + "2 tekashi 6ix9ine \n", + "3 tekashi69 songs \n", + "4 6ix9ine \n", + "5 6ix9ine 2018 \n", + "6 Ninja \n", + "7 ninja fortnite \n", + "8 ninja fortnite gameplay \n", + "9 fortnite \n", + "10 fortnite funny moments \n", + "11 icy five ninja \n", + "12 alinity \n", + "13 alinity pewdiepie \n", + "14 alinity pewdiepie copystrike \n", + "15 pew news \n", + "16 pewdiepie \n", + "17 pewds \n", + "18 pdp \n", + "19 pewdie \n", + "20 youtube video \n", + "21 youtube channel \n", + "22 youtube \n", + "23 Tekashi69 BAN \n", + "24 Ninja caught selling underwear\n", + "25 Alinity facing 32 year 
prison.\n", + "26 smosh \n", + "27 news \n", + "28 news live \n", + "29 world news \n", + "Name: 51, dtype: object\n", + "ssssssssssssssssssssssssssssssssss52ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 beat \n", + "2 saber \n", + "3 vr \n", + "4 gameplay\n", + "Name: 52, dtype: object\n", + "ssssssssssssssssssssssssssssssssss53ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 The last hope for my channel...\n", + "2 pewdiepie \n", + "3 pewds \n", + "4 pdp \n", + "5 pewdie \n", + "6 last hope \n", + "7 youtube \n", + "8 youtube channel \n", + "Name: 53, dtype: object\n", + "ssssssssssssssssssssssssssssssssss54ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme \n", + "2 review\n", + "Name: 54, dtype: object\n", + "ssssssssssssssssssssssssssssssssss55ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 oblivion \n", + "2 skyrimn \n", + "3 skyrim \n", + "4 gameplay \n", + "5 funny \n", + "6 moments \n", + "7 compilation\n", + "8 meme \n", + "9 memes \n", + "Name: 55, dtype: object\n", + "ssssssssssssssssssssssssssssssssss56ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 This video is blocked in your country.\n", + "2 video \n", + "3 youtube video \n", + "4 pewdiepie \n", + "5 youtube pewdiepie \n", + "6 this video is blocked \n", + "7 blocked \n", + "8 pewds \n", + "9 pewdie \n", + "10 pdp \n", + "11 article 13 \n", + "12 article 11 \n", + "13 youtube support \n", + "14 india \n", + "15 iisuperwomanii \n", + "16 taking a break \n", + "Name: 56, dtype: object\n", + "ssssssssssssssssssssssssssssssssss57ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 We made history!!! *again*\n", + "2 We made history! 
\n", + "3 we \n", + "4 made \n", + "5 history \n", + "6 pewdiepie \n", + "7 pewds \n", + "8 pdp \n", + "9 pewdie \n", + "10 lwaiy \n", + "11 tseries \n", + "12 t-series \n", + "13 lwiay pewdiepie \n", + "14 marzia \n", + "15 markiplier \n", + "16 try not to laugh \n", + "17 we made history again \n", + "Name: 57, dtype: object\n", + "ssssssssssssssssssssssssssssssssss58ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 you laugh you lose\n", + "2 challenge \n", + "Name: 58, dtype: object\n", + "ssssssssssssssssssssssssssssssssss59ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 A message to Obama\n", + "2 OBAMA \n", + "3 memes \n", + "4 meme \n", + "5 dank memes \n", + "6 memes 2018 \n", + "7 pewdiepie \n", + "8 pewds \n", + "9 pdp \n", + "10 pewdie \n", + "Name: 59, dtype: object\n", + "ssssssssssssssssssssssssssssssssss60ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 TIKTOK \n", + "2 tik tok \n", + "3 tik tok cringe \n", + "4 tiktok pewdiepie \n", + "5 pewdiepie \n", + "6 pewds \n", + "7 pewdie \n", + "8 pdp \n", + "9 tiktok has gone too far \n", + "10 OK \n", + "11 TIK TOK HAS GONE TOO FAR NOW...\n", + "12 tiktok compilation \n", + "13 tiktok memes \n", + "14 meme \n", + "15 memes \n", + "16 pewdiepie memes \n", + "17 pewdiepie meme \n", + "18 pewdiepie tik tok \n", + "19 tiktok ad \n", + "20 tiktok funny \n", + "21 cringe challenge \n", + "22 cringe \n", + "23 cringe tiktok \n", + "24 funny tiktok videos \n", + "25 musically \n", + "26 musical.ly \n", + "27 tiktok trolls \n", + "Name: 60, dtype: object\n", + "ssssssssssssssssssssssssssssssssss61ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 We made history!\n", + "2 we \n", + "3 made \n", + "4 history \n", + "5 pewdiepie \n", + "6 pewds \n", + "7 pdp \n", + "8 pewdie \n", + "9 lwaiy \n", + "10 tseries \n", + "11 t-series \n", + "12 lwiay pewdiepie \n", + "13 marzia \n", + "14 markiplier \n", + "15 try not to laugh\n", + "Name: 61, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss62ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 ylyl \n", + "2 you laugh you lose\n", + "3 challenge \n", + "Name: 62, dtype: object\n", + "ssssssssssssssssssssssssssssssssss63ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 deltarune \n", + "2 delta \n", + "3 rune \n", + "4 undertale \n", + "5 undertale 2 \n", + "6 squel \n", + "7 sequel \n", + "8 prequel \n", + "9 commentary \n", + "10 gameplay \n", + "11 walkthrough \n", + "12 pacifist \n", + "13 delta rune part 1 \n", + "14 chapter 1 \n", + "15 deltarune part 1 \n", + "16 soundtrack \n", + "17 undertale delta \n", + "18 undertale delta rune\n", + "19 delta rune undertale\n", + "20 part 1 \n", + "21 chapter 1 part 1 \n", + "Name: 63, dtype: object\n", + "ssssssssssssssssssssssssssssssssss64ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme \n", + "2 review \n", + "3 ben \n", + "4 shapiro \n", + "5 bonus meme\n", + "6 gnome \n", + "7 obama \n", + "8 elon musk \n", + "9 pikachu \n", + "10 tik tok \n", + "11 tracer \n", + "Name: 64, dtype: object\n", + "ssssssssssssssssssssssssssssssssss65ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 I'm white \n", + "2 im white \n", + "3 im white dr phil \n", + "4 dr phil \n", + "5 im \n", + "6 white \n", + "7 dr phil black white girl \n", + "8 dr phil black girl acts white \n", + "9 dr phil black girl \n", + "10 dr phil full episodes \n", + "11 dr \n", + "12 phil \n", + "13 mom says her daughter \n", + "14 dr phil pewdiepie \n", + "15 dr phil #3 \n", + "16 dr phil 3 \n", + "17 react \n", + "18 pewds \n", + "19 pewdie \n", + "20 pewdiepie \n", + "21 pdp \n", + "22 dr phil destroys \n", + "23 dr phil memes \n", + "24 dr phil meme \n", + "25 dr phil october 2018 \n", + "26 meme \n", + "27 memes \n", + "28 im black \n", + "29 i'm black \n", + "30 im white dr phil full episode \n", + "31 im white dr phil full episodes\n", + "Name: 65, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss66ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 66, dtype: object\n", + "ssssssssssssssssssssssssssssssssss67ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 i need your help...\n", + "2 lwiay \n", + "3 help \n", + "4 pewdiepie \n", + "5 pewds \n", + "6 pdp \n", + "7 pewdie \n", + "Name: 67, dtype: object\n", + "ssssssssssssssssssssssssssssssssss68ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 68, dtype: object\n", + "ssssssssssssssssssssssssssssssssss69ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 apology video\n", + "2 my response \n", + "3 pewdiepie \n", + "4 logan paul \n", + "5 laura lee \n", + "6 tmartin \n", + "Name: 69, dtype: object\n", + "ssssssssssssssssssssssssssssssssss70ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 fashion\n", + "2 meme \n", + "3 review \n", + "Name: 70, dtype: object\n", + "ssssssssssssssssssssssssssssssssss71ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 birds \n", + "2 birds aren't real \n", + "3 birds aren't real youtube \n", + "4 npc meme \n", + "5 npc memes \n", + "6 pewdiepie \n", + "7 pewds \n", + "8 pewdie \n", + "9 memes \n", + "10 meme \n", + "11 meme review \n", + "12 BIRDS. AREN'T. REAL. 
\n", + "13 review \n", + "14 meme compilation \n", + "15 meme compilation 2018 \n", + "16 everyone we have an announcement to make\n", + "Name: 71, dtype: object\n", + "ssssssssssssssssssssssssssssssssss72ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 npc meme \n", + "2 meme \n", + "3 funny \n", + "4 compilation\n", + "5 shane \n", + "6 logan \n", + "7 logan paul \n", + "8 show \n", + "9 youtube \n", + "10 red \n", + "11 youtube red\n", + "Name: 72, dtype: object\n", + "ssssssssssssssssssssssssssssssssss73ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 LWIAY\n", + "Name: 73, dtype: object\n", + "ssssssssssssssssssssssssssssssssss74ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme review\n", + "2 spooktober \n", + "3 halloween \n", + "4 bone \n", + "5 skeleton \n", + "6 doot doot \n", + "7 sans \n", + "Name: 74, dtype: object\n", + "ssssssssssssssssssssssssssssssssss75ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 ylyl \n", + "2 you laugh you lose\n", + "3 challenge \n", + "4 moth \n", + "5 edition \n", + "6 meme \n", + "Name: 75, dtype: object\n", + "ssssssssssssssssssssssssssssssssss76ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 lwiay \n", + "2 reddit\n", + "Name: 76, dtype: object\n", + "ssssssssssssssssssssssssssssssssss77ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 tseries \n", + "2 t series \n", + "3 diss \n", + "4 track \n", + "5 pewdiepie \n", + "6 song \n", + "7 rap \n", + "8 mixtape \n", + "9 disstrack \n", + "10 diss track \n", + "11 bitch lasagna\n", + "Name: 77, dtype: object\n", + "ssssssssssssssssssssssssssssssssss78ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 𝓜𝓸𝓽𝓱 𝓜𝓮𝓶𝓮𝓼 \n", + "2 moth memes \n", + "3 moth meme \n", + "4 moth meme compilation \n", + "5 moth lamp \n", + "6 moth lamp meme compilation\n", + "7 pewdiepie meme review \n", + "8 pewdiepie \n", + "9 pewds \n", + "10 pdp \n", + "11 pewdie \n", + "12 meme review \n", + "13 memes \n", + "14 
meme \n", + "15 moth \n", + "16 lamp \n", + "Name: 78, dtype: object\n", + "ssssssssssssssssssssssssssssssssss79ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 lwiay \n", + "2 pewdiepie \n", + "3 pewds \n", + "4 pewdie \n", + "5 pewdiepie vs t series \n", + "6 ANNOUNCING ME NEW WEBSITE\n", + "7 website \n", + "8 new website \n", + "9 t series \n", + "Name: 79, dtype: object\n", + "ssssssssssssssssssssssssssssssssss80ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 you laugh you lose\n", + "2 ylyl \n", + "3 try not to \n", + "4 laugh \n", + "5 challenge \n", + "Name: 80, dtype: object\n", + "ssssssssssssssssssssssssssssssssss81ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 bowsette \n", + "2 meme review\n", + "Name: 81, dtype: object\n", + "ssssssssssssssssssssssssssssssssss82ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 pew news \n", + "2 serena williams \n", + "3 t series \n", + "4 youtube \n", + "5 alternative influence\n", + "Name: 82, dtype: object\n", + "ssssssssssssssssssssssssssssssssss83ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 lego \n", + "2 star \n", + "3 wars \n", + "Name: 83, dtype: object\n", + "ssssssssssssssssssssssssssssssssss84ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 ylyl \n", + "2 you laugh you lose \n", + "3 YOU LAUGH YOU LOSE \n", + "4 TRY NOT TO LAUGH SUPER HARD EDITION\n", + "5 try not to laugh \n", + "6 try not to laugh challenge \n", + "7 pewdiepie \n", + "8 pewds \n", + "9 pewdie \n", + "10 pdp \n", + "Name: 84, dtype: object\n", + "ssssssssssssssssssssssssssssssssss85ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme \n", + "2 review\n", + "Name: 85, dtype: object\n", + "ssssssssssssssssssssssssssssssssss86ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 gucci \n", + "2 fashion\n", + "3 meme \n", + "Name: 86, dtype: object\n", + "ssssssssssssssssssssssssssssssssss87ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + 
"1 meme \n", + "2 review\n", + "3 THANOS\n", + "4 CAR \n", + "Name: 87, dtype: object\n", + "ssssssssssssssssssssssssssssssssss88ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 Try Not To Laugh At Other Youtubers Try Not To Laugh Challenge\n", + "2 try not to laugh \n", + "3 try not to laugh challenge \n", + "4 try not to laugh challenge clean \n", + "5 try not to laugh challenge impossible \n", + "6 try not to laugh markiplier \n", + "7 try not to laugh jacksepticeye \n", + "8 try not to laugh pewdiepie edition \n", + "9 try not to laugh memes \n", + "10 memes \n", + "11 meme \n", + "12 funny memes \n", + "13 funny memes try not to laugh \n", + "14 ylyl \n", + "15 you laugh you lose \n", + "16 pewdiepie ylyl \n", + "17 pewdiepie \n", + "18 pewds \n", + "19 pdp \n", + "20 pewdie \n", + "21 tntl \n", + "22 laugh \n", + "23 try not to \n", + "24 markiplier \n", + "25 jacksepticeye \n", + "Name: 88, dtype: object\n", + "ssssssssssssssssssssssssssssssssss89ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 tumblr \n", + "2 tumblr in action\n", + "3 reddit \n", + "4 reddit review \n", + "Name: 89, dtype: object\n", + "ssssssssssssssssssssssssssssssssss90ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 90, dtype: object\n", + "ssssssssssssssssssssssssssssssssss91ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 YES PAPA \n", + "2 YES PAPA MEME \n", + "3 johny johny yes papa \n", + "4 johnny johnny \n", + "5 johny meme \n", + "6 baby johnny eating sugar\n", + "7 no papa no papa \n", + "8 no papa sugar \n", + "9 meme review \n", + "10 pewdiepie meme review \n", + "11 pewdiepie \n", + "12 pewds \n", + "13 pdp \n", + "14 pewdie \n", + "15 YES PAPA MEME EXPOSED \n", + "Name: 91, dtype: object\n", + "ssssssssssssssssssssssssssssssssss92ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 lwiay\n", + "Name: 92, dtype: object\n", + "ssssssssssssssssssssssssssssssssss93ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", 
+ "1 ylyl \n", + "2 try not to laugh\n", + "Name: 93, dtype: object\n", + "ssssssssssssssssssssssssssssssssss94ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 episode 1 \n", + "2 gameplay \n", + "3 wlaking \n", + "4 walking dead\n", + "5 final \n", + "6 season \n", + "7 last \n", + "Name: 94, dtype: object\n", + "ssssssssssssssssssssssssssssssssss95ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 pew news\n", + "2 ksi \n", + "3 ninja \n", + "4 female \n", + "5 streamer\n", + "Name: 95, dtype: object\n", + "ssssssssssssssssssssssssssssssssss96ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 ylyl \n", + "2 laugh\n", + "3 lose \n", + "Name: 96, dtype: object\n", + "ssssssssssssssssssssssssssssssssss97ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 pubg \n", + "2 player unknown\n", + "3 squads \n", + "Name: 97, dtype: object\n", + "ssssssssssssssssssssssssssssssssss98ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 98, dtype: object\n", + "ssssssssssssssssssssssssssssssssss99ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 What drinking her juice ACTUALLY gives you \n", + "2 jilly juice \n", + "3 dr phil \n", + "4 dr phil 2018 \n", + "5 dr phil jilly juice \n", + "6 dr phil jilly juice reaction \n", + "7 dr phil pewdiepie \n", + "8 pewdiepie dr phil \n", + "9 pewdiepie dr phil eminem \n", + "10 15 YEAR OLD CRIES OVER NOT GETTING $231 \n", + "11 dr phil 1 \n", + "12 dr phil 15 year old \n", + "13 LOGAN PAULS SISTER WANTS TO DO YOUTUBE - Dr Phil #2\n", + "14 YOUTUBER GOES ON DR PHIL. 
\n", + "15 dr phil playlist \n", + "16 pewdiepie \n", + "17 pewds \n", + "18 pdp \n", + "19 pewdie \n", + "20 juice \n", + "21 comedy \n", + "22 reaction \n", + "23 entertainment \n", + "24 jilly \n", + "Name: 99, dtype: object\n", + "ssssssssssssssssssssssssssssssssss100ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 you laugh \n", + "2 you lose \n", + "3 try not to laugh challenge\n", + "4 challenge \n", + "5 try not to \n", + "Name: 100, dtype: object\n", + "ssssssssssssssssssssssssssssssssss101ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 THAT TOTALLY HAPPENED.\n", + "2 that \n", + "3 totally \n", + "4 happened \n", + "5 /r thathappened \n", + "6 thathappened \n", + "7 redit \n", + "8 reddit \n", + "9 thathappened redit \n", + "10 pewdiepie \n", + "11 reddit review \n", + "12 reddit reaction \n", + "13 reddit cringe \n", + "14 cringe \n", + "15 reddit pewdiepie \n", + "16 pewds \n", + "17 pewdie \n", + "18 pdp \n", + "19 /r \n", + "20 meme \n", + "21 memes \n", + "22 meme review \n", + "Name: 101, dtype: object\n", + "ssssssssssssssssssssssssssssssssss102ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 102, dtype: object\n", + "ssssssssssssssssssssssssssssssssss103ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 103, dtype: object\n", + "ssssssssssssssssssssssssssssssssss104ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 TRY NOT TO LAUGH / EPISODE 1 / NEW SERIES\n", + "2 ylyl \n", + "3 you laugh \n", + "4 you lose \n", + "5 you laugh you lose \n", + "6 you laugh you lose challenge \n", + "7 pewdiepie ylyl \n", + "8 pewdiepie ylyl 1 \n", + "9 try not to laugh \n", + "10 try not to laugh challenge \n", + "11 try not to laugh challenge episode 1 \n", + "12 new series \n", + "13 pewdiepie series \n", + "14 pewds \n", + "15 pewdie \n", + "16 pdp \n", + "17 try not to laugh clean \n", + "18 skrattar du \n", + "19 skrattar du förlorar du \n", + "20 TNTL \n", + "21 tntl clean \n", + "Name: 104, 
dtype: object\n", + "ssssssssssssssssssssssssssssssssss105ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 detroit \n", + "2 detroit become human \n", + "3 detroit become human gameplay\n", + "4 gameplay detroit become human\n", + "5 gameplay \n", + "6 walkthrough \n", + "7 playthrough \n", + "8 full \n", + "9 commentary \n", + "Name: 105, dtype: object\n", + "ssssssssssssssssssssssssssssssssss106ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 detroit \n", + "2 detroit become human \n", + "3 detroit become human gameplay\n", + "4 gameplay detroit become human\n", + "5 gameplay \n", + "6 walkthrough \n", + "7 playthrough \n", + "8 full \n", + "9 commentary \n", + "Name: 106, dtype: object\n", + "ssssssssssssssssssssssssssssssssss107ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 detroit become human \n", + "2 detroit \n", + "3 detroit become human gameplay\n", + "4 gameplay detroit become human\n", + "5 gameplay \n", + "6 walkthrough \n", + "7 playthrough \n", + "8 full \n", + "9 commentary \n", + "Name: 107, dtype: object\n", + "ssssssssssssssssssssssssssssssssss108ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 detroit \n", + "2 detroit become human \n", + "3 detroit become human gameplay\n", + "4 gameplay detroit become human\n", + "5 gameplay \n", + "6 walkthrough \n", + "7 playthrough \n", + "8 full \n", + "9 commentary \n", + "Name: 108, dtype: object\n", + "ssssssssssssssssssssssssssssssssss109ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 My fans have turned against me...\n", + "2 pewdiepie \n", + "3 pewds \n", + "4 pdp \n", + "5 pewdie \n", + "6 pewdiepie fans \n", + "7 lwiay \n", + "8 pewdiepie lwaiy \n", + "Name: 109, dtype: object\n", + "ssssssssssssssssssssssssssssssssss110ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 110, dtype: object\n", + "ssssssssssssssssssssssssssssssssss111ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 you laugh you lose\n", + "Name: 
111, dtype: object\n", + "ssssssssssssssssssssssssssssssssss112ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 fouseytube \n", + "2 drake \n", + "3 drake july 2018 \n", + "4 pewdiepie \n", + "5 pewds \n", + "6 pewdie \n", + "7 pdp \n", + "8 fouseytube drake \n", + "9 dj khaled \n", + "10 djkhaled drake \n", + "11 dj khaled fouseytube\n", + "12 drake concert live \n", + "13 drake concert \n", + "14 concert \n", + "15 new drake \n", + "16 lil \n", + "17 lil rapper \n", + "18 rapper \n", + "19 lil rappers \n", + "20 6ix9ine \n", + "21 tekashi69 \n", + "22 6ix9ine pewdiepie \n", + "23 tekashi \n", + "24 drake pewdiepie \n", + "Name: 112, dtype: object\n", + "ssssssssssssssssssssssssssssssssss113ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 the dobre \n", + "2 dobre \n", + "3 dobre brothers \n", + "4 dobre twins \n", + "5 dobre brothers song \n", + "6 dobre brothers pranks\n", + "7 prank \n", + "8 pranks \n", + "9 slime \n", + "Name: 113, dtype: object\n", + "ssssssssssssssssssssssssssssssssss114ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 Tekashi 6ix9ine \n", + "2 Tekashi \n", + "3 6ix9ine \n", + "4 Tekashi 6ix9ine saved by polite cat\n", + "5 six nine \n", + "6 six \n", + "7 nine \n", + "8 69 \n", + "9 tekashi69 \n", + "10 pewdiepie \n", + "11 pewds \n", + "12 pdp \n", + "13 pewdie \n", + "14 meme review \n", + "15 6ix9ine pewdiepie \n", + "16 six nine pewdiepie \n", + "17 tekashi69 pewdiepie \n", + "18 polite cat \n", + "19 cat meme \n", + "20 cat memes \n", + "21 cats \n", + "22 cat \n", + "23 memes \n", + "24 meme compilation \n", + "Name: 114, dtype: object\n", + "ssssssssssssssssssssssssssssssssss115ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 funny memes\n", + "2 meme \n", + "3 memes \n", + "4 curb \n", + "5 compilation\n", + "Name: 115, dtype: object\n", + "ssssssssssssssssssssssssssssssssss116ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 pew news \n", + "2 youtube \n", + "3 vox media 
\n", + "4 elon musk \n", + "5 thai \n", + "6 hank green \n", + "7 jessica price\n", + "8 guild wars 2 \n", + "9 media \n", + "Name: 116, dtype: object\n", + "ssssssssssssssssssssssssssssssssss117ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 one hand clapping\n", + "Name: 117, dtype: object\n", + "ssssssssssssssssssssssssssssssssss118ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 IS THIS LIVE...? \n", + "2 YOU \n", + "3 CRINGE \n", + "4 LOSE \n", + "5 you cringe \n", + "6 cringe \n", + "7 you cringe you lose \n", + "8 you cringe you lose pewdiepie\n", + "9 cringe comp \n", + "10 cringe compilation \n", + "11 cringe compilation 2018 \n", + "12 cringe compilations \n", + "13 media cringe \n", + "14 news \n", + "15 news cringe \n", + "16 news cringe reaction \n", + "17 news cringe moments \n", + "18 pewdiepie \n", + "19 pewds \n", + "20 pewdie \n", + "21 pdp \n", + "22 cringe moments \n", + "23 cringe moments on tv \n", + "24 pewdiepie cringe \n", + "Name: 118, dtype: object\n", + "ssssssssssssssssssssssssssssssssss119ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 all woman\n", + "2 are \n", + "3 queen \n", + "Name: 119, dtype: object\n", + "ssssssssssssssssssssssssssssssssss120ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme \n", + "2 review \n", + "3 slaps hand on car\n", + "4 car salesman meme\n", + "Name: 120, dtype: object\n", + "ssssssssssssssssssssssssssssssssss121ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 WHAT THE MEDIA DOESNT TELL YOU ABOUT PEWDIEPIE\n", + "2 pewdiepie \n", + "3 pewds \n", + "4 pewdie \n", + "5 pdp \n", + "6 media \n", + "7 pewdiepie media \n", + "8 pewdiepie wsj \n", + "9 pewdiepie scandal \n", + "Name: 121, dtype: object\n", + "ssssssssssssssssssssssssssssssssss122ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 twitch \n", + "2 twitch victims \n", + "3 twitch fails \n", + "4 twitch fails 2018 \n", + "5 twitch girls comp \n", + "6 twitch girls 2018 \n", 
+ "7 twitch gone wrong \n", + "8 twitch compilation\n", + "9 pewdie \n", + "10 pewdiepie \n", + "11 pewds \n", + "12 pdp \n", + "Name: 122, dtype: object\n", + "ssssssssssssssssssssssssssssssssss123ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 you laugh you lose \n", + "2 ylyl \n", + "3 skrattar du förlorar du\n", + "Name: 123, dtype: object\n", + "ssssssssssssssssssssssssssssssssss124ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 news \n", + "2 tana' \n", + "3 tana \n", + "4 mongeau'\n", + "5 tanacon \n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Name: 124, dtype: object\n", + "ssssssssssssssssssssssssssssssssss125ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 vlog \n", + "2 summer\n", + "3 idk \n", + "Name: 125, dtype: object\n", + "ssssssssssssssssssssssssssssssssss126ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 eu \n", + "2 ban \n", + "3 memes\n", + "4 not \n", + "5 cool \n", + "6 guys \n", + "Name: 126, dtype: object\n", + "ssssssssssssssssssssssssssssssssss127ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 joke \n", + "2 over \n", + "3 head \n", + "Name: 127, dtype: object\n", + "ssssssssssssssssssssssssssssssssss128ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 tanacon \n", + "2 tana mongeau w \n", + "3 tana mongeau \n", + "4 gaming disorder \n", + "5 gaming \n", + "6 disorder \n", + "7 gaming disorder 2018 \n", + "8 gaming disorder video\n", + "9 pewdiepie \n", + "10 pewds \n", + "11 pdp \n", + "12 pewdie \n", + "13 pew news \n", + "Name: 128, dtype: object\n", + "ssssssssssssssssssssssssssssssssss129ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 ylyl \n", + "2 skratta \n", + "3 skrattar \n", + "4 you laugh \n", + "5 you lose \n", + "6 you laugh you lose\n", + "7 try not to laugh \n", + "8 challenge \n", + "9 YOU LAUGH YOU SAD \n", + "Name: 129, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss130ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 LWIAY\n", + "Name: 130, dtype: object\n", + "ssssssssssssssssssssssssssssssssss131ssssssssssssssssssssssssssssssssss\n", + "0 meme \n", + "1 review \n", + "2 youtubes\n", + "3 favorite\n", + "4 show \n", + "Name: 131, dtype: object\n", + "ssssssssssssssssssssssssssssssssss132ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 132, dtype: object\n", + "ssssssssssssssssssssssssssssssssss133ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 MEMES COULD GET BANNED (NEEDS HELP ASAP SEND TO ALL YOUR FRIENDS AND FAMILY)\n", + "2 memes \n", + "3 meme \n", + "4 memes banned \n", + "5 memes ban \n", + "6 ban \n", + "7 pewds \n", + "8 pewdie \n", + "9 pewdiepie \n", + "10 pew news \n", + "11 news \n", + "12 pew \n", + "Name: 133, dtype: object\n", + "ssssssssssssssssssssssssssssssssss134ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 lil \n", + "2 tay \n", + "3 lil tay \n", + "4 pewdiepie\n", + "5 pewdie \n", + "6 pdp \n", + "7 pewds \n", + "Name: 134, dtype: object\n", + "ssssssssssssssssssssssssssssssssss135ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 flex seal \n", + "2 flex spray \n", + "3 flex commercial\n", + "4 tape commercial\n", + "5 commercial \n", + "Name: 135, dtype: object\n", + "ssssssssssssssssssssssssssssssssss136ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 dr phil \n", + "2 logan paul\n", + "3 psycho \n", + "4 youtuber \n", + "Name: 136, dtype: object\n", + "ssssssssssssssssssssssssssssssssss137ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 fridays \n", + "2 with \n", + "3 pewdiepie \n", + "4 fridays with pewdiepie\n", + "5 lwiay \n", + "Name: 137, dtype: object\n", + "ssssssssssssssssssssssssssssssssss138ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 YLYL \n", + "2 SKRATTA DU \n", + "3 TRY NOT TO LAUGH\n", + "4 CHALLENGE \n", + "Name: 138, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss139ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 pewdiepie \n", + "2 react \n", + "3 world \n", + "4 dr phil \n", + "5 spoiled \n", + "6 brat \n", + "7 beverly hills\n", + "8 girl \n", + "9 15 \n", + "Name: 139, dtype: object\n", + "ssssssssssssssssssssssssssssssssss140ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 pew news \n", + "2 news \n", + "3 pewdiepie\n", + "Name: 140, dtype: object\n", + "ssssssssssssssssssssssssssssssssss141ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "1 lwiay\n", + "Name: 141, dtype: object\n", + "ssssssssssssssssssssssssssssssssss142ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 baldis basics \n", + "2 baldi's basics in education and learning \n", + "3 baldi's basics in education and learning secrets \n", + "4 baldis basics gameplay \n", + "5 baldis basics game \n", + "6 baldi's basics \n", + "7 baldis classroom \n", + "8 baldis education \n", + "9 baldis education and learning \n", + "10 baldis \n", + "11 basics \n", + "12 BALDIS BASICS IS THE SPOOKIEST GAME IN THE HISTORY OF THE WORLD AND UNIVERSE\n", + "13 baldis basics scary \n", + "14 baldis basics speedrun \n", + "15 pewds \n", + "16 pewdiepie \n", + "17 pewdie \n", + "18 pdp \n", + "19 baldi pewdiepie \n", + "Name: 142, dtype: object\n", + "ssssssssssssssssssssssssssssssssss143ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 media \n", + "2 vice \n", + "3 news \n", + "4 article \n", + "5 pewdiepie\n", + "Name: 143, dtype: object\n", + "ssssssssssssssssssssssssssssssssss144ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 meme \n", + "2 review\n", + "Name: 144, dtype: object\n", + "ssssssssssssssssssssssssssssssssss145ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 humble \n", + "2 humble brag \n", + "3 bragging \n", + "4 youtuber \n", + "5 humble youtubers\n", + "6 youtubers humble\n", + "7 rich youtubers \n", + "8 rich youtube \n", + "Name: 145, dtype: 
object\n", + "ssssssssssssssssssssssssssssssssss146ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 lwiay \n", + "2 pewds \n", + "3 pewdie \n", + "4 pewdiepie\n", + "5 pdp \n", + "Name: 146, dtype: object\n", + "ssssssssssssssssssssssssssssssssss147ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 book \n", + "2 review \n", + "3 literature\n", + "4 club \n", + "Name: 147, dtype: object\n", + "ssssssssssssssssssssssssssssssssss148ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 fortnite\n", + "2 cringe \n", + "3 ali a \n", + "4 ninja \n", + "5 summit \n", + "Name: 148, dtype: object\n", + "ssssssssssssssssssssssssssssssssss149ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 sleep \n", + "2 challenge\n", + "3 horror \n", + "4 video \n", + "5 game \n", + "6 play \n", + "Name: 149, dtype: object\n", + "ssssssssssssssssssssssssssssssssss150ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 TESTING OUT EYETRACKING \n", + "2 eyetracker \n", + "3 eye tracking \n", + "4 eye tracker \n", + "5 tobii \n", + "6 tobii eye tracker \n", + "7 tobii eye tracking \n", + "8 tobii review \n", + "9 tobii eye tracker review\n", + "10 pewdiepie \n", + "11 pewds \n", + "12 pewdie \n", + "13 pdp \n", + "14 tracker \n", + "15 eye \n", + "16 tracking \n", + "Name: 150, dtype: object\n", + "ssssssssssssssssssssssssssssssssss151ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE \n", + "1 you laugh you lose\n", + "2 ylyl \n", + "3 india \n", + "4 indian \n", + "5 meme \n", + "6 comedy \n", + "Name: 151, dtype: object\n", + "ssssssssssssssssssssssssssssssssss152ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 152, dtype: object\n", + "ssssssssssssssssssssssssssssssssss153ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 meme review\n", + "2 savage \n", + "3 patrick \n", + "4 fortnite \n", + "5 pubg \n", + "6 meme \n", + "7 memes \n", + "8 spongebob \n", + "Name: 153, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss154ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 trapland\n", + "Name: 154, dtype: object\n", + "ssssssssssssssssssssssssssssssssss155ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 155, dtype: object\n", + "ssssssssssssssssssssssssssssssssss156ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 156, dtype: object\n", + "ssssssssssssssssssssssssssssssssss157ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 157, dtype: object\n", + "ssssssssssssssssssssssssssssssssss158ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pew \n", + "2 news \n", + "Name: 158, dtype: object\n", + "ssssssssssssssssssssssssssssssssss159ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 trap adventure 2\n", + "Name: 159, dtype: object\n", + "ssssssssssssssssssssssssssssssssss160ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 lwiay \n", + "Name: 160, dtype: object\n", + "ssssssssssssssssssssssssssssssssss161ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 hmmm \n", + "Name: 161, dtype: object\n", + "ssssssssssssssssssssssssssssssssss162ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 162, dtype: object\n", + "ssssssssssssssssssssssssssssssssss163ssssssssssssssssssssssssssssssssss\n", + "0 party in backyard\n", + "1 hej monika \n", + "2 monika \n", + "3 monica \n", + "4 song \n", + "5 pewdiepie \n", + "6 sing \n", + "7 singing \n", + "Name: 163, dtype: object\n", + "ssssssssssssssssssssssssssssssssss164ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 trap adventure 2 \n", + "2 rage \n", + "3 quit \n", + "4 game \n", + "5 videogame \n", + "6 trap \n", + "7 adventure \n", + "8 free download \n", + "9 link \n", + "10 trap adventure download \n", + "11 trap adventure 2 download \n", + "12 trap adventure 2 free download\n", + "Name: 164, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss165ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 165, dtype: object\n", + "ssssssssssssssssssssssssssssssssss166ssssssssssssssssssssssssssssssssss\n", + "0 vr chat\n", + "Name: 166, dtype: object\n", + "ssssssssssssssssssssssssssssssssss167ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 jacksfilms\n", + "2 lwiay \n", + "3 yiay \n", + "Name: 167, dtype: object\n", + "ssssssssssssssssssssssssssssssssss168ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepi\n", + "1 indian \n", + "2 meme \n", + "Name: 168, dtype: object\n", + "ssssssssssssssssssssssssssssssssss169ssssssssssssssssssssssssssssssssss\n", + "0 ylyl\n", + "Name: 169, dtype: object\n", + "ssssssssssssssssssssssssssssssssss170ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 mad lad \n", + "Name: 170, dtype: object\n", + "ssssssssssssssssssssssssssssssssss171ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "Name: 171, dtype: object\n", + "ssssssssssssssssssssssssssssssssss172ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 you laugh you lose\n", + "2 ylyl \n", + "3 laugh \n", + "4 lose \n", + "Name: 172, dtype: object\n", + "ssssssssssssssssssssssssssssssssss173ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 nice guy \n", + "2 nice guys\n", + "3 reddit \n", + "Name: 173, dtype: object\n", + "ssssssssssssssssssssssssssssssssss174ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 vine \n", + "2 instagram\n", + "Name: 174, dtype: object\n", + "ssssssssssssssssssssssssssssssssss175ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 vr \n", + "2 vr chat \n", + "3 la noire\n", + "4 vr cases\n", + "Name: 175, dtype: object\n", + "ssssssssssssssssssssssssssssssssss176ssssssssssssssssssssssssssssssssss\n", + "0 im14thisisdeep \n", + "1 im 14 this is deep\n", + "2 this is deep \n", + "3 this is so deep \n", + "Name: 176, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss177ssssssssssssssssssssssssssssssssss\n", + "0 rick \n", + "1 and morty \n", + "2 rick and morty\n", + "Name: 177, dtype: object\n", + "ssssssssssssssssssssssssssssssssss178ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 the impossible quiz\n", + "Name: 178, dtype: object\n", + "ssssssssssssssssssssssssssssssssss179ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 179, dtype: object\n", + "ssssssssssssssssssssssssssssssssss180ssssssssssssssssssssssssssssssssss\n", + "0 YLYL\n", + "Name: 180, dtype: object\n", + "ssssssssssssssssssssssssssssssssss181ssssssssssssssssssssssssssssssssss\n", + "0 To the moon \n", + "1 sequel \n", + "2 finding paradise\n", + "3 paradice \n", + "4 walkthrough \n", + "5 playthrough \n", + "6 lets play \n", + "7 pewdiepie \n", + "8 part 1 \n", + "Name: 181, dtype: object\n", + "ssssssssssssssssssssssssssssssssss182ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 zootopia \n", + "2 doki doki literature club\n", + "3 doki doki \n", + "4 meme review \n", + "5 meme \n", + "6 death stranding \n", + "Name: 182, dtype: object\n", + "ssssssssssssssssssssssssssssssssss183ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 183, dtype: object\n", + "ssssssssssssssssssssssssssssssssss184ssssssssssssssssssssssssssssssssss\n", + "0 ylyl\n", + "Name: 184, dtype: object\n", + "ssssssssssssssssssssssssssssssssss185ssssssssssssssssssssssssssssssssss\n", + "0 doki doki \n", + "1 literature \n", + "2 club \n", + "3 litterature\n", + "4 part 1 \n", + "Name: 185, dtype: object\n", + "ssssssssssssssssssssssssssssssssss186ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 186, dtype: object\n", + "ssssssssssssssssssssssssssssssssss187ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 187, dtype: object\n", + "ssssssssssssssssssssssssssssssssss188ssssssssssssssssssssssssssssssssss\n", + "0 getting over it \n", + "1 walkthrough \n", + "2 playthrough 
\n", + "3 get over it \n", + "4 hiking \n", + "5 hammer \n", + "6 climb \n", + "7 climb game \n", + "8 clop \n", + "9 qwop \n", + "10 funny game \n", + "11 getting over it part 1\n", + "12 tutorial \n", + "13 full \n", + "Name: 188, dtype: object\n", + "ssssssssssssssssssssssssssssssssss189ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 189, dtype: object\n", + "ssssssssssssssssssssssssssssssssss190ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 190, dtype: object\n", + "ssssssssssssssssssssssssssssssssss191ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 191, dtype: object\n", + "ssssssssssssssssssssssssssssssssss192ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 192, dtype: object\n", + "ssssssssssssssssssssssssssssssssss193ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 jacksepticeye\n", + "2 whiskey \n", + "3 irish \n", + "4 review \n", + "Name: 193, dtype: object\n", + "ssssssssssssssssssssssssssssssssss194ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 194, dtype: object\n", + "ssssssssssssssssssssssssssssssssss195ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 south park \n", + "2 the fractured \n", + "3 but whole \n", + "4 south park game\n", + "5 sequel \n", + "6 new \n", + "7 gameplay \n", + "8 walkthrough \n", + "9 part 1 \n", + "10 full game \n", + "Name: 195, dtype: object\n", + "ssssssssssssssssssssssssssssssssss196ssssssssssssssssssssssssssssssssss\n", + "0 lwiay \n", + "1 reddit\n", + "Name: 196, dtype: object\n", + "ssssssssssssssssssssssssssssssssss197ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "Name: 197, dtype: object\n", + "ssssssssssssssssssssssssssssssssss198ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 198, dtype: object\n", + "ssssssssssssssssssssssssssssssssss199ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 pewd \n", + "4 pewdiepie cooking \n", + 
"5 cooking \n", + "6 how to \n", + "7 how to cook \n", + "8 how to cook meatballs \n", + "9 meatballs \n", + "10 meat balls \n", + "11 how to cook meatballs in a pan\n", + "12 how to cook meatballs in sauce\n", + "13 meatballs recipe \n", + "14 meatballs recipe tasty \n", + "15 tasty \n", + "16 recipe \n", + "17 best recipe \n", + "18 how to make \n", + "19 how to make meatballs \n", + "20 cook \n", + "21 homemade \n", + "Name: 199, dtype: object\n", + "ssssssssssssssssssssssssssssssssss200ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 200, dtype: object\n", + "ssssssssssssssssssssssssssssssssss201ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 201, dtype: object\n", + "ssssssssssssssssssssssssssssssssss202ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 202, dtype: object\n", + "ssssssssssssssssssssssssssssssssss203ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 anime \n", + "2 myanimelist \n", + "3 favourite anime\n", + "Name: 203, dtype: object\n", + "ssssssssssssssssssssssssssssssssss204ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 you \n", + "2 laugh \n", + "3 lose \n", + "4 challenge\n", + "Name: 204, dtype: object\n", + "ssssssssssssssssssssssssssssssssss205ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 205, dtype: object\n", + "ssssssssssssssssssssssssssssssssss206ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 whiskey \n", + "2 japanese\n", + "3 review \n", + "Name: 206, dtype: object\n", + "ssssssssssssssssssssssssssssssssss207ssssssssssssssssssssssssssssssssss\n", + "0 ylyl \n", + "1 you laugh you lose\n", + "2 try not to laugh \n", + "3 challenge \n", + "Name: 207, dtype: object\n", + "ssssssssssssssssssssssssssssssssss208ssssssssssssssssssssssssssssssssss\n", + "0 hardest\n", + "1 game \n", + "2 ever \n", + "Name: 208, dtype: object\n", + "ssssssssssssssssssssssssssssssssss209ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 
how to \n", + "2 get started\n", + "3 youtube \n", + "4 youtuber \n", + "Name: 209, dtype: object\n", + "ssssssssssssssssssssssssssssssssss210ssssssssssssssssssssssssssssssssss\n", + "0 drawing \n", + "1 youtuber \n", + "2 youtubers\n", + "Name: 210, dtype: object\n", + "ssssssssssssssssssssssssssssssssss211ssssssssssssssssssssssssssssssssss\n", + "0 Pewdiepie\n", + "1 would \n", + "2 you \n", + "3 rather \n", + "Name: 211, dtype: object\n", + "ssssssssssssssssssssssssssssssssss212ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 stream \n", + "2 twitch \n", + "3 fail \n", + "4 fails \n", + "Name: 212, dtype: object\n", + "ssssssssssssssssssssssssssssssssss213ssssssssssssssssssssssssssssssssss\n", + "0 you \n", + "1 laugh \n", + "2 you lose\n", + "Name: 213, dtype: object\n", + "ssssssssssssssssssssssssssssssssss214ssssssssssssssssssssssssssssssssss\n", + "0 Pewdiepie\n", + "1 Jake \n", + "2 Logan \n", + "3 Paul \n", + "4 Team 10 \n", + "5 Dab \n", + "Name: 214, dtype: object\n", + "ssssssssssssssssssssssssssssssssss215ssssssssssssssssssssssssssssssssss\n", + "0 wormax.io\n", + "1 wormax \n", + "2 snake \n", + "3 game \n", + "4 online \n", + "5 free \n", + "Name: 215, dtype: object\n", + "ssssssssssssssssssssssssssssssssss216ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 respect \n", + "2 women \n", + "3 piers morgan \n", + "4 good morning britain\n", + "Name: 216, dtype: object\n", + "ssssssssssssssssssssssssssssssssss217ssssssssssssssssssssssssssssssssss\n", + "0 women \n", + "1 bbc \n", + "2 bbc 3\n", + "Name: 217, dtype: object\n", + "ssssssssssssssssssssssssssssssssss218ssssssssssssssssssssssssssssssssss\n", + "0 fridays \n", + "1 with \n", + "2 pewdiepie\n", + "Name: 218, dtype: object\n", + "ssssssssssssssssssssssssssssssssss219ssssssssssssssssssssssssssssssssss\n", + "0 5 weird \n", + "1 stuff \n", + "2 5 weird stuff online\n", + "Name: 219, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss220ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 katy perry\n", + "Name: 220, dtype: object\n", + "ssssssssssssssssssssssssssssssssss221ssssssssssssssssssssssssssssssssss\n", + "0 reacting \n", + "1 fridays \n", + "2 with pewdiepie \n", + "3 fridays with pewdiepie\n", + "4 react \n", + "5 fan submission \n", + "6 fan \n", + "7 fans \n", + "Name: 221, dtype: object\n", + "ssssssssssssssssssssssssssssssssss222ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 YOU LAUGH YOU'RE OUT \n", + "2 you laugh \n", + "3 you laugh you \n", + "4 you laugh you're \n", + "5 you laugh lose \n", + "6 you laugh you lose pewdiepie\n", + "7 laugh \n", + "8 lose \n", + "9 laugh lose \n", + "Name: 222, dtype: object\n", + "ssssssssssssssssssssssssssssssssss223ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 oblivion \n", + "2 elder scrolls\n", + "Name: 223, dtype: object\n", + "ssssssssssssssssssssssssssssssssss224ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 REACTING TO CRINGEY SPEED RUNS\n", + "2 cringe compilation \n", + "3 cringe compilation 2017 \n", + "4 speed runs \n", + "5 speed run \n", + "6 cringe \n", + "7 cringe reaction \n", + "8 reaction \n", + "9 cringe react \n", + "10 reacting to cringe \n", + "11 cringy reaction \n", + "12 cringey \n", + "13 reacting to cringey videos \n", + "14 cringey speed runs \n", + "15 speed \n", + "16 run \n", + "17 pewdiepie reaction \n", + "18 react \n", + "Name: 224, dtype: object\n", + "ssssssssssssssssssssssssssssssssss225ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 the rich life of pewdiepie \n", + "2 before he was famous \n", + "3 before he was famous pewdiepie \n", + "4 pewdiepie rich \n", + "5 pewdiepie net worth \n", + "6 how much money does pewdiepie make\n", + "7 how much money \n", + "8 youtube money \n", + "9 money \n", + "10 net worth \n", + "11 networth \n", + "12 rich \n", + "13 rich life \n", + "14 the rich life 
\n", + "Name: 225, dtype: object\n", + "ssssssssssssssssssssssssssssssssss226ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 react \n", + "2 react world\n", + "3 greenscreen\n", + "4 competition\n", + "Name: 226, dtype: object\n", + "ssssssssssssssssssssssssssssssssss227ssssssssssssssssssssssssssssssssss\n", + "0 moral \n", + "1 moral machine\n", + "Name: 227, dtype: object\n", + "ssssssssssssssssssssssssssssssssss228ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 e3 \n", + "2 react \n", + "3 react world\n", + "Name: 228, dtype: object\n", + "ssssssssssssssssssssssssssssssssss229ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 jake \n", + "2 paul \n", + "3 it's \n", + "4 everyday \n", + "5 bro \n", + "6 react \n", + "7 react world\n", + "8 fine \n", + "Name: 229, dtype: object\n", + "ssssssssssssssssssssssssssssssssss230ssssssssssssssssssssssssssssssssss\n", + "0 respect\n", + "1 women \n", + "2 react \n", + "3 meme \n", + "Name: 230, dtype: object\n", + "ssssssssssssssssssssssssssssssssss231ssssssssssssssssssssssssssssssssss\n", + "0 try not to \n", + "1 try not \n", + "2 dont laugh \n", + "3 try not to laugh\n", + "4 challenge \n", + "Name: 231, dtype: object\n", + "ssssssssssssssssssssssssssssssssss232ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 test \n", + "2 harvard \n", + "3 skin \n", + "4 race \n", + "Name: 232, dtype: object\n", + "ssssssssssssssssssssssssssssssssss233ssssssssssssssssssssssssssssssssss\n", + "0 fidget spinner \n", + "1 fidget spinner tricks\n", + "2 trick \n", + "3 fidget spinner unbox \n", + "Name: 233, dtype: object\n", + "ssssssssssssssssssssssssssssssssss234ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 angry \n", + "2 challenge\n", + "Name: 234, dtype: object\n", + "ssssssssssssssssssssssssssssssssss235ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 buzzfeed\n", + "2 drunk \n", + "3 goggle \n", + "4 goggles \n", + "Name: 235, dtype: 
object\n", + "ssssssssssssssssssssssssssssssssss236ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 Little Nightmares Gameplay \n", + "5 Little Nightmares Walkthrough Part 1\n", + "6 Little Nightmares Gameplay Part 1 \n", + "7 Little Nightmares Pewdiepie \n", + "8 Little Nightmares Trailer \n", + "9 Little Nightmares Full Gameplay \n", + "10 Little Nightmares PS4 \n", + "11 Little Nightmares Review \n", + "12 Little Nightmares Part 1 \n", + "13 Little Nightmares Reaction \n", + "14 Little Nightmares Scary \n", + "15 Little Nightmares Game \n", + "16 Scary Games \n", + "17 New PS4 Games \n", + "18 New Games 2017 \n", + "19 PS4 Games 2017 \n", + "20 Best Games 2017 \n", + "Name: 236, dtype: object\n", + "ssssssssssssssssssssssssssssssssss237ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 buzzfeed\n", + "Name: 237, dtype: object\n", + "ssssssssssssssssssssssssssssssssss238ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 238, dtype: object\n", + "ssssssssssssssssssssssssssssssssss239ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 barbie \n", + "5 youtube channel\n", + "6 vlogger \n", + "7 vlog \n", + "Name: 239, dtype: object\n", + "ssssssssssssssssssssssssssssssssss240ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "Name: 240, dtype: object\n", + "ssssssssssssssssssssssssssssssssss241ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 family friendly\n", + "5 frozen \n", + "6 frozen games \n", + "Name: 241, dtype: object\n", + "ssssssssssssssssssssssssssssssssss242ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 can this video\n", + "5 get \n", + "Name: 242, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss243ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 everything \n", + "5 game \n", + "6 everything game \n", + "7 play as anything \n", + "8 play as everything\n", + "9 play as \n", + "10 play everything \n", + "Name: 243, dtype: object\n", + "ssssssssssssssssssssssssssssssssss244ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 mass \n", + "5 effect \n", + "6 andromeda\n", + "7 video \n", + "8 game \n", + "9 ME \n", + "Name: 244, dtype: object\n", + "ssssssssssssssssssssssssssssssssss245ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "Name: 245, dtype: object\n", + "ssssssssssssssssssssssssssssssssss246ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 mind \n", + "5 blown \n", + "Name: 246, dtype: object\n", + "ssssssssssssssssssssssssssssssssss247ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 how dirty \n", + "5 is your mind\n", + "6 dirty mind \n", + "7 photos \n", + "8 funny \n", + "Name: 247, dtype: object\n", + "ssssssssssssssssssssssssssssssssss248ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 try not \n", + "5 to \n", + "6 laugh \n", + "7 try not to laugh\n", + "8 dont laugh \n", + "9 challenge \n", + "Name: 248, dtype: object\n", + "ssssssssssssssssssssssssssssssssss249ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 249, dtype: object\n", + "ssssssssssssssssssssssssssssssssss250ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 smash \n", + "5 or \n", + "6 pass \n", + "Name: 250, dtype: object\n", + "ssssssssssssssssssssssssssssssssss251ssssssssssssssssssssssssssssssssss\n", + "0 
pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 before he was famous\n", + "5 famous \n", + "6 young \n", + "7 young pewdiepie \n", + "Name: 251, dtype: object\n", + "ssssssssssssssssssssssssssssssssss252ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 try not \n", + "5 to get \n", + "6 try not to get \n", + "7 scared \n", + "8 challenge \n", + "9 scared challenge \n", + "10 try not to get scared challenge\n", + "Name: 252, dtype: object\n", + "ssssssssssssssssssssssssssssssssss253ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 253, dtype: object\n", + "ssssssssssssssssssssssssssssssssss254ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 re7 \n", + "5 RESIDENT eVIL 7 \n", + "6 GAMEPLAY \n", + "7 Resident Evil 7: Biohazard\n", + "8 BIOHAZARD \n", + "9 rewind \n", + "10 biohazard \n", + "11 survival horror \n", + "12 ps4 \n", + "13 playstation 4 \n", + "14 vr \n", + "15 demo \n", + "Name: 254, dtype: object\n", + "ssssssssssssssssssssssssssssssssss255ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 double gal\n", + "5 gun \n", + "6 double \n", + "7 gal \n", + "8 girl \n", + "9 anime \n", + "10 animes \n", + "Name: 255, dtype: object\n", + "ssssssssssssssssssssssssssssssssss256ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 cringe \n", + "5 try not \n", + "6 challenge \n", + "7 try not to\n", + "8 handshake \n", + "9 handshakes\n", + "Name: 256, dtype: object\n", + "ssssssssssssssssssssssssssssssssss257ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "Name: 257, dtype: object\n", + "ssssssssssssssssssssssssssssssssss258ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 beat 
\n", + "5 subscribers\n", + "6 most \n", + "Name: 258, dtype: object\n", + "ssssssssssssssssssssssssssssssssss259ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "Name: 259, dtype: object\n", + "ssssssssssssssssssssssssssssssssss260ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 to be continued\n", + "5 meme \n", + "6 compilation \n", + "7 continue \n", + "8 jojo \n", + "9 jojos \n", + "10 bizarre \n", + "11 adventure \n", + "Name: 260, dtype: object\n", + "ssssssssssssssssssssssssssssssssss261ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 how long can you watch\n", + "5 how long \n", + "6 watch \n", + "7 challenge \n", + "8 watching \n", + "9 time \n", + "Name: 261, dtype: object\n", + "ssssssssssssssssssssssssssssssssss262ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 the walking dead \n", + "5 walking dead \n", + "6 part 1 \n", + "7 season 3 \n", + "8 telltale \n", + "9 game \n", + "10 the walking dead seasons 3\n", + "11 walking dead full game \n", + "Name: 262, dtype: object\n", + "ssssssssssssssssssssssssssssssssss263ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "Name: 263, dtype: object\n", + "ssssssssssssssssssssssssssssssssss264ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 who is \n", + "5 more likely \n", + "6 markiplier \n", + "7 jacksepticeye \n", + "8 who is more likely\n", + "9 most likely \n", + "Name: 264, dtype: object\n", + "ssssssssssssssssssssssssssssssssss265ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 Bottleflip \n", + "5 Challenge \n", + "6 Bottle \n", + "7 Dab \n", + "8 Meme \n", + "9 Jacksepticeye\n", + 
"Name: 265, dtype: object\n", + "ssssssssssssssssssssssssssssssssss266ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 hot \n", + "5 sauce \n", + "6 lootcrate\n", + "Name: 266, dtype: object\n", + "ssssssssssssssssssssssssssssssssss267ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 deleting\n", + "5 channel \n", + "Name: 267, dtype: object\n", + "ssssssssssssssssssssssssssssssssss268ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 vlog \n", + "5 birdabo \n", + "Name: 268, dtype: object\n", + "ssssssssssssssssssssssssssssssssss269ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 Tuber Simulator \n", + "5 Tuber \n", + "6 Simulator \n", + "7 Pewdiepie Simulator \n", + "8 Pewdiepie Game \n", + "9 Youtube Game \n", + "10 IOS \n", + "11 Android \n", + "12 Youtuber Simulator \n", + "13 Competition \n", + "14 Fridays \n", + "15 Fridays with Pewdiepie\n", + "Name: 269, dtype: object\n", + "ssssssssssssssssssssssssssssssssss270ssssssssssssssssssssssssssssssssss\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 Vlog \n", + "5 Jacksepticeye\n", + "6 Slippy \n", + "7 Holiday \n", + "8 video \n", + "9 log \n", + "10 kickthepj \n", + "Name: 270, dtype: object\n", + "ssssssssssssssssssssssssssssssssss271ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 pewdie \n", + "4 my \n", + "5 favourite\n", + "6 videos \n", + "7 ever \n", + "Name: 271, dtype: object\n", + "ssssssssssssssssssssssssssssssssss272ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 pdp \n", + "3 meme \n", + "4 react \n", + "5 spicy \n", + "6 dank \n", + "Name: 272, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss273ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 cringy \n", + "4 cringe \n", + "5 cringe kid \n", + "6 cringe compilation\n", + "7 cringe react \n", + "8 react \n", + "Name: 273, dtype: object\n", + "ssssssssssssssssssssssssssssssssss274ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 Happy Wheels \n", + "4 Happy Wheels 3D\n", + "5 Guts and Glory \n", + "6 Let's Play \n", + "7 Download \n", + "8 Alpha \n", + "9 Gameplay \n", + "10 Montage \n", + "Name: 274, dtype: object\n", + "ssssssssssssssssssssssssssssssssss275ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 kick the pj\n", + "4 pj \n", + "5 google \n", + "6 google feud\n", + "Name: 275, dtype: object\n", + "ssssssssssssssssssssssssssssssssss276ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 lootcrate \n", + "4 5 weird stuff online\n", + "5 vlog \n", + "6 unboxing \n", + "Name: 276, dtype: object\n", + "ssssssssssssssssssssssssssssssssss277ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 Marzia \n", + "4 Gangbeasts \n", + "5 Multiplayer \n", + "6 gang beasts \n", + "7 gan \n", + "8 gang \n", + "9 beasts \n", + "10 funny multiplayer\n", + "11 funny \n", + "12 multiplayer \n", + "13 2 player \n", + "14 coop \n", + "Name: 277, dtype: object\n", + "ssssssssssssssssssssssssssssssssss278ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 Welcome to the game \n", + "4 Steam \n", + "5 deep web \n", + "6 illegal \n", + "7 hackers \n", + "8 hacking \n", + "9 hack \n", + "10 Welcome to the game red room \n", + "11 welcome to the game all codes\n", + "12 Hacking Game \n", + "Name: 278, dtype: object\n", + "ssssssssssssssssssssssssssssssssss279ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie\n", + "1 pewds \n", + "2 
pdp \n", + "Name: 279, dtype: object\n", + "ssssssssssssssssssssssssssssssssss280ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 react \n", + "4 subscriber \n", + "5 special \n", + "6 montage \n", + "7 old pewdiepie\n", + "8 new pewdiepie\n", + "9 vlog \n", + "10 fridays \n", + "Name: 280, dtype: object\n", + "ssssssssssssssssssssssssssssssssss281ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 vlog \n", + "4 kicked out\n", + "5 moving \n", + "6 house \n", + "7 landlord \n", + "Name: 281, dtype: object\n", + "ssssssssssssssssssssssssssssssssss282ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 diamond \n", + "4 play button\n", + "5 playbutton \n", + "6 youtube \n", + "7 unboxing \n", + "Name: 282, dtype: object\n", + "ssssssssssssssssssssssssssssssssss283ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 283, dtype: object\n", + "ssssssssssssssssssssssssssssssssss284ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 uncharted 4 \n", + "4 uncharted \n", + "5 gameplay \n", + "6 uncharted 4 gameplay \n", + "7 uncharted 4 walkthrough part 1\n", + "8 through \n", + "9 play \n", + "10 walk \n", + "11 let's play \n", + "12 uncharted 4 trailer \n", + "13 gameplay walkthrough \n", + "14 a theif's end \n", + "15 review \n", + "16 multiplayer \n", + "Name: 284, dtype: object\n", + "ssssssssssssssssssssssssssssssssss285ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 sophie's curse\n", + "4 sohpie \n", + "5 curse \n", + "6 steam \n", + "7 horror \n", + "8 jumpscare \n", + "9 let's play \n", + "Name: 285, dtype: object\n", + "ssssssssssssssssssssssssssssssssss286ssssssssssssssssssssssssssssssssss\n", + "0 PewDiePie \n", + "1 YouTube Red \n", + "2 YouTube Red Original Series\n", + "3 horror games \n", + "4 horror video games \n", + "5 video games 
\n", + "6 pranks \n", + "7 YouTube Red membership \n", + "8 YouTube Red subscription \n", + "Name: 286, dtype: object\n", + "ssssssssssssssssssssssssssssssssss287ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 video games \n", + "4 dark souls 3 \n", + "5 dark souls 3 gameplay\n", + "6 gameplay \n", + "7 lets play \n", + "8 lets \n", + "9 play \n", + "10 commentary \n", + "11 dark souls \n", + "12 part 2 \n", + "13 game \n", + "14 walk \n", + "15 through \n", + "16 walkthrough \n", + "17 playthrough \n", + "Name: 287, dtype: object\n", + "ssssssssssssssssssssssssssssssssss288ssssssssssssssssssssssssssssssssss\n", + "0 PewDiePie \n", + "1 YouTube Red \n", + "2 YouTube Red Original Series\n", + "3 horror games \n", + "4 horror video games \n", + "5 video games \n", + "6 pranks \n", + "7 YouTube Red membership \n", + "8 YouTube Red subscription \n", + "Name: 288, dtype: object\n", + "ssssssssssssssssssssssssssssssssss289ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewds \n", + "2 pdp \n", + "3 video games\n", + "4 60 seconds \n", + "5 60 \n", + "6 seconds \n", + "7 steam \n", + "8 lets play \n", + "Name: 289, dtype: object\n", + "ssssssssssssssssssssssssssssssssss290ssssssssssssssssssssssssssssssssss\n", + "0 PewDiePie \n", + "1 YouTube Red \n", + "2 YouTube Red Original Series\n", + "3 horror games \n", + "4 horror video games \n", + "5 video games \n", + "6 pranks \n", + "7 YouTube Red membership \n", + "8 YouTube Red subscription \n", + "Name: 290, dtype: object\n", + "ssssssssssssssssssssssssssssssssss291ssssssssssssssssssssssssssssssssss\n", + "0 PewDiePie \n", + "1 YouTube Red \n", + "2 YouTube Red Original Series\n", + "3 horror games \n", + "4 horror video games \n", + "5 video games \n", + "6 pranks \n", + "7 YouTube Red membership \n", + "8 YouTube Red subscription \n", + "Name: 291, dtype: object\n", + "ssssssssssssssssssssssssssssssssss292ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie 
\n", + "1 pewds \n", + "2 pdp \n", + "3 video games \n", + "4 pewdiepie iq\n", + "5 iq \n", + "6 iq test \n", + "7 smart \n", + "8 how smart \n", + "Name: 292, dtype: object\n", + "ssssssssssssssssssssssssssssssssss293ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 293, dtype: object\n", + "ssssssssssssssssssssssssssssssssss294ssssssssssssssssssssssssssssssssss\n", + "0 PewDiePie \n", + "1 YouTube Red \n", + "2 YouTube Red Original Series\n", + "3 horror games \n", + "4 horror video games \n", + "5 video games \n", + "6 pranks \n", + "7 YouTube Red membership \n", + "8 YouTube Red subscription \n", + "Name: 294, dtype: object\n", + "ssssssssssssssssssssssssssssssssss295ssssssssssssssssssssssssssssssssss\n", + "0 PewDiePie \n", + "1 YouTube Red \n", + "2 YouTube Red Original Series\n", + "3 horror games \n", + "4 horror video games \n", + "5 video games \n", + "6 pranks \n", + "7 YouTube Red membership \n", + "8 YouTube Red subscription \n", + "Name: 295, dtype: object\n", + "ssssssssssssssssssssssssssssssssss296ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 walk through\n", + "7 video games \n", + "8 lets play \n", + "Name: 296, dtype: object\n", + "ssssssssssssssssssssssssssssssssss297ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 walk through\n", + "7 video games \n", + "8 lets play \n", + "9 world chef \n", + "Name: 297, dtype: object\n", + "ssssssssssssssssssssssssssssssssss298ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 298, dtype: object\n", + "ssssssssssssssssssssssssssssssssss299ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 walk through \n", + "7 video games \n", + "8 lets play \n", 
+ "9 mgsv \n", + "10 metal gear solid \n", + "11 the phantom pain \n", + "12 metal gear solid 5\n", + "13 intense \n", + "14 youtube gaming \n", + "15 gaming \n", + "16 gameplay \n", + "Name: 299, dtype: object\n", + "ssssssssssssssssssssssssssssssssss300ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 300, dtype: object\n", + "ssssssssssssssssssssssssssssssssss301ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 walk through\n", + "7 video games \n", + "8 lets play \n", + "Name: 301, dtype: object\n", + "ssssssssssssssssssssssssssssssssss302ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 walk through \n", + "7 video games \n", + "8 lets play \n", + "9 spookys \n", + "10 spooky's \n", + "11 house \n", + "12 of jumpscares\n", + "13 jumpscare \n", + "14 jumpscares \n", + "15 jumpscared \n", + "16 horror \n", + "17 scary \n", + "18 funny \n", + "19 reaction \n", + "Name: 302, dtype: object\n", + "ssssssssssssssssssssssssssssssssss303ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 303, dtype: object\n", + "ssssssssssssssssssssssssssssssssss304ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 304, dtype: object\n", + "ssssssssssssssssssssssssssssssssss305ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 305, dtype: object\n", + "ssssssssssssssssssssssssssssssssss306ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 play \n", + "7 walk \n", + "8 through \n", + "9 walk through\n", + "10 video games \n", + "11 vlog vlog \n", + "12 vlog \n", + "13 vlogs \n", + "Name: 306, dtype: object\n", + "ssssssssssssssssssssssssssssssssss307ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + 
"Name: 307, dtype: object\n", + "ssssssssssssssssssssssssssssssssss308ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 play \n", + "7 walk \n", + "8 through \n", + "9 walk through \n", + "10 video games \n", + "11 The Walking Dead - Season 2 (TV Season)\n", + "12 telltale game \n", + "13 telltale games \n", + "14 walking dead \n", + "15 story \n", + "16 zombie \n", + "Name: 308, dtype: object\n", + "ssssssssssssssssssssssssssssssssss309ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 play \n", + "7 walk \n", + "8 through \n", + "9 walk through\n", + "10 video games \n", + "Name: 309, dtype: object\n", + "ssssssssssssssssssssssssssssssssss310ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 310, dtype: object\n", + "ssssssssssssssssssssssssssssssssss311ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 play \n", + "7 walk \n", + "8 through \n", + "9 walk through \n", + "10 video games \n", + "11 the imossible \n", + "12 quiz \n", + "13 question \n", + "14 questions \n", + "15 funny \n", + "16 reaction \n", + "17 the impossible quiz\n", + "18 all answers \n", + "19 answers \n", + "20 cheat \n", + "Name: 311, dtype: object\n", + "ssssssssssssssssssssssssssssssssss312ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 play \n", + "7 walk \n", + "8 through \n", + "9 walk through \n", + "10 video games \n", + "11 the wolf among us trailer\n", + "12 telltale \n", + "13 wolf among us \n", + "14 Gameplay \n", + "15 Ps3 \n", + "16 review \n", + "17 telltale games \n", + "18 part 1 \n", + "19 Xbox \n", + "20 the wolf among 
us \n", + "21 among \n", + "22 snowwhite \n", + "23 snow white \n", + "24 fairytale \n", + "Name: 312, dtype: object\n", + "ssssssssssssssssssssssssssssssssss313ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 play \n", + "7 walk \n", + "8 through \n", + "9 walk through \n", + "10 video games \n", + "11 linger \n", + "12 oculus \n", + "13 rift \n", + "14 reaction \n", + "15 oculus rift \n", + "16 vr \n", + "17 virtual reality\n", + "Name: 313, dtype: object\n", + "ssssssssssssssssssssssssssssssssss314ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 play \n", + "7 walk \n", + "8 through \n", + "9 walk through\n", + "10 video games \n", + "Name: 314, dtype: object\n", + "ssssssssssssssssssssssssssssssssss315ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 pewds \n", + "3 let's play \n", + "4 playthrough \n", + "5 walkthrough \n", + "6 play \n", + "7 walk \n", + "8 through \n", + "9 walk through\n", + "10 video games \n", + "Name: 315, dtype: object\n", + "ssssssssssssssssssssssssssssssssss316ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 316, dtype: object\n", + "ssssssssssssssssssssssssssssssssss317ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 317, dtype: object\n", + "ssssssssssssssssssssssssssssssssss318ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 lets \n", + "3 play \n", + "4 let´s play \n", + "5 horror \n", + "6 game \n", + "7 walkthrough\n", + "8 playthrough\n", + "9 letsplay \n", + "10 mod \n", + "11 gameplay \n", + "12 trailer \n", + "13 commentary \n", + "14 funny \n", + "Name: 318, dtype: object\n", + "ssssssssssssssssssssssssssssssssss319ssssssssssssssssssssssssssssssssss\n", + "0 Sequence\n", + "1 01 \n", + "2 19 \n", + "Name: 319, 
dtype: object\n", + "ssssssssssssssssssssssssssssssssss320ssssssssssssssssssssssssssssssssss\n", + "0 pewdiepie \n", + "1 pewdie \n", + "2 lets \n", + "3 play \n", + "4 let´s play \n", + "5 horror \n", + "6 game \n", + "7 walkthrough\n", + "8 playthrough\n", + "9 letsplay \n", + "10 mod \n", + "11 gameplay \n", + "12 trailer \n", + "13 commentary \n", + "14 funny \n", + "Name: 320, dtype: object\n", + "ssssssssssssssssssssssssssssssssss321ssssssssssssssssssssssssssssssssss\n", + "0 condemned \n", + "1 part \n", + "2 condmned \n", + "3 parrt \n", + "4 condomned \n", + "5 pewdiepie \n", + "6 lets \n", + "7 play \n", + "8 let's play\n", + "9 video \n", + "10 games \n", + "11 horror \n", + "12 xbox \n", + "13 ps3 \n", + "14 hd \n", + "15 pewdie \n", + "16 scary \n", + "17 game \n", + "18 scary game\n", + "19 gameplay \n", + "20 ending \n", + "21 secret \n", + "22 jumpscare \n", + "23 pop \n", + "24 pewds \n", + "Name: 321, dtype: object\n", + "ssssssssssssssssssssssssssssssssss322ssssssssssssssssssssssssssssssssss\n", + "0 Amnesiaaa \n", + "1 followed \n", + "2 by \n", + "3 death \n", + "4 ch2 \n", + "5 part \n", + "6 amnesia \n", + "7 the \n", + "8 dark \n", + "9 descent \n", + "10 pewdiepie\n", + "11 pewdie \n", + "12 custom \n", + "13 Ghosts \n", + "14 Tape \n", + "15 Pewdiepie\n", + "16 screaming\n", + "17 scream \n", + "18 girly \n", + "19 girl \n", + "20 horror \n", + "21 Scared \n", + "22 Creepy \n", + "23 Funny \n", + "24 chapter \n", + "Name: 322, dtype: object\n", + "ssssssssssssssssssssssssssssssssss323ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 323, dtype: object\n", + "ssssssssssssssssssssssssssssssssss324ssssssssssssssssssssssssssssssssss\n", + "0 callllllllll \n", + "1 28 \n", + "2 calling \n", + "3 27 \n", + "4 part \n", + "5 26 \n", + "6 clallinin \n", + "7 penis \n", + "8 the \n", + "9 lets \n", + "10 play \n", + "11 walkthrough \n", + "12 playthrough \n", + "13 through \n", + "14 wii \n", + "15 gameplay \n", + "16 Suzutani \n", + 
"17 suzutani \n", + "18 The \n", + "19 Possession \n", + "20 possession \n", + "21 ghosts \n", + "22 yt:quality=high\n", + "23 pewdiepie \n", + "24 funny \n", + "25 scary \n", + "26 wierd \n", + "Name: 324, dtype: object\n", + "ssssssssssssssssssssssssssssssssss325ssssssssssssssssssssssssssssssssss\n", + "0 calling \n", + "1 27 \n", + "2 part \n", + "3 26 \n", + "4 clallinin \n", + "5 penis \n", + "6 the \n", + "7 lets \n", + "8 play \n", + "9 walkthrough \n", + "10 playthrough \n", + "11 through \n", + "12 wii \n", + "13 gameplay \n", + "14 Suzutani \n", + "15 suzutani \n", + "16 The \n", + "17 Possession \n", + "18 possession \n", + "19 ghosts \n", + "20 yt:quality=high\n", + "21 pewdiepie \n", + "22 funny \n", + "23 scary \n", + "24 wierd \n", + "Name: 325, dtype: object\n", + "ssssssssssssssssssssssssssssssssss326ssssssssssssssssssssssssssssssssss\n", + "0 part \n", + "1 26 \n", + "2 clallinin \n", + "3 penis \n", + "4 calling \n", + "5 the \n", + "6 lets \n", + "7 play \n", + "8 walkthrough \n", + "9 playthrough \n", + "10 through \n", + "11 wii \n", + "12 gameplay \n", + "13 Suzutani \n", + "14 suzutani \n", + "15 The \n", + "16 Possession \n", + "17 possession \n", + "18 ghosts \n", + "19 yt:quality=high\n", + "20 pewdiepie \n", + "21 funny \n", + "22 scary \n", + "23 wierd \n", + "Name: 326, dtype: object\n", + "ssssssssssssssssssssssssssssssssss327ssssssssssssssssssssssssssssssssss\n", + "0 calling \n", + "1 the \n", + "2 lets \n", + "3 play \n", + "4 walkthrough \n", + "5 playthrough \n", + "6 through \n", + "7 wii \n", + "8 gameplay \n", + "9 Suzutani \n", + "10 suzutani \n", + "11 The \n", + "12 Possession \n", + "13 possession \n", + "14 ghosts \n", + "15 yt:quality=high\n", + "16 pewdiepie \n", + "17 funny \n", + "18 scary \n", + "19 wierd \n", + "Name: 327, dtype: object\n", + "ssssssssssssssssssssssssssssssssss328ssssssssssssssssssssssssssssssssss\n", + "0 the \n", + "1 attic \n", + "2 part \n", + "3 The \n", + "4 lets \n", + "5 play \n", + "6 
playthrough \n", + "7 pewdiepie \n", + "8 chapter \n", + "9 scary \n", + "10 pewdie \n", + "11 walkthrough \n", + "12 horror \n", + "13 scared \n", + "14 screaming \n", + "15 scream \n", + "16 Funny \n", + "17 Horror Fiction\n", + "18 Maze \n", + "19 Game \n", + "20 Weird \n", + "21 Creepy \n", + "22 Open \n", + "23 Scare \n", + "24 Next \n", + "25 Strange \n", + "26 Prank \n", + "27 Story \n", + "28 Outside \n", + "29 Scary Maze \n", + "30 Rat \n", + "31 Scaring \n", + "Name: 328, dtype: object\n", + "ssssssssssssssssssssssssssssssssss329ssssssssssssssssssssssssssssssssss\n", + "0 Sequence \n", + "1 01 \n", + "2 aom \n", + "3 Afraid \n", + "4 Of \n", + "5 Monsters \n", + "6 director's \n", + "7 cut \n", + "8 ending \n", + "9 all endings\n", + "10 soundtrack \n", + "11 creepy \n", + "12 half \n", + "13 life \n", + "14 mod \n", + "15 sweden \n", + "16 pewdiepie \n", + "17 pewdie \n", + "18 scary \n", + "19 Scream \n", + "20 Game \n", + "21 Scared \n", + "22 Maze \n", + "23 Weird \n", + "24 Screaming \n", + "25 Strange \n", + "26 Funny \n", + "27 Prank \n", + "28 Scary Maze \n", + "29 Scaring \n", + "Name: 329, dtype: object\n", + "ssssssssssssssssssssssssssssssssss330ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 330, dtype: object\n", + "ssssssssssssssssssssssssssssssssss331ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 331, dtype: object\n", + "ssssssssssssssssssssssssssssssssss332ssssssssssssssssssssssssssssssssss\n", + "0 octodad \n", + "1 Octodad \n", + "2 Official \n", + "3 Trailer \n", + "4 octodad ending \n", + "5 octodad trailer \n", + "6 walkthrough \n", + "7 playthrough \n", + "8 lets \n", + "9 play \n", + "10 let's \n", + "11 pewdiepie \n", + "12 funny \n", + "13 wierd \n", + "14 indie \n", + "15 Trailer (promotion)\n", + "16 Game \n", + "17 Weird \n", + "18 Gameplay \n", + "19 Playthrough Part \n", + "20 Humour \n", + "21 Play (theatre) \n", + "22 Crazy \n", + "23 Random \n", + "24 Silly \n", + "25 Mission \n", + "Name: 
332, dtype: object\n", + "ssssssssssssssssssssssssssssssssss333ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 333, dtype: object\n", + "ssssssssssssssssssssssssssssssssss334ssssssssssssssssssssssssssssssssss\n", + "0 octodad \n", + "1 Octodad \n", + "2 Official \n", + "3 Trailer \n", + "4 octodad ending \n", + "5 octodad trailer \n", + "6 walkthrough \n", + "7 playthrough \n", + "8 lets \n", + "9 play \n", + "10 let's \n", + "11 pewdiepie \n", + "12 funny \n", + "13 wierd \n", + "14 indie \n", + "15 Trailer (promotion)\n", + "16 Game \n", + "17 Weird \n", + "18 Gameplay \n", + "19 Playthrough Part \n", + "Name: 334, dtype: object\n", + "ssssssssssssssssssssssssssssssssss335ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 pewdiepie \n", + "4 walkthrough \n", + "5 naked \n", + "6 scared \n", + "7 playthrough \n", + "8 amnesia \n", + "9 the \n", + "10 dark \n", + "11 descent \n", + "12 custom \n", + "13 story \n", + "14 mod \n", + "15 100% \n", + "16 scary \n", + "17 Scary and Funny Moments\n", + "18 scariest \n", + "19 moment \n", + "20 funny \n", + "21 Black \n", + "22 Plauge \n", + "23 Requiem \n", + "24 Frictional \n", + "25 how \n", + "26 to \n", + "27 Top \n", + "28 Scary \n", + "29 Moments \n", + "30 /W \n", + "31 PewDiePie \n", + "32 countdown \n", + "33 library of alexandria \n", + "34 part \n", + "35 Stephanos House \n", + "36 Stephano \n", + "37 piggeh \n", + "38 bro \n", + "39 Funny \n", + "40 Best \n", + "41 Let's \n", + "42 Game \n", + "43 Weird \n", + "44 Part 2 \n", + "Name: 335, dtype: object\n", + "ssssssssssssssssssssssssssssssssss336ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 336, dtype: object\n", + "ssssssssssssssssssssssssssssssssss337ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 337, dtype: object\n", + "ssssssssssssssssssssssssssssssssss338ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 pewdiepie \n", + "4 
walkthrough\n", + "5 naked \n", + "6 scared \n", + "7 playthrough\n", + "8 amnesia \n", + "9 the \n", + "10 dark \n", + "11 descent \n", + "12 custom \n", + "13 story \n", + "14 mod \n", + "15 100% \n", + "16 scary \n", + "17 cannibalism\n", + "18 funny \n", + "19 moments \n", + "20 moment \n", + "21 top \n", + "22 pewdie \n", + "23 monster \n", + "24 trailer \n", + "25 100 \n", + "26 part 2 \n", + "27 episode 2 \n", + "28 Amnesia \n", + "29 nightmare \n", + "Name: 338, dtype: object\n", + "ssssssssssssssssssssssssssssssssss339ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 pewdiepie \n", + "4 walkthrough \n", + "5 magicka \n", + "6 playthrough \n", + "7 through \n", + "8 xebaz \n", + "9 tsubasahara \n", + "10 part 2 \n", + "11 magic wizards and shit\n", + "12 game \n", + "13 gameplay \n", + "14 playthrough part \n", + "15 mission \n", + "16 kevin \n", + "17 video game \n", + "18 Orlando Magic \n", + "19 Magic Johnson \n", + "20 playstation \n", + "21 trick \n", + "22 ps2 \n", + "23 xbox \n", + "24 card \n", + "25 tricks \n", + "26 ARMA 2 \n", + "27 john \n", + "28 revealed \n", + "29 david \n", + "30 criss \n", + "31 PlayStation 3 \n", + "32 Xbox \n", + "Name: 339, dtype: object\n", + "ssssssssssssssssssssssssssssssssss340ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 pewdiepie \n", + "4 walkthrough\n", + "5 naked \n", + "6 scared \n", + "7 playthrough\n", + "8 amnesia \n", + "9 the \n", + "10 dark \n", + "11 descent \n", + "12 custom \n", + "13 story \n", + "14 mod \n", + "15 100% \n", + "16 scary \n", + "17 cannibalism\n", + "18 funny \n", + "19 moments \n", + "20 moment \n", + "21 top \n", + "22 pewdie \n", + "23 monster \n", + "24 trailer \n", + "25 100 \n", + "26 part 5 \n", + "27 episode 5 \n", + "28 Through \n", + "29 portal \n", + "30 secret room\n", + "31 trollface \n", + "32 problem \n", + "Name: 340, dtype: object\n", + 
"ssssssssssssssssssssssssssssssssss341ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 pewdiepie \n", + "4 walkthrough\n", + "5 naked \n", + "6 scared \n", + "7 playthrough\n", + "8 amnesia \n", + "9 the \n", + "10 dark \n", + "11 descent \n", + "12 custom \n", + "13 story \n", + "14 mod \n", + "15 100% \n", + "16 scary \n", + "17 cannibalism\n", + "18 funny \n", + "19 moments \n", + "20 moment \n", + "21 top \n", + "22 pewdie \n", + "23 monster \n", + "24 trailer \n", + "25 100 \n", + "26 part 3 \n", + "27 episode 3 \n", + "28 Through \n", + "29 portal \n", + "30 game \n", + "31 level \n", + "32 let's \n", + "33 let's play \n", + "34 gameplay \n", + "35 techno \n", + "36 kevin \n", + "37 games \n", + "Name: 341, dtype: object\n", + "ssssssssssssssssssssssssssssssssss342ssssssssssssssssssssssssssssssssss\n", + "0 dead \n", + "1 island \n", + "2 Dead island gameplay \n", + "3 co-op \n", + "4 coop \n", + "5 lets \n", + "6 play \n", + "7 let \n", + "8 playthrough \n", + "9 walkthrough \n", + "10 dead island lets play \n", + "11 dead island playthrough\n", + "12 ending \n", + "13 zombie \n", + "14 zombies \n", + "15 survival \n", + "16 horror \n", + "17 pegi \n", + "18 uk \n", + "19 violence \n", + "20 violent \n", + "21 open \n", + "22 world \n", + "23 sandbox \n", + "24 Zombie \n", + "25 Horror \n", + "26 Banoi \n", + "27 Undead \n", + "28 PC \n", + "29 Xbox \n", + "30 360 \n", + "31 Playstation \n", + "32 PS3 \n", + "33 Deep \n", + "34 Silver \n", + "35 Techland \n", + "36 2011 \n", + "37 yt:quality=high \n", + "38 HD \n", + "39 720 \n", + "40 1080 \n", + "41 pewdiepie \n", + "42 morfar \n", + "43 cam \n", + "44 camera \n", + "45 pre \n", + "46 order \n", + "47 weapon \n", + "Name: 342, dtype: object\n", + "ssssssssssssssssssssssssssssssssss343ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 pewdiepie \n", + "4 walkthrough \n", + "5 naked \n", + "6 scared \n", + "7 playthrough \n", + "8 
Fatal \n", + "9 Frame \n", + "10 Lets \n", + "11 blind \n", + "12 fatal \n", + "13 frame \n", + "14 II \n", + "15 pewdie \n", + "16 ending \n", + "17 part 1 \n", + "18 Fatal Frame Playthrough part 1\n", + "19 episode \n", + "20 let's \n", + "21 let's play \n", + "22 crimson \n", + "23 butterfly \n", + "24 scary \n", + "25 game \n", + "26 vampire \n", + "27 funny \n", + "28 gameplay \n", + "29 zero \n", + "30 playthrough part \n", + "31 mission \n", + "32 scream \n", + "33 anime \n", + "34 video \n", + "Name: 343, dtype: object\n", + "ssssssssssssssssssssssssssssssssss344ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 pewdiepie \n", + "4 walkthrough \n", + "5 naked \n", + "6 scared \n", + "7 playthrough \n", + "8 Fatal \n", + "9 Frame \n", + "10 Lets \n", + "11 blind \n", + "12 fatal \n", + "13 frame \n", + "14 II \n", + "15 pewdie \n", + "16 ending \n", + "17 part 1 \n", + "18 Fatal Frame Playthrough part 1\n", + "19 episode \n", + "20 let's \n", + "21 let's play \n", + "22 crimson \n", + "23 butterfly \n", + "24 scary \n", + "25 game \n", + "26 vampire \n", + "27 funny \n", + "28 gameplay \n", + "29 zero \n", + "30 playthrough part \n", + "31 mission \n", + "32 scream \n", + "33 anime \n", + "34 video game \n", + "35 can \n", + "36 basket \n", + "37 kevin \n", + "38 playstation \n", + "39 ps2 \n", + "Name: 344, dtype: object\n", + "ssssssssssssssssssssssssssssssssss345ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 pewdiepie \n", + "4 walkthrough \n", + "5 naked \n", + "6 scared \n", + "7 playthrough \n", + "8 Fatal \n", + "9 Frame \n", + "10 Lets \n", + "11 blind \n", + "12 fatal \n", + "13 frame \n", + "14 II \n", + "15 pewdie \n", + "16 ending \n", + "17 part 1 \n", + "18 Fatal Frame Playthrough part 1\n", + "19 episode \n", + "20 let's \n", + "21 let's play \n", + "22 crimson \n", + "23 butterfly \n", + "24 scary \n", + "25 game \n", + "26 vampire \n", + "27 funny \n", + "28 
gameplay \n", + "29 zero \n", + "30 playthrough part \n", + "31 mission \n", + "32 scream \n", + "33 anime \n", + "34 video game \n", + "35 playstation \n", + "36 ps2 \n", + "37 basket \n", + "38 xbox \n", + "39 ps3 \n", + "40 maze \n", + "41 games \n", + "42 weird \n", + "43 creepy \n", + "44 screaming \n", + "Name: 345, dtype: object\n", + "ssssssssssssssssssssssssssssssssss346ssssssssssssssssssssssssssssssssss\n", + "0 Tags:\n", + "Name: 346, dtype: object\n", + "ssssssssssssssssssssssssssssssssss347ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 347, dtype: object\n", + "ssssssssssssssssssssssssssssssssss348ssssssssssssssssssssssssssssssssss\n", + "0 pewdie \n", + "1 xebaz \n", + "2 playing\n", + "3 fear \n", + "Name: 348, dtype: object\n", + "ssssssssssssssssssssssssssssssssss349ssssssssssssssssssssssssssssssssss\n", + "0 pewdie \n", + "1 Xebaz \n", + "2 are \n", + "3 playing\n", + "4 fear \n", + "5 again \n", + "6 and \n", + "7 still \n", + "8 failing\n", + "9 know \n", + "10 you \n", + "11 like \n", + "12 it \n", + "Name: 349, dtype: object\n", + "ssssssssssssssssssssssssssssssssss350ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 Amnesia \n", + "4 custom \n", + "5 story \n", + "6 la \n", + "7 caza \n", + "8 playthrough\n", + "9 walkthrough\n", + "10 walk \n", + "11 through \n", + "12 scary \n", + "13 fun \n", + "14 james \n", + "15 scream \n", + "16 moment \n", + "17 game \n", + "18 scared \n", + "19 horror \n", + "20 movie \n", + "21 gameplay \n", + "22 part 5 \n", + "23 episode 5 \n", + "Name: 350, dtype: object\n", + "ssssssssssssssssssssssssssssssssss351ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 Amnesia \n", + "4 custom \n", + "5 story \n", + "6 la \n", + "7 caza \n", + "8 playthrough\n", + "9 walkthrough\n", + "10 walk \n", + "11 through \n", + "12 scary \n", + "13 fun \n", + "14 james \n", + "15 scream \n", + "16 moment \n", + "17 game \n", + "18 
scared \n", + "19 horror \n", + "20 movie \n", + "21 gameplay \n", + "22 part 4 \n", + "23 episode 4 \n", + "Name: 351, dtype: object\n", + "ssssssssssssssssssssssssssssssssss352ssssssssssssssssssssssssssssssssss\n", + "0 Sequence\n", + "1 01 \n", + "2 1 \n", + "Name: 352, dtype: object\n", + "ssssssssssssssssssssssssssssssssss353ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 Amnesia \n", + "4 custom \n", + "5 story \n", + "6 la \n", + "7 caza \n", + "8 playthrough\n", + "9 walkthrough\n", + "10 walk \n", + "11 through \n", + "12 scary \n", + "13 fun \n", + "14 james \n", + "15 scream \n", + "16 moment \n", + "17 game \n", + "18 scared \n", + "19 horror \n", + "20 movie \n", + "21 gameplay \n", + "22 part 2 \n", + "23 episode 2 \n", + "Name: 353, dtype: object\n", + "ssssssssssssssssssssssssssssssssss354ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 amnesia \n", + "4 DLC \n", + "5 Justine \n", + "6 Amnesia \n", + "7 justine \n", + "8 walkthrough\n", + "9 walk \n", + "10 through \n", + "11 pewdiepie \n", + "12 naked \n", + "13 scared \n", + "14 playthrough\n", + "15 the \n", + "16 dark \n", + "17 descent \n", + "18 dlc \n", + "19 100% \n", + "20 scary \n", + "21 funny \n", + "22 moments \n", + "23 moment \n", + "24 top \n", + "25 pewdie \n", + "26 ending \n", + "27 explained \n", + "28 monster \n", + "29 trailer \n", + "30 100 \n", + "31 part 5 \n", + "32 episode 5 \n", + "33 final \n", + "34 last \n", + "35 episode \n", + "36 part \n", + "Name: 354, dtype: object\n", + "ssssssssssssssssssssssssssssssssss355ssssssssssssssssssssssssssssssssss\n", + "0 lets \n", + "1 let \n", + "2 play \n", + "3 amnesia \n", + "4 DLC \n", + "5 Justine \n", + "6 Amnesia \n", + "7 justine \n", + "8 walkthrough\n", + "9 walk \n", + "10 through \n", + "11 pewdiepie \n", + "12 naked \n", + "13 scared \n", + "14 playthrough\n", + "15 the \n", + "16 dark \n", + "17 descent \n", + "18 dlc \n", + "19 100% \n", + 
"20 scary \n", + "21 funny \n", + "22 moments \n", + "23 moment \n", + "24 top \n", + "25 pewdie \n", + "26 ending \n", + "27 explained \n", + "28 monster \n", + "29 trailer \n", + "30 100 \n", + "31 part 3 \n", + "32 episode 3 \n", + "Name: 355, dtype: object\n", + "ssssssssssssssssssssssssssssssssss356ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 356, dtype: object\n", + "ssssssssssssssssssssssssssssssssss357ssssssssssssssssssssssssssssssssss\n", + "0 amnesia \n", + "1 fuck \n", + "2 scariest \n", + "3 moment \n", + "4 scary \n", + "5 horrible \n", + "6 scream \n", + "7 like \n", + "8 girl \n", + "9 reaction \n", + "10 funny \n", + "11 screaming \n", + "12 dark \n", + "13 descent \n", + "14 the \n", + "15 commentary \n", + "16 gothic \n", + "17 horror \n", + "18 moments \n", + "19 playthrough\n", + "20 first \n", + "21 lets \n", + "22 play \n", + "23 guide \n", + "24 prank \n", + "25 walkthrough\n", + "26 part \n", + "27 within \n", + "28 screams \n", + "29 subbed \n", + "30 scared \n", + "31 xdddd \n", + "32 turner \n", + "33 screamed \n", + "34 till \n", + "35 straight \n", + "36 tears \n", + "37 spoiler \n", + "38 yep \n", + "39 suiting \n", + "40 laughed \n", + "41 shriek \n", + "42 wheres \n", + "43 lmfao \n", + "44 yelping \n", + "45 upload \n", + "46 toby \n", + "Name: 357, dtype: object\n", + "ssssssssssssssssssssssssssssssssss358ssssssssssssssssssssssssssssssssss\n", + "0 SATIRE\n", + "Name: 358, dtype: object\n" + ] + } + ], + "source": [ + "for column in df3.T.columns:\n", + " print('ssssssssssssssssssssssssssssssssss' + str(column) + 'ssssssssssssssssssssssssssssssssss')\n", + " print(df3.T[column].dropna())" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| | title | Views | Like | Dislike | Favorite | Comment | videoID | tags | nameurl |
|---|---|---|---|---|---|---|---|---|---|
| 0 | YOU HAD ONE JOB! - with editor Brad1 | 5,292,299.0 | 385,260.0 | 4,080.0 | 0.0 | 29,859.0 | https://www.youtube.com/watch?v=B67OBHNCopk | [SATIRE, reddit, you had one job, onejob] | <pandas.io.formats.style.Styler object at 0x7f782f9170b8> |
| 1 | Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE | 5,358,149.0 | 378,460.0 | 3,950.0 | 0.0 | 38,075.0 | https://www.youtube.com/watch?v=kLM_9gBZIqY | [SATIRE] | <pandas.io.formats.style.Styler object at 0x7f782f9170b8> |
| 2 | We broke another WORLD RECORD! | 8,557,324.0 | 595,577.0 | 7,899.0 | 0.0 | 53,664.0 | https://www.youtube.com/watch?v=d1tAfXKc7-c | [SATIRE] | <pandas.io.formats.style.Styler object at 0x7f782f9170b8> |
| 3 | FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ | 3,609,152.0 | 218,517.0 | 3,125.0 | 0.0 | 17,595.0 | https://www.youtube.com/watch?v=bMLdNrB5hAo | [SATIRE] | <pandas.io.formats.style.Styler object at 0x7f782f9170b8> |
| 4 | Don't Laugh Challenge, NEW SEASON!!!!! | 5,888,349.0 | 569,878.0 | 7,822.0 | 0.0 | 29,373.0 | https://www.youtube.com/watch?v=Zgm_iM3f_ME | [SATIRE] | <pandas.io.formats.style.Styler object at 0x7f782f9170b8> |
\n", + "
" + ], + "text/plain": [ + " title Views \\\n", + "0 YOU HAD ONE JOB! - with editor Brad1 5,292,299.0 \n", + "1 Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE 5,358,149.0 \n", + "2 We broke another WORLD RECORD! 8,557,324.0 \n", + "3 FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ 3,609,152.0 \n", + "4 Don't Laugh Challenge, NEW SEASON!!!!! 5,888,349.0 \n", + "\n", + " Like Dislike Favorite Comment \\\n", + "0 385,260.0 4,080.0 0.0 29,859.0 \n", + "1 378,460.0 3,950.0 0.0 38,075.0 \n", + "2 595,577.0 7,899.0 0.0 53,664.0 \n", + "3 218,517.0 3,125.0 0.0 17,595.0 \n", + "4 569,878.0 7,822.0 0.0 29,373.0 \n", + "\n", + " videoID \\\n", + "0 https://www.youtube.com/watch?v=B67OBHNCopk \n", + "1 https://www.youtube.com/watch?v=kLM_9gBZIqY \n", + "2 https://www.youtube.com/watch?v=d1tAfXKc7-c \n", + "3 https://www.youtube.com/watch?v=bMLdNrB5hAo \n", + "4 https://www.youtube.com/watch?v=Zgm_iM3f_ME \n", + "\n", + " tags \\\n", + "0 [SATIRE, reddit, you had one job, onejob] \n", + "1 [SATIRE] \n", + "2 [SATIRE] \n", + "3 [SATIRE] \n", + "4 [SATIRE] \n", + "\n", + " nameurl \n", + "0 \n", + "1 \n", + "2 \n", + "3 \n", + "4 " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def make_clickable(val):\n", + " # target _blank to open new window\n", + " return '{}'.format(val, val)\n", + "\n", + "df['nameurl'] = df.style.format({'videoID': make_clickable})\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| | title | Views | Like | Dislike | Favorite | Comment | videoID | tags | nameurl |
|---|---|---|---|---|---|---|---|---|---|
| 0 | YOU HAD ONE JOB! - with editor Brad1 | 5,292,299.0 | 385,260.0 | 4,080.0 | 0.0 | 29,859.0 | https://www.youtube.com/watch?v=B67OBHNCopk | [SATIRE, reddit, you had one job, onejob] | XXXXX |
| 1 | Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE | 5,358,149.0 | 378,460.0 | 3,950.0 | 0.0 | 38,075.0 | https://www.youtube.com/watch?v=kLM_9gBZIqY | [SATIRE] | XXXXX |
| 2 | We broke another WORLD RECORD! | 8,557,324.0 | 595,577.0 | 7,899.0 | 0.0 | 53,664.0 | https://www.youtube.com/watch?v=d1tAfXKc7-c | [SATIRE] | XXXXX |
| 3 | FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ | 3,609,152.0 | 218,517.0 | 3,125.0 | 0.0 | 17,595.0 | https://www.youtube.com/watch?v=bMLdNrB5hAo | [SATIRE] | XXXXX |
| 4 | Don't Laugh Challenge, NEW SEASON!!!!! | 5,888,349.0 | 569,878.0 | 7,822.0 | 0.0 | 29,373.0 | https://www.youtube.com/watch?v=Zgm_iM3f_ME | [SATIRE] | XXXXX |
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from IPython.display import HTML\n", + "\n", + "df['nameurl'] = df['videoID'].apply(lambda x: 'XXXXX'.format(x))\n", + "HTML(df.head().to_html(escape=False))" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", 
+ " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| | title | Views | Like | Dislike | Favorite | Comment | videoID | tags | nameurl |
|---|---|---|---|---|---|---|---|---|---|
| 77 | bitch lasagna | 124,994,006.0 | 6,176,065.0 | 648,864.0 | 0.0 | 924,648.0 | https://www.youtube.com/watch?v=6Dh-RL__uN4 | [SATIRE, tseries, t series, diss, track, pewdiepie, song, rap, mixtape, disstrack, diss track, bitch lasagna] | XXXXX |
| 263 | THE RUBY PLAYBUTTON / YouTube 50 Mil Sub Reward Unbox | 61,378,839.0 | 4,311,930.0 | 145,857.0 | 0.0 | 609,535.0 | https://www.youtube.com/watch?v=7Vj5M0qKh8g | [pewdiepie, pewds, pdp, pewdie] | XXXXX |
| 33 | YouTube Rewind 2018 but it's actually good | 47,979,866.0 | 7,776,590.0 | 79,097.0 | 0.0 | 705,084.0 | https://www.youtube.com/watch?v=By_Cn5ixYLg | [rewind 2018, youtube rewind 2018] | XXXXX |
| 309 | GAME BANNED FROM KIDS? - Talking Angela | 37,174,431.0 | 575,115.0 | 16,369.0 | 0.0 | 64,433.0 | https://www.youtube.com/watch?v=pzYxlKSgxh0 | [pewdiepie, pewdie, pewds, let's play, playthrough, walkthrough, play, walk, through, walk through, video games] | XXXXX |
| 229 | JAKE PAUL | 36,792,100.0 | 1,832,490.0 | 144,973.0 | 0.0 | 269,260.0 | https://www.youtube.com/watch?v=TuIcBPm90aM | [pewdiepie, jake, paul, it's, everyday, bro, react, react world, fine] | XXXXX |
| 267 | DELETING MY CHANNEL | 35,035,463.0 | 1,728,372.0 | 261,139.0 | 0.0 | 220,740.0 | https://www.youtube.com/watch?v=Y39LE5ZoKjw | [pewdiepie, pewds, pdp, pewdie, deleting, channel] | XXXXX |
| 257 | SHOOTING MY 50 MILLION AWARD! | 30,554,862.0 | 1,110,375.0 | 131,648.0 | 0.0 | 106,113.0 | https://www.youtube.com/watch?v=Jrvfoybj98Q | [pewdiepie, pewds, pdp, pewdie] | XXXXX |
| 282 | THE DIAMOND PLAY BUTTON!! (Part 1) | 29,833,868.0 | 1,254,868.0 | 43,421.0 | 0.0 | 120,324.0 | https://www.youtube.com/watch?v=VY4wCi1pPkU | [pewdiepie, pewds, pdp, diamond, play button, playbutton, youtube, unboxing] | XXXXX |
| 45 | YouTube Rewind 2018 review | 27,723,233.0 | 2,213,948.0 | 95,125.0 | 0.0 | 138,585.0 | https://www.youtube.com/watch?v=wYT1Qq6mo4I | [SATIRE, youtube, rewind, meme, yea, review] | XXXXX |
| 163 | PewDiePie Hej Monika Remix by Party In Backyard | 26,513,160.0 | 951,974.0 | 20,537.0 | 0.0 | 140,487.0 | https://www.youtube.com/watch?v=Vk8UEWHYfEg | [party in backyard, hej monika, monika, monica, song, pewdiepie, sing, singing] | XXXXX |
| 311 | The Impossible Quiz. | 26,013,637.0 | 519,621.0 | 7,816.0 | 0.0 | 39,587.0 | https://www.youtube.com/watch?v=rOZ0OHaPmnk | [pewdiepie, pewdie, pewds, let's play, playthrough, walkthrough, play, walk, through, walk through, video games, the imossible, quiz, question, questions, funny, reaction, the impossible quiz, all answers, answers, cheat] | XXXXX |
| 219 | THE MOST ANNOYING SOUND IN THE WORLD! | 25,912,961.0 | 867,935.0 | 22,510.0 | 0.0 | 76,637.0 | https://www.youtube.com/watch?v=baylWdHClNE | [5 weird, stuff, 5 weird stuff online] | XXXXX |
| 264 | WHO'S MORE LIKELY TO...? | 24,147,214.0 | 838,813.0 | 11,568.0 | 0.0 | 79,185.0 | https://www.youtube.com/watch?v=jA0xR2Ho9UU | [pewdiepie, pewds, pdp, pewdie, who is, more likely, markiplier, jacksepticeye, who is more likely, most likely] | XXXXX |
| 265 | BOTTLEFLIP CHALLENGE! | 23,462,006.0 | 879,230.0 | 14,349.0 | 0.0 | 75,539.0 | https://www.youtube.com/watch?v=lyl6ibqnyis | [pewdiepie, pewds, pdp, pewdie, Bottleflip, Challenge, Bottle, Dab, Meme, Jacksepticeye] | XXXXX |
| 225 | THE RICH LIFE OF PEWDIEPIE | 20,579,289.0 | 728,175.0 | 22,673.0 | 0.0 | 42,467.0 | https://www.youtube.com/watch?v=GP9egt__qeI | [pewdiepie, the rich life of pewdiepie, before he was famous, before he was famous pewdiepie, pewdiepie rich, pewdiepie net worth, how much money does pewdiepie make, how much money, youtube money, money, net worth, networth, rich, rich life, the rich life] | XXXXX |
| 41 | Bitch Lasagna v1.2 | 19,952,287.0 | 1,758,301.0 | 69,186.0 | 0.0 | 152,529.0 | https://www.youtube.com/watch?v=PX5QgITQAwk | [SATIRE] | XXXXX |
| 248 | TRY NOT TO LAUGH CHALLENGE #09 {Important Videos Edition} | 16,337,867.0 | 591,088.0 | 12,407.0 | 0.0 | 52,244.0 | https://www.youtube.com/watch?v=IBhgOkorEZ4 | [pewdiepie, pewds, pdp, pewdie, try not, to, laugh, try not to laugh, dont laugh, challenge] | XXXXX |
| 164 | Trap Adventure 2 - WHO MADE THIS GAME AND WHY 😡😡? ! \" 🤰😡 - #001 | 16,329,824.0 | 725,795.0 | 17,000.0 | 0.0 | 39,152.0 | https://www.youtube.com/watch?v=C1ObitoLwhM | [pewdiepie, trap adventure 2, rage, quit, game, videogame, trap, adventure, free download, link, trap adventure download, trap adventure 2 download, trap adventure 2 free download] | XXXXX |
| 260 | DOES HE MAKE IT? | 16,282,855.0 | 618,113.0 | 11,354.0 | 0.0 | 34,363.0 | https://www.youtube.com/watch?v=EfnDkNpXDBk | [pewdiepie, pewds, pdp, pewdie, to be continued, meme, compilation, continue, jojo, jojos, bizarre, adventure] | XXXXX |
| 52 | Bich Lasagna V2. - Beat Saber / PART 1 | 15,950,359.0 | 1,005,972.0 | 35,975.0 | 0.0 | 69,851.0 | https://www.youtube.com/watch?v=2kpR0BdouNE | [SATIRE, beat, saber, vr, gameplay] | XXXXX |
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df2 = df.sort_values(by=['Views'], ascending=False).head(20)\n", + "\n", + "HTML(df2.to_html(escape=False))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "df.sort_values(by=['Views'], ascending=False).head(5).sort_index().plot()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "df.sort_values(by=['Views'], ascending=False)[['Views','title']].head(3).sort_index().plot()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "df['title_short'] = df['title'].str[:20]" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| | title | Views | Like | Dislike | Favorite | Comment | videoID | tags | nameurl | title_short |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | YOU HAD ONE JOB! - with editor Brad1 | 5,292,299.0 | 385,260.0 | 4,080.0 | 0.0 | 29,859.0 | https://www.youtube.com/watch?v=B67OBHNCopk | [SATIRE, reddit, you had one job, onejob] | <a href=\"https://www.youtube.com/watch?v=B67OBHNCopk\">XXXXX</a> | YOU HAD ONE JOB! - w |
| 1 | Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE | 5,358,149.0 | 378,460.0 | 3,950.0 | 0.0 | 38,075.0 | https://www.youtube.com/watch?v=kLM_9gBZIqY | [SATIRE] | <a href=\"https://www.youtube.com/watch?v=kLM_9gBZIqY\">XXXXX</a> | Demi Lovato DID a WH |
| 2 | We broke another WORLD RECORD! | 8,557,324.0 | 595,577.0 | 7,899.0 | 0.0 | 53,664.0 | https://www.youtube.com/watch?v=d1tAfXKc7-c | [SATIRE] | <a href=\"https://www.youtube.com/watch?v=d1tAfXKc7-c\">XXXXX</a> | We broke another WOR |
| 3 | FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ | 3,609,152.0 | 218,517.0 | 3,125.0 | 0.0 | 17,595.0 | https://www.youtube.com/watch?v=bMLdNrB5hAo | [SATIRE] | <a href=\"https://www.youtube.com/watch?v=bMLdNrB5hAo\">XXXXX</a> | FLOSSING in VR with |
| 4 | Don't Laugh Challenge, NEW SEASON!!!!! | 5,888,349.0 | 569,878.0 | 7,822.0 | 0.0 | 29,373.0 | https://www.youtube.com/watch?v=Zgm_iM3f_ME | [SATIRE] | <a href=\"https://www.youtube.com/watch?v=Zgm_iM3f_ME\">XXXXX</a> | Don't Laugh Challeng |
\n", + "
" + ], + "text/plain": [ + " title Views \\\n", + "0 YOU HAD ONE JOB! - with editor Brad1 5,292,299.0 \n", + "1 Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE 5,358,149.0 \n", + "2 We broke another WORLD RECORD! 8,557,324.0 \n", + "3 FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ 3,609,152.0 \n", + "4 Don't Laugh Challenge, NEW SEASON!!!!! 5,888,349.0 \n", + "\n", + " Like Dislike Favorite Comment \\\n", + "0 385,260.0 4,080.0 0.0 29,859.0 \n", + "1 378,460.0 3,950.0 0.0 38,075.0 \n", + "2 595,577.0 7,899.0 0.0 53,664.0 \n", + "3 218,517.0 3,125.0 0.0 17,595.0 \n", + "4 569,878.0 7,822.0 0.0 29,373.0 \n", + "\n", + " videoID \\\n", + "0 https://www.youtube.com/watch?v=B67OBHNCopk \n", + "1 https://www.youtube.com/watch?v=kLM_9gBZIqY \n", + "2 https://www.youtube.com/watch?v=d1tAfXKc7-c \n", + "3 https://www.youtube.com/watch?v=bMLdNrB5hAo \n", + "4 https://www.youtube.com/watch?v=Zgm_iM3f_ME \n", + "\n", + " tags \\\n", + "0 [SATIRE, reddit, you had one job, onejob] \n", + "1 [SATIRE] \n", + "2 [SATIRE] \n", + "3 [SATIRE] \n", + "4 [SATIRE] \n", + "\n", + " nameurl \\\n", + "0 XXXXX \n", + "1 XXXXX \n", + "2 XXXXX \n", + "3 XXXXX \n", + "4 XXXXX \n", + "\n", + " title_short \n", + "0 YOU HAD ONE JOB! 
- w \n", + "1 Demi Lovato DID a WH \n", + "2 We broke another WOR \n", + "3 FLOSSING in VR with \n", + "4 Don't Laugh Challeng " + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "df.set_index('title_short', inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEICAYAAABYoZ8gAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAEg5JREFUeJzt3X+w3XWd3/HnCxLNYiJqcscpBgwrO6tZQiJegYUNkOIAamvKLKtkgK4Bhtku9de0WWx1klb+2Z3aXXbcAqaYpu46BGWpQxcDYbpOSY1YbgKTROKG7ZLVS2hzN/HHFkUSffePe2IvIfdHck9ycvN5PmYY7/l8vud734cZn/nyPefmpqqQJLXjlF4PIEk6vgy/JDXG8EtSYwy/JDXG8EtSYwy/JDXmhA1/kjVJ9iTZPoFjz0ry9SRPJdma5H3HY0ZJmopO2PADa4GrJ3jsp4EvV9U7geuAu47VUJI01Z2w4a+qx4F9I9eSvC3JI0k2J9mY5O0HDwde3/n6dGD3cRxVkqaUab0e4AitBn6nqp5NciHDV/b/EPg3wIYkHwFeB7yndyNK0oltyoQ/yUzgYuArSQ4uv7bzv8uAtVX175P8OvCnSc6tqp/3YFRJOqFNmfAzfFvqB1W16DB7N9N5P6CqvplkBjAH2HMc55OkKeGEvcd/qKr6EfBckt8CyLCFne3vAld01t8BzACGejKoJJ3gcqL+7ZxJ7gMuZ/jK/f8Aq4C/BO4G/gEwHVhXVZ9JMh/4j8BMht/o/b2q2tCLuSXpRHfChl+SdGxMmVs9kqTuOCHf3J0zZ07Nmzev12NI0pSxefPmv6uqvokce0KGf968eQwMDPR6DEmaMpL87USP9VaPJDXG8EtSYwy/JDXmhLzHL0mj2b9/P4ODg7z00ku9HqUnZsyYwdy5c5k+ffpRn8PwS5pSBgcHmTVrFvPmzWPE39vVhKpi7969DA4OcvbZZx/1ebzVI2lKeemll5g9e3Zz0QdIwuzZsyf9XzuGX9KU02L0D+rGazf8ktQYwy9JR2DJkiU8+uijr1i78847Wb58Oddee22Ppjoyhl+SjsCyZctYt27dK9bWrVvH8uXLeeCBB3o01ZEx/JJ0BK699loefvhhXn75ZQB27drF7t27OfPMMzn33HMB+NnPfsaKFSt497vfzXnnncfnP/95AG677TYeeughAK655hpuuukmANasWcOnPvUpXnzxRd7//vezcOFCzj33XO6///5j8hr8OKekKevf/tdv88zuH3X1nPPPeD2r/vGvjbr/pje9iQsuuID169ezdOlS1q1bxwc/+MFX
vOn6hS98gdNPP50nn3ySn/70p1xyySVceeWVLF68mI0bN/KBD3yA559/nhdeeAGAjRs3ct111/HII49wxhln8PDDDwPwwx/+sKuv7SCv+CXpCI283bNu3TqWLVv2iv0NGzbwxS9+kUWLFnHhhReyd+9enn322V+E/5lnnmH+/Pm8+c1v5oUXXuCb3/wmF198MQsWLOCxxx7j9ttvZ+PGjZx++unHZH6v+CVNWWNdmR9LS5cu5ROf+ARbtmzhxz/+Me9617vYtWvXL/aris997nNcddVVr3ruD37wAx555BEuvfRS9u3bx5e//GVmzpzJrFmzmDVrFlu2bOFrX/san/70p7niiitYuXJl1+f3il+SjtDMmTNZsmQJN91006uu9gGuuuoq7r77bvbv3w/Azp07efHFFwG46KKLuPPOO7n00ktZvHgxn/3sZ1m8eDEAu3fv5rTTTuOGG25gxYoVbNmy5ZjM7xW/JB2FZcuWcc0117zqEz4At9xyC7t27eL888+nqujr6+OrX/0qAIsXL2bDhg2cc845vPWtb2Xfvn2/CP+2bdtYsWIFp5xyCtOnT+fuu+8+JrOfkL9zt7+/v/xFLJIOZ8eOHbzjHe/o9Rg9dbh/B0k2V1X/RJ7vrR5Jaozhl6TGGH5JU86JeIv6eOnGazf8kqaUGTNmsHfv3ibjf/Dv458xY8akzuOneiRNKXPnzmVwcJChoaFej9ITB38D12QYfklTyvTp0yf126fkrR5Jao7hl6TGTCj8SdYk2ZNk+yj7S5NsTfJ0koEkvzFi77eTPNv557e7Nbgk6ehM9Ip/LXD1GPv/DVhYVYuAm4B7AZK8CVgFXAhcAKxK8sajnlaSNGkTCn9VPQ7sG2P//9b//2zV64CDX18FPFZV+6rq+8BjjP0HiCTpGOvaPf4k1yT5DvAww1f9AG8BvjfisMHO2uGef2vnNtFAqx/TkqTjoWvhr6r/UlVvB/4JcMdRPH91VfVXVX9fX1+3xpIkHaLrn+rp3Bb65SRzgOeBM0dsz+2sSZJ6pCvhT3JOOr9wMsn5wGuBvcCjwJVJ3th5U/fKzpokqUcm9JO7Se4DLgfmJBlk+JM60wGq6h7gN4F/mmQ/8BPgQ503e/cluQN4snOqz1TVqG8SS5KOPX8RiySdBPxFLJKkURl+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWrMuOFPsibJniTbR9m/PsnWJNuSbEqycMTeJ5J8O8n2JPclmdHN4SVJR24iV/xrgavH2H8OuKyqFgB3AKsBkrwF+CjQX1XnAqcC101qWknSpE0b74CqejzJvDH2N414+AQw95Dz/1KS/cBpwO6jG1OS1C3dvsd/M7AeoKqeBz4LfBd4AfhhVW3o8veTJB2hroU/yRKGw3975/EbgaXA2cAZwOuS3DDG829NMpBkYGhoqFtjSZIO0ZXwJzkPuBdYWlV7O8vvAZ6rqqGq2g88CFw82jmqanVV9VdVf19fXzfGkiQdxqTDn+QshqN+Y1XtHLH1XeCiJKclCXAFsGOy30+SNDnjvrmb5D7gcmBOkkFgFTAdoKruAVYCs4G7hvvOgc6V+7eSPABsAQ4AT9H5xI8kqXdSVb2e4VX6+/trYGCg12NI0pSRZHNV9U/kWH9yV5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaM274k6xJsifJ9lH2r0+yNcm2JJuSLByx94YkDyT5TpIdSX69m8NLko7cRK741wJXj7H/HHBZVS0A7gBWj9j7Y+CRqno7sBDYcZRzSpK6ZNp4B1TV40nmjbG/acTDJ4C5
AElOBy4FPtw57mXg5aMfVZLUDd2+x38zsL7z9dnAEPCfkjyV5N4krxvtiUluTTKQZGBoaKjLY0mSDupa+JMsYTj8t3eWpgHnA3dX1TuBF4FPjvb8qlpdVf1V1d/X19etsSRJh+hK+JOcB9wLLK2qvZ3lQWCwqr7VefwAw38QSJJ6aNLhT3IW8CBwY1XtPLheVf8b+F6SX+0sXQE8M9nvJ0manHHf3E1yH3A5MCfJILAKmA5QVfcAK4HZwF1JAA5UVX/n6R8BvpTkNcDfAMu7/QIkSUdmIp/qWTbO/i3ALaPsPQ30H25PktQb/uSuJDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDVm3PAnWZNkT5Lto+xfn2Rrkm1JNiVZeMj+qUmeSvIX3RpaknT0JnLFvxa4eoz954DLqmoBcAew+pD9jwE7jmo6SVLXjRv+qnoc2DfG/qaq+n7n4RPA3IN7SeYC7wfuneSckqQu6fY9/puB9SMe3wn8HvDz8Z6Y5NYkA0kGhoaGujyWJOmgroU/yRKGw3975/E/AvZU1eaJPL+qVldVf1X19/X1dWssSdIhpnXjJEnOY/h2znuram9n+RLgA0neB8wAXp/kz6rqhm58T0nS0Zn0FX+Ss4AHgRuraufB9ar6V1U1t6rmAdcBf2n0Jan3xr3iT3IfcDkwJ8kgsAqYDlBV9wArgdnAXUkADlRV/7EaWJI0OamqXs/wKv39/TUwMNDrMSRpykiyeaIX3f7kriQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmPGDX+SNUn2JNk+yv71SbYm2ZZkU5KFnfUzk3w9yTNJvp3kY90eXpJ05CZyxb8WuHqM/eeAy6pqAXAHsLqzfgD4F1U1H7gIuC3J/EnMKknqgnHDX1WPA/vG2N9UVd/vPHwCmNtZf6GqtnS+/ntgB/CWSU8sSZqUbt/jvxlYf+hiknnAO4Fvdfn7SZKO0LRunSjJEobD/xuHrM8E/hz4eFX9aIzn3wrcCnDWWWd1ayxJ0iG6csWf5DzgXmBpVe0dsT6d4eh/qaoeHOscVbW6qvqrqr+vr68bY0mSDmPS4U9yFvAgcGNV7RyxHuALwI6q+sPJfh9JUneMe6snyX3A5cCcJIPAKmA6QFXdA6wEZgN3DbeeA1XVD1wC3AhsS/J053T/uqq+1u0XIUmauHHDX1XLxtm/BbjlMOv/A8jRjyZJOhb8yV1Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jasy44U+yJsmeJNtH2b8+ydYk25JsSrJwxN7VSf4qyV8n+WQ3B5ckHZ2JXPGvBa4eY/854LKqWgDcAawGSHIq8B+A9wLzgWVJ5k9qWknSpI0b/qp6HNg3xv6mqvp+5+ETwNzO1xcAf11Vf1NVLwPrgKWTnFeSNEndvsd/M7C+8/VbgO+N2BvsrB1WkluTDCQZGBoa6vJYkqSDuhb+JEsYDv/tR/P8qlpdVf1V1d/X19etsSRJh5jWjZMkOQ+4F3hvVe3tLD8PnDnisLmdNUlSD036ij/JWcCDwI1VtXPE1pPAryQ5O8lrgOuAhyb7/SRJkzPuFX+S+4DLgTlJBoFVwHSAqroHWAnMBu5KAnCgc8vmQJJ/DjwKnAqsqapvH5NXIUmasFRVr2d4lf7+/hoYGOj1GJI0ZSTZXFX9EznWn9yVpMYYfklqjOGXpMYYfklqjOGX
pMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqTKqq1zO8SpIh4G97PccRmgP8Xa+HOM58zW3wNU8Nb62qvokceEKGfypKMlBV/b2e43jyNbfB13zy8VaPJDXG8EtSYwx/96zu9QA94Gtug6/5JOM9fklqjFf8ktQYwy9JjTH8ktQYwy9JjTH8ktQYwy9JjTH8ktQYw6+TRpI3JPndztdnJHmg8/WiJO8bcdyHk/xJl77n5Un+YpLn+HCSM7oxjzQRhl8nkzcAvwtQVbur6trO+iLgfaM+q4eSnAp8GDD8Om4Mv04mvw+8LcnTSb6SZHuS1wCfAT7UWf/QyCck6Uvy50me7PxzyWgnT3JZ5xxPJ3kqyazO1swkDyT5TpIvJUnn+Cs6x21LsibJazvru5L8QZItwDKgH/hS57y/dAz+vUivYPh1Mvkk8L+qahGwAqCqXgZWAvdX1aKquv+Q5/wx8EdV9W7gN4F7xzj/vwRu65x/MfCTzvo7gY8D84FfBi5JMgNYC3yoqhYA04B/NuJce6vq/Kr6M2AAuL4z30+QjjHDr9a9B/iTJE8DDwGvTzJzlGO/Afxhko8Cb6iqA531/1lVg1X1c+BpYB7wq8BzVbWzc8x/Bi4dca5D/wCSjptpvR5A6rFTgIuq6qXxDqyq30/yMMPvF3wjyVWdrZ+OOOxnTOz/Vy8e8aRSl3jFr5PJ3wOzjmAdYAPwkYMPkiwa7eRJ3lZV26rqD4AngbePMctfAfOSnNN5fCPw349wbumYMPw6aVTVXoavxLcD/27E1teB+Yd7cxf4KNCfZGuSZ4DfGeNbfLzzhvFWYD+wfoxZXgKWA19Jsg34OXDPKIevBe7xzV0dL/59/JLUGK/4JakxvrkrHSLJcuBjhyx/o6pu68U8Urd5q0eSGuOtHklqjOGXpMYYfklqjOGXpMb8P+PZ1UPntR/oAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "df.sort_values(by=['Views'], ascending=False)[['Views']].head(1).plot()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "df.reset_index(inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "%matplotlib inline\n", + "df.sort_values(by=['Views'], ascending=False)[['Views']].head(5).T.plot.bar()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEMCAYAAAA/Jfb8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAFTRJREFUeJzt3X2QXfV93/H3ByQgDjK2pY1rkIwYG8doMMawBmJXMgR7ENCi4BAbTUnaAmbaGjfjZFQrYwbbtJ3xA2lJXSAWhVDcCTLGLtEU8dQa15rUdrUI8yQClkExK3BYS/gJSnjwt3/cK7TIK+2VdLVHe+77NbOz95zz23u/Otr72d/9/c5DqgpJUrvs13QBkqT+M9wlqYUMd0lqIcNdklrIcJekFjLcJamFGg33JNcleTrJgz20fXOSu5Pcm+T+JGdMRY2SNB013XO/HljcY9tLgJuq6l3AucBVe6soSZruGg33qvoWsGX8uiRvSXJ7knuSrEny9q3Ngdd2Hx8CPDmFpUrStDKj6QImsAL4F1X1/SQn0umh/zbwaeDOJB8Dfh14f3MlStK+bZ8K9yQHA+8Bvppk6+oDu9+XAtdX1Z8m+S3gy0mOrqpfNlCqJO3T9qlwpzNM9JOqOnaCbRfQHZ+vqm8nOQiYAzw9hfVJ0rTQ9ITqq1TVz4DHk/weQDre2d38Q+DU7vqjgIOAsUYKlaR9XJq8KmSSG4GT6fTA/w74FPAN4GrgTcBMYGVVXZZkAXANcDCdydV/U1V3NlG3JO3rGg13SdLesU8Ny0iS+qOxCdU5c+bU/Pnzm3p5SZqW7rnnnh9X1dBk7RoL9/nz5zMyMtLUy0vStJTkb3tp57CMJLWQ4S5JLWS4S1IL7WtnqEoSAC+++CKjo6M8//zzTZfSiIMOOoi5c+cyc+bM3fp5w13SPml0dJRZs2Yxf/58xl1raiBUFZs3b2Z0dJQjjjhit57DYRlJ+6Tnn3+e2bNnD1ywAyRh9uzZe/SpxXCXtM8axGDfak//7Ya7JLWQY+6SpoX5y2/t6/Nt/OyZO91+yimnsHz5ck477bRX1l1xxRXcd999/PznP+fmm2/uaz39Nq3Dvd//2btjsl8QSdPT0qVLWbly5avCfeXKlXz+859n0aJFDVbWG4dlJGkC55xzDrfeeisvvPACABs3buTJJ59k3rx5HH300QC8/PLLLFu2jHe/+90cc8wxfOlLXwLgox/9KKtWrQLg7LPP5vzzzwfguuuu45Of/CTPPvssZ555Ju985zs5+uij+cpXvtL3+g13SZrAG97wBk444QRuu+02oNNr/9CHPvSqic5rr72WQw45hLVr17J27VquueYaHn/8cRYuXMiaNWsA2LRpE+vXrwdgzZo1LFq0iNtvv51DDz2U++67jwcffJDFixf3vX7DXZJ2YOvQDHTCfenSpa/afuedd3LDDTdw7LHHcuKJJ7J582a+//3vvxLu69evZ8GCBbzxjW/kqaee4tvf/jbvec97eMc73sFdd93FJz7xCdasWcMhhxzS99qn9Zi7JO1NS5Ys4eMf/zjr1q3jueee4/jjj2fj
xo2vbK8qvvjFL75qXH6rn/zkJ9x+++0sWrSILVu2cNNNN3HwwQcza9YsZs2axbp161i9ejWXXHIJp556Kpdeemlfa5+0557kuiRPJ3lwB9v/SZL7kzyQ5P+Mu+epJE1rBx98MKeccgrnn3/+r/TaAU477TSuvvpqXnzxRQAeffRRnn32WQBOOukkrrjiChYtWsTChQu5/PLLWbhwIQBPPvkkr3nNazjvvPNYtmwZ69at63vtvfTcrwf+M3DDDrY/Dryvqp5JcjqwAjixP+VJUkdTR6YtXbqUs88++5XhmfEuvPBCNm7cyHHHHUdVMTQ0xC233ALAwoULufPOO3nrW9/K4YcfzpYtW14J9wceeIBly5ax3377MXPmTK6++uq+193TPVSTzAf+R1UdPUm71wMPVtVhkz3n8PBw7enNOjwUUmqvhx9+mKOOOqrpMho10T5Ick9VDU/2s/2eUL0AuG1HG5NclGQkycjY2FifX1qStFXfwj3JKXTC/RM7alNVK6pquKqGh4YmvQWgJGk39eVomSTHAP8FOL2qNvfjOSWpqgb24mG9DJnvzB733JO8Gfg68PtV9eiePp8kQedmFZs3b97jkJuOtl7P/aCDDtrt55i0557kRuBkYE6SUeBTwMxuAX8OXArMBq7q/oV9qZfBfknamblz5zI6Osqgzs9tvRPT7po03KvqVw/ufPX2C4ELd7sCSZrAzJkzd/suRPLyA5LUSoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgtNGu5JrkvydJIHd7A9Sf5Tkg1J7k9yXP/LlCTtil567tcDi3ey/XTgyO7XRcDVe16WJGlPTBruVfUtYMtOmiwBbqiO7wCvS/KmfhUoSdp1/RhzPwx4YtzyaHfdr0hyUZKRJCNjY2N9eGlJ0kSmdEK1qlZU1XBVDQ8NDU3lS0vSQOlHuG8C5o1bnttdJ0lqSD/CfRXwB92jZk4CflpVT/XheSVJu2nGZA2S3AicDMxJMgp8CpgJUFV/DqwGzgA2AM8B/3xvFStJ6s2k4V5VSyfZXsBH+1aRJGmPeYaqJLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQj2Fe5LFSR5JsiHJ8gm2vznJ3UnuTXJ/kjP6X6okqVeThnuS/YErgdOBBcDSJAu2a3YJcFNVvQs4F7iq34VKknrXS8/9BGBDVT1WVS8AK4El27Up4LXdx4cAT/avREnSruol3A8Dnhi3PNpdN96ngfOSjAKrgY9N9ERJLkoykmRkbGxsN8qVJPWiXxOqS4Hrq2oucAbw5SS/8txVtaKqhqtqeGhoqE8vLUnaXi/hvgmYN255bnfdeBcANwFU1beBg4A5/ShQkrTregn3tcCRSY5IcgCdCdNV27X5IXAqQJKj6IS74y6S1JBJw72qXgIuBu4AHqZzVMxDSS5Lcla32R8DH0lyH3Aj8M+qqvZW0ZKknZvRS6OqWk1nonT8ukvHPV4PvLe/pUmSdpdnqEpSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS3UU7gnWZzkkSQbkizfQZsPJVmf5KEkf9nfMiVJu2LGZA2S7A9cCXwAGAXWJllVVevHtTkS+BPgvVX1TJLf2FsFa2Lzl9/adAls/OyZTZcgqauXnvsJwIaqeqyqXgBWAku2
a/MR4Mqqegagqp7ub5mSpF3RS7gfBjwxbnm0u268twFvS/LXSb6TZPFET5TkoiQjSUbGxsZ2r2JJ0qT6NaE6AzgSOBlYClyT5HXbN6qqFVU1XFXDQ0NDfXppSdL2egn3TcC8cctzu+vGGwVWVdWLVfU48CidsJckNaCXcF8LHJnkiCQHAOcCq7ZrcwudXjtJ5tAZpnmsj3VKknbBpOFeVS8BFwN3AA8DN1XVQ0kuS3JWt9kdwOYk64G7gWVVtXlvFS1J2rlJD4UEqKrVwOrt1l067nEBf9T9kiQ1zDNUJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaqGeLvkrTSfzl9/adAls/OyZTZegAWfPXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklqop3BPsjjJI0k2JFm+k3a/m6SSDPevREnSrpo03JPsD1wJnA4sAJYmWTBBu1nAHwLf7XeRkqRd00vP/QRgQ1U9VlUvACuBJRO0+7fA54Dn+1ifJGk39BLuhwFPjFse7a57RZLjgHlVtdMrNiW5KMlIkpGxsbFdLlaS1Js9nlBNsh/wH4A/nqxtVa2oquGqGh4aGtrTl5Yk7UAv4b4JmDdueW533VazgKOBbybZCJwErHJSVZKa00u4rwWOTHJEkgOAc4FVWzdW1U+rak5Vza+q+cB3gLOqamSvVCxJmtSk4V5VLwEXA3cADwM3VdVDSS5LctbeLlCStOt6uhNTVa0GVm+37tIdtD15z8uS1A/elWpweZs9SQNh0P7QefkBSWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphXoK9ySLkzySZEOS5RNs/6Mk65Pcn+R/JTm8/6VKkno1abgn2R+4EjgdWAAsTbJgu2b3AsNVdQxwM/D5fhcqSepdLz33E4ANVfVYVb0ArASWjG9QVXdX1XPdxe8Ac/tbpiRpV/QS7ocBT4xbHu2u25ELgNsm2pDkoiQjSUbGxsZ6r1KStEv6OqGa5DxgGPjCRNurakVVDVfV8NDQUD9fWpI0zowe2mwC5o1bnttd9ypJ3g98EnhfVf19f8qTJO2OXnrua4EjkxyR5ADgXGDV+AZJ3gV8CTirqp7uf5mSpF0xabhX1UvAxcAdwMPATVX1UJLLkpzVbfYF4GDgq0m+l2TVDp5OkjQFehmWoapWA6u3W3fpuMfv73NdkqQ94BmqktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSC/UU7kkWJ3kkyYYkyyfYfmCSr3S3fzfJ/H4XKknq3aThnmR/4ErgdGABsDTJgu2aXQA8U1VvBf4j8Ll+FypJ6l0vPfcTgA1V9VhVvQCsBJZs12YJ8F+7j28GTk2S/pUpSdoVqaqdN0jOARZX1YXd5d8HTqyqi8e1ebDbZrS7/INumx9v91wXARd1F38TeKRf/5A9MAf48aStBoP7Yhv3xTbui232hX1xeFUNTdZoxlRUslVVrQBWTOVrTibJSFUNN13HvsB9sY37Yhv3xTbTaV/0MiyzCZg3bnlud92EbZLMAA4BNvejQEnSrusl3NcCRyY5IskBwLnAqu3arAL+affxOcA3arLxHknSXjPpsExVvZTkYuAOYH/guqp6KMllwEhVrQKuBb6cZAOwhc4fgOlinxomapj7Yhv3xTbui22mzb6YdEJVkjT9eIaqJLWQ4S5JLWS4S1ILGe6S1EJTehJT
k5KsA74O3FhVP2i6nqYl2Q+gqn7ZPcT1aGBjVW1ptrKpleQ1wMVAAV+kc6TXB4G/AS6rql80WF5jkrweeLmqftZ0LU1KMkznHJ6XgUer6m8aLqlng9Rzfz3wOuDuJP83yceTHNp0UU1I8jvAU8CmJEuANcAXgPuT/ONGi5t61wNvBI4AbgWG6eyLAFc3V9bUS3JokhuS/JTOKfYPJvlhkk8nmdl0fVMpyfuSjACfBa6jc9mUa5N8M8m8nf/0PqKqBuILWDfu8ULgKuBHwN3ARU3XN8X74l7gH9AJtJ8Bv9ldfzidcxcar3EK98X3ut/T/X3IuOX7m65vivfFN4CTu48/SOcKr78O/DtgRdP1TfG+uBcY6j4+Avjv3ccfAO5sur5evgap5/7KVSqrak1V/SvgMDqXJ/6txqpqSFX9qKoeB35YVY901/0tg/Vp7hXVeeeu7n7fujxoJ4HMrqpvAlTV14FFVfVsVV0CLGq0sqm3f1WNdR//kE7Hh6q6i05u7PMGZsydCa5AWVUvA7d3vwZKkv2q6pfA+ePW7Q8c0FxVjRhJcnBV/aKqxu+LtwA/b7CuJowlOY/Op9kPAhsBupfvHrQ/+iNJrqXzaeYs4JvwyhzN/g3W1bOBOUM1yb+m89HqiaZraVqSdwMPVNXz262fD/zDqvpvTdTVlCQn0Omsr+3eiGYxnc7AKz35QZDkzcDldG7K8z1gWVU9lWQ2neGarzVa4BTqzjF8hM6+uI/OZVdeTvJrwG90P+Xu0wYp3H8KPAv8ALgR+Oq4j10DL8nsqhq4K3km+RSdu4zNAO4CTqTTc/0AcEdV/fsGy5N22yCF+73A8cD7gQ/T+ah1D52g/3pVDcxH8CSfBS6vqh93D/W6CfglMBP4g6r6340WOIWSPAAcCxxIZ0J1blX9rNtD+25VHdNogVOoe7nuC4DfYdu48ibgr4Brq+rFpmqbakleC/wJnUuc31ZVfzlu21XdObt92iCNo1VV/bKq7qyqC4BD6Rwxsxh4rNnSptyZte0uWV8APlyd+99+APjT5spqxEtV9XJVPQf8oLrHdVfV/6PzB2+QfJnOH7rPAGd0vz4DvBMYqKE64C/oHITxNeDcJF9LcmB320nNldW7QZpQfdU9Xbu9kFXAqu4kySCZkWRGVb0E/FpVrQWoqkfH/QIPiheSvKYb7sdvXZnkEAYv3I+vqrdtt24U+E6SR5soqEFvqarf7T6+JckngW8kOavJonbFIPXcP7yjDd039iC5Clid5LeB25P8Wfekjc/QmUgbJIu2/v93jx7aaibbbkAzKLYk+b2tZy9D56iqJB8GnmmwriYcOH4/dOdergG+BcxurKpdMDBj7nq1JCcD/xJ4G51PcE8AtwB/MUhjq9qme7TU54BTgJ90V7+OzgTz8u55EQMhyefpnKz0P7dbvxj4YlUd2UxlvTPcB1SSt9OZNPtujbt+SpLFVTVwx/2rI8mJdE7e+gHwdjon+K2vqtWNFtaAnbxHTq+q25qrrDeG+wDqHvP/UeBhOhNof1hVf9Xdtq6qjmuyPjVjgsNCT6Bz8s7AHRaa5GN0Lig3bd8jgzShqm0+Qmfy7Bfdj+I3J5lfVX/GdhPPGijnMPFhoZcD3wUGJtzpXChsWr9HDPfBtN/Wj5lVtbE7/n5zksOZJr+42ite6l6S47kkrzosNMmgHTk07d8jg3S0jLb5uyTHbl3o/hL/I2AO8I7GqlLTXhh3WPCgHxY67d8jjrkPoCRz6fTSfjTBtvdW1V83UJYaluTAqvr7CdbPAd5UVQ80UFYj2vAeMdwlqYUclpGkFjLcJamFDHdJaiHDXZJa6P8DEVM3h9QH/TcAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "%matplotlib inline\n", + "df.sort_values(by=['Views'], ascending=False)[['Views']].head(5).plot.bar()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} From c2ba27af59eeacb4e4d2523bb76d0f76c714db8e Mon Sep 17 00:00:00 2001 From: softhints Date: Mon, 11 Feb 2019 22:25:20 +0200 Subject: [PATCH 11/76] Youtube-PewDiePie --- .../Python_problems_for_beginners_1.ipynb | 66 +++++++++++++++++-- notebooks/youtube/Youtube-PewDiePie.ipynb | 2 +- 2 files changed, 61 insertions(+), 7 deletions(-) diff --git a/notebooks/python_problems/Python_problems_for_beginners_1.ipynb b/notebooks/python_problems/Python_problems_for_beginners_1.ipynb index 354d2eb..697835e 100644 --- a/notebooks/python_problems/Python_problems_for_beginners_1.ipynb +++ b/notebooks/python_problems/Python_problems_for_beginners_1.ipynb @@ -189,11 +189,11 @@ "\n", "Example n=4\n", "\n", - "0\n", - " 1\n", - " 2\n", - " 3\n", - " 4" + " 0\n", + " 1\n", + " 2\n", + " 3\n", + " 4" ] }, { @@ -216,7 +216,61 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "0 - 1\n", + "1 - 3\n", + "2 - 5\n", + "\n", + "2 * i + 1" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " * \n", + " * * * \n", + "* * * * * \n" + ] + } + ], + 
"source": [ + "n = 3\n", + "\n", + "for i in range(n):\n", + " row = '* ' * (2 * i + 1) # calc the * for a given row based formula\n", + " print(row.center(n * 3))" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " *\n", + " ***\n", + " *****\n", + " *******\n", + "*********\n" + ] + } + ], + "source": [ + "n = 5\n", + "\n", + "for i in range(n):\n", + " print( ' ' * (n-i-1), end='')\n", + " print('*' * (2 * i + 1))" + ] } ], "metadata": { diff --git a/notebooks/youtube/Youtube-PewDiePie.ipynb b/notebooks/youtube/Youtube-PewDiePie.ipynb index bd11a94..1b7bec0 100644 --- a/notebooks/youtube/Youtube-PewDiePie.ipynb +++ b/notebooks/youtube/Youtube-PewDiePie.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ From 52ecdd942ad625bb90f682c767b44d60cedafee2 Mon Sep 17 00:00:00 2001 From: softhints Date: Tue, 12 Feb 2019 09:07:28 +0200 Subject: [PATCH 12/76] DataFrame_column_transformations --- .../DataFrame_column_transformations.ipynb | 1173 +++++++++++++++++ 1 file changed, 1173 insertions(+) create mode 100644 notebooks/DataFrame_column_transformations.ipynb diff --git a/notebooks/DataFrame_column_transformations.ipynb b/notebooks/DataFrame_column_transformations.ipynb new file mode 100644 index 0000000..451daca --- /dev/null +++ b/notebooks/DataFrame_column_transformations.ipynb @@ -0,0 +1,1173 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataframe column transformations\n", + "\n", + "* change type of a column - int to str\n", + "* change columns to category\n", + "* create new column by n characters from another column\n", + "* combine two columns into another column\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
parent_idchild_id
031114321
120103102
230004023
310002010
440235321
\n", + "
" + ], + "text/plain": [ + " parent_id child_id\n", + "0 3111 4321\n", + "1 2010 3102\n", + "2 3000 4023\n", + "3 1000 2010\n", + "4 4023 5321" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "df = pd.DataFrame(\n", + " {\n", + " 'parent_id': [3111, 2010, 3000, 1000, 4023, 3011, 3033, 5010, 3011, 3102, 2010, 4023, 2110, 2100, 1000, 5010, 2110, 1000, 5010, 3033],\n", + " 'child_id': [4321, 3102, 4023, 2010, 5321, 4200, 4113, 6525, 4010, 4001, 3011, 5010, 3000, 3033, 2110, 6100, 3111, 2100, 6016, 4311]\n", + " }\n", + ")\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "parent_id int64\n", + "child_id int64\n", + "dtype: object" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. 
change type of a column\n", + "* int to str\n", + "* str to int" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "df.parent_id = df.parent_id.astype('str')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "parent_id object\n", + "child_id int64\n", + "dtype: object" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "df.parent_id = df.parent_id.astype('int')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "parent_id int64\n", + "child_id int64\n", + "dtype: object" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
parent_idchild_id
count20.00000020.000000
mean2885.8500003971.400000
std1263.5017871327.918014
min1000.0000002010.000000
25%2077.5000003027.500000
50%3011.0000004016.500000
75%3339.0000004493.250000
max5010.0000006525.000000
\n", + "
" + ], + "text/plain": [ + " parent_id child_id\n", + "count 20.000000 20.000000\n", + "mean 2885.850000 3971.400000\n", + "std 1263.501787 1327.918014\n", + "min 1000.000000 2010.000000\n", + "25% 2077.500000 3027.500000\n", + "50% 3011.000000 4016.500000\n", + "75% 3339.000000 4493.250000\n", + "max 5010.000000 6525.000000" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. convert column to a category\n", + "\n", + "Two reasons for that\n", + "* performance - having small number of distinct values (lots of repetition in single column)\n", + "* sort - when the lexical order of a variable is not the same as the logical order " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "df.parent_id = df.parent_id.astype('category')" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "parent_id category\n", + "child_id int64\n", + "dtype: object" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
child_id
count20.000000
mean3971.400000
std1327.918014
min2010.000000
25%3027.500000
50%4016.500000
75%4493.250000
max6525.000000
\n", + "
" + ], + "text/plain": [ + " child_id\n", + "count 20.000000\n", + "mean 3971.400000\n", + "std 1327.918014\n", + "min 2010.000000\n", + "25% 3027.500000\n", + "50% 4016.500000\n", + "75% 4493.250000\n", + "max 6525.000000" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "df.child_id = df.child_id.astype('category')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Int64Index([2010, 2100, 2110, 3000, 3011, 3033, 3102, 3111, 4001, 4010, 4023,\n", + " 4113, 4200, 4311, 4321, 5010, 5321, 6016, 6100, 6525],\n", + " dtype='int64')" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['child_id'].cat.categories" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Int64Index([1000, 2010, 2100, 2110, 3000, 3011, 3033, 3102, 3111, 4023, 5010], dtype='int64')" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['parent_id'].cat.categories" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for column in ['parent_id', 'child_id']:\n", + " df[column] = df[column].astype('int')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. 
create new column by n characters from another column\n", + "\n", + "* get last n characters\n", + "* get first n characters\n", + "* get n characters from the middle" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "df['parent_id_last'] = df.apply(lambda row: str(row['parent_id'])[-3:], axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
parent_idchild_idparent_id_last
031114321111
120103102010
230004023000
310002010000
440235321023
\n", + "
" + ], + "text/plain": [ + " parent_id child_id parent_id_last\n", + "0 3111 4321 111\n", + "1 2010 3102 010\n", + "2 3000 4023 000\n", + "3 1000 2010 000\n", + "4 4023 5321 023" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "df['parent_id_first'] = df.apply(lambda row: str(row['parent_id'])[:2], axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
parent_idchild_idparent_id_lastparent_id_first
03111432111131
12010310201020
23000402300030
31000201000010
44023532102340
\n", + "
" + ], + "text/plain": [ + " parent_id child_id parent_id_last parent_id_first\n", + "0 3111 4321 111 31\n", + "1 2010 3102 010 20\n", + "2 3000 4023 000 30\n", + "3 1000 2010 000 10\n", + "4 4023 5321 023 40" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "df['parent_id_middle'] = df.apply(lambda row: str(row['child_id'])[1:3], axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
parent_idchild_idparent_id_lastparent_id_firstparent_id_middle
0311143211113132
1201031020102010
2300040230003002
3100020100001001
4402353210234032
\n", + "
" + ], + "text/plain": [ + " parent_id child_id parent_id_last parent_id_first parent_id_middle\n", + "0 3111 4321 111 31 32\n", + "1 2010 3102 010 20 10\n", + "2 3000 4023 000 30 02\n", + "3 1000 2010 000 10 01\n", + "4 4023 5321 023 40 32" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. combine two columns into another column" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "df['combined'] = df.apply(lambda row: str(row['parent_id']) + str(row['child_id']), axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
parent_idchild_idparent_id_lastparent_id_firstparent_id_middlecombined
031114321111313231114321
120103102010201020103102
230004023000300230004023
310002010000100110002010
440235321023403240235321
\n", + "
" + ], + "text/plain": [ + " parent_id child_id parent_id_last parent_id_first parent_id_middle combined\n", + "0 3111 4321 111 31 32 31114321\n", + "1 2010 3102 010 20 10 20103102\n", + "2 3000 4023 000 30 02 30004023\n", + "3 1000 2010 000 10 01 10002010\n", + "4 4023 5321 023 40 32 40235321" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "df['combined'] = df.apply(lambda row: str(row['parent_id'] + row['child_id']), axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
parent_idchild_idparent_id_lastparent_id_firstparent_id_middlecombined
03111432111131327432
12010310201020105112
23000402300030027023
31000201000010013010
44023532102340329344
\n", + "
" + ], + "text/plain": [ + " parent_id child_id parent_id_last parent_id_first parent_id_middle combined\n", + "0 3111 4321 111 31 32 7432\n", + "1 2010 3102 010 20 10 5112\n", + "2 3000 4023 000 30 02 7023\n", + "3 1000 2010 000 10 01 3010\n", + "4 4023 5321 023 40 32 9344" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "def comb(x, y):\n", + " return str(x) + str(y)\n", + "\n", + "df['comb'] = df.apply(lambda row: comb(row['parent_id'], row['child_id']), axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
parent_idchild_idparent_id_lastparent_id_firstparent_id_middlecombinedcomb
0311143211113132743231114321
1201031020102010511220103102
2300040230003002702330004023
3100020100001001301010002010
4402353210234032934440235321
\n", + "
" + ], + "text/plain": [ + " parent_id child_id parent_id_last parent_id_first parent_id_middle combined \\\n", + "0 3111 4321 111 31 32 7432 \n", + "1 2010 3102 010 20 10 5112 \n", + "2 3000 4023 000 30 02 7023 \n", + "3 1000 2010 000 10 01 3010 \n", + "4 4023 5321 023 40 32 9344 \n", + "\n", + " comb \n", + "0 31114321 \n", + "1 20103102 \n", + "2 30004023 \n", + "3 10002010 \n", + "4 40235321 " + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From ad75202f7fab7f3241bee3b97fd72d2b22ae527d Mon Sep 17 00:00:00 2001 From: softhints Date: Fri, 15 Feb 2019 22:26:59 +0200 Subject: [PATCH 13/76] python asterisk argument --- ... 
the usage of * - asterisk in Python.ipynb | 385 ++++++++++++++++++ 1 file changed, 385 insertions(+) create mode 100644 notebooks/What is the usage of * - asterisk in Python.ipynb diff --git a/notebooks/What is the usage of * - asterisk in Python.ipynb b/notebooks/What is the usage of * - asterisk in Python.ipynb new file mode 100644 index 0000000..6d0cb36 --- /dev/null +++ b/notebooks/What is the usage of * - asterisk in Python.ipynb @@ -0,0 +1,385 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# What is the usage of * - asterisk in Python\n", + "\n", + "* For multiplication and power operations.\n", + "* Extending collections\n", + "* Unpacking\n", + "* positional arguments and keyword arguments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## For multiplication and power operations." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "30" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "5 * 6" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "2 ** 2" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "ename": "SyntaxError", + "evalue": "invalid syntax (, line 1)", + "output_type": "error", + "traceback": [ + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m 2 *** 2\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" + ] + } + ], + "source": [ + "2 *** 2" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'aaaaa'" + ] + }, + "execution_count": 4, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'a' * 5" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'ffffff'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'fff' * 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Extending collections" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[0] * 20 " + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[0, 1 , 2] * 5" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[0, 1, 2], [3], [0, 1, 2], [3]]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[[0, 1 , 2], [3]] * 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Unpacking" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 3, 5, 7, 9]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "odds = [1, 3, 5, 7, 9]\n", + "*x, = odds\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 3, 5, 7]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + 
], + "source": [ + "*x,y = odds\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "9" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "x, *y, z = odds" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[3, 5, 7]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(1, 3, 5, 7, 9)\n", + "([1, 3, 5, 7, 9],)\n" + ] + } + ], + "source": [ + "odds = [1, 3, 5, 7, 9]\n", + "\n", + "def sum_all(*numbers):\n", + " print(numbers)\n", + "\n", + "sum_all(*odds)\n", + "\n", + "sum_all(odds)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## positional arguments and keyword arguments" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('x', 'y', 'z', 'w', 'v')\n" + ] + } + ], + "source": [ + "def print_all(*args):\n", + " print(args) \n", + "print_all('x', 'y', 'z', 'w', 'v')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'x': 'x', 'y': 'y', 'z': 'z', 'w': 'w', 'v': 'v'}\n" + ] + } + ], + "source": [ + "def print_all(**kwargs):\n", + " print(kwargs)\n", + "print_all(x='x', y='y', z='z', w='w', v='v')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + 
"display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From abe1e286b53bc9e56aee2028b326cb4e2b5b6ed1 Mon Sep 17 00:00:00 2001 From: softhints Date: Fri, 15 Feb 2019 22:28:36 +0200 Subject: [PATCH 14/76] python asterisk argument --- ...thon.ipynb => What_is_the_usage_of_*_asterisk_in_Python.ipynb} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename notebooks/{What is the usage of * - asterisk in Python.ipynb => What_is_the_usage_of_*_asterisk_in_Python.ipynb} (100%) diff --git a/notebooks/What is the usage of * - asterisk in Python.ipynb b/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb similarity index 100% rename from notebooks/What is the usage of * - asterisk in Python.ipynb rename to notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb From bfd2a3c21e7c7ea7705d22223b43f218ccfcde19 Mon Sep 17 00:00:00 2001 From: softhints Date: Sat, 16 Feb 2019 11:55:48 +0200 Subject: [PATCH 15/76] Image validation with Python --- notebooks/Image_validation_with_Python.ipynb | 423 +++++++++++++++++++ 1 file changed, 423 insertions(+) create mode 100644 notebooks/Image_validation_with_Python.ipynb diff --git a/notebooks/Image_validation_with_Python.ipynb b/notebooks/Image_validation_with_Python.ipynb new file mode 100644 index 0000000..c7e9e85 --- /dev/null +++ b/notebooks/Image_validation_with_Python.ipynb @@ -0,0 +1,423 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Image validation with Python\n", + "\n", + "* is a file valid image\n", + " * check file extension\n", + " * check the file with pil\n", + "* is the image blank\n", + "* is the image contains a pattern\n", + "\n", + "#### possible future 
video:\n", + "* multiple image validation\n", + "* validate an image URL without download\n", + "* search image in image" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## is a file a valid image" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# check file extension\n", + "test_img = './csv/movie_metadata.csv'\n", + "test_img.lower().endswith(('.png', '.jpg', '.jpeg'))" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# check file extension\n", + "test_img = './csv/Selection_001.png'\n", + "test_img.lower().endswith(('.png', '.jpg', '.jpeg'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### check the file with pil\n", + "\n", + "`pip install Pillow`" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "def is_jpg(filename):\n", + " try:\n", + " i=Image.open(filename)\n", + " return i.format in ['PNG', 'JPEG']\n", + " except IOError:\n", + " return False\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_jpg('./csv') " + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_jpg('./csv/movie_metadata.csv') " + ] + }, + { + 
"cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_jpg('./csv/Selection_001.png') " + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_jpg('./csv/Selection_001.png') " + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_jpg('./csv/fire-and-water-2354583_960_720.jpg') " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## is the image blank" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "None\n" + ] + } + ], + "source": [ + "import json\n", + "from io import BytesIO\n", + "from PIL import Image\n", + "import requests\n", + "\n", + "remote_file = 'https://cdn.pixabay.com/photo/2013/03/29/07/34/girl-97433_960_720.jpg'\n", + "\n", + "response = requests.get(remote_file)\n", + "img = Image.open(BytesIO(response.content))\n", + "\n", + "clrs = img.getcolors()\n", + "print(clrs)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from IPython.display import Image\n", + "from IPython.core.display import HTML \n", + "\n", + "color_image = './csv/Selection_139.png'\n", + 
"\n", + "Image(url= color_image)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "None\n" + ] + } + ], + "source": [ + "import json\n", + "from io import BytesIO\n", + "from PIL import Image\n", + "import requests\n", + "\n", + "img = Image.open(color_image)\n", + "\n", + "clrs = img.getcolors()\n", + "print(clrs)" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from IPython.display import Image\n", + "from IPython.core.display import HTML \n", + "\n", + "blank_image = './csv/Selection_140.png'\n", + "\n", + "Image(url= blank_image)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[(49128, (238, 238, 238))]\n" + ] + } + ], + "source": [ + "import json\n", + "from io import BytesIO\n", + "from PIL import Image\n", + "import requests\n", + "\n", + "img = Image.open(blank_image)\n", + "\n", + "clrs = img.getcolors()\n", + "print(clrs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## is the image contains a pattern" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Drawing\"\n", + "\"Drawing\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import cv2\n", + "import numpy as np\n", + "\n", + "img_rgb = cv2.imread('./csv/image_with_coin.jpg')\n", + "template = cv2.imread('./csv/coin.png')\n", + "w, h = template.shape[:-1]\n", + "\n", + "res = cv2.matchTemplate(img_rgb, template, cv2.TM_CCOEFF_NORMED)\n", + "threshold = .8\n", + "loc = np.where(res >= threshold)\n", + "for pt in zip(*loc[::-1]): \n", + " 
cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)\n", + "\n", + "cv2.imwrite('./csv/result.png', img_rgb)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import Image\n", + "from IPython.core.display import HTML \n", + "\n", + "Image(url= './csv/result.png')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 57e74b0c23335b667dd45ce4fb93e1f5c55d4ea4 Mon Sep 17 00:00:00 2001 From: softhints Date: Tue, 19 Feb 2019 12:33:19 +0200 Subject: [PATCH 16/76] Chapter_3_Functions_1.ipynb --- .../Think Python/Chapter_3_Functions_1.ipynb | 874 ++++++++++++++++++ 1 file changed, 874 insertions(+) create mode 100644 notebooks/Books/Think Python/Chapter_3_Functions_1.ipynb diff --git a/notebooks/Books/Think Python/Chapter_3_Functions_1.ipynb b/notebooks/Books/Think Python/Chapter_3_Functions_1.ipynb new file mode 100644 index 0000000..dca266f --- /dev/null +++ b/notebooks/Books/Think Python/Chapter_3_Functions_1.ipynb @@ -0,0 +1,874 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Think Python: How to Think Like a Computer Scientist" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Chapter 3 Functions\n", + "\n", + "* Function calls\n", + "* Math functions\n", + "* Composition\n", + "* Adding new functions\n", + "* Definitions and uses\n", + "* Flow of execution\n", + "* Parameters and arguments\n", + "------\n", + "* Variables 
and parameters are local\n", + "* Stack diagrams\n", + "* Fruitful functions and void functions\n", + "* Why functions?\n", + "* Debugging\n", + "* Glossary\n", + "* Exercises\n", + "\n", + "\n", + "> In the context of programming, a function is a named sequence of statements that performs a computation. When you define a function, you specify the name and the sequence of statements. Later, you can “call” the function by name." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Functions best practices\n", + "\n", + "* the name describes what the function does\n", + "* it does one thing and only one thing\n", + "* it is documented\n", + "* it is relatively short" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.1 Function calls" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# type is the function name\n", + "# 'a' is the argument\n", + "\n", + "type('a')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> a function “takes” an argument and “returns” a result" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "32" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "int('32')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "invalid literal for int() with base 10: 'Hello'", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m 
\u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Hello'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'Hello'" + ], + "output_type": "error" + } + ], + "source": [ + "int('Hello')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "int(3.99999)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "-2" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "int(-2.3)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "32.0" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "float(32)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3.14159" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "float('3.14159')" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'3.14159'" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "str(3.14159)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'32'" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "str(32)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.2 Math functions" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "> Python has a math module that provides most of the familiar mathematical functions. A module is a file that contains a collection of related functions." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import math\n", + "math" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> This format is called dot notation." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.2184874961635637" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example uses math.log10 to compute a signal-to-noise ratio in decibels \n", + "\n", + "signal_power = 5\n", + "noise_power = 3\n", + "ratio = signal_power / noise_power\n", + "decibels = 10 * math.log10(ratio)\n", + "decibels" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#The second example finds the sine of radians. The name of the variable is a \n", + "# hint that sin and the other trigonometric functions (cos, tan, etc.) take arguments in radians. \n", + "# To convert from degrees to radians, divide by 180 and multiply by π:\n", + "\n", + "radians = 0.7\n", + "height = math.sin(radians)\n", + "height" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The expression math.pi gets the variable pi from the math module. 
Its value is a \n", + "# floating-point approximation of π, accurate to about 15 digits.\n", + "\n", + "degrees = 45\n", + "radians = degrees / 180.0 * math.pi\n", + "math.sin(radians)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# verify the previous result by\n", + "\n", + "math.sqrt(2) / 2.0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> add meaningful and descriptive comments to your functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.3 Composition" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> One of the most useful features of programming languages is their ability to take small building blocks and compose them." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "ename": "NameError", + "evalue": "name 'degrees' is not defined", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdegrees\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m360.0\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m2\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpi\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mNameError\u001b[0m: name 'degrees' is not defined" + ], + "output_type": "error" + } + ], + "source": [ + "x = math.sin(degrees / 360.0 * 2 * math.pi)\n", + "x" + ] + }, + { + "cell_type": "code", + 
"execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.01745240643728351" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = math.sin(1 / 360.0 * 2 * math.pi)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = math.exp(math.log(x+1))\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "600" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hours = 10\n", + "minutes = hours * 60\n", + "minutes" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "ename": "SyntaxError", + "evalue": "can't assign to operator (, line 1)", + "traceback": [ + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m hours * 60 = minutes\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m can't assign to operator\n" + ], + "output_type": "error" + } + ], + "source": [ + "hours * 60 = minutes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> avoid confusing and misleading compositions\n", + "\n", + "> keep to the KISS principle - keep it simple, stupid" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.4 Adding new functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> A function definition specifies the name of a new function and the sequence of statements that run when the function is called." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "# def - is a keyword that indicates that this is a function definition\n", + "# print_lyrics - the function name\n", + "# () - indicate that this function doesn’t take any arguments.\n", + "\n", + "def print_lyrics():\n", + " print(\"I'm a lumberjack, and I'm okay.\")\n", + " print(\"I sleep all night and I work all day.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> The first line of the function definition is called the header; the rest is called the body. \n", + "\n", + "> Single quotes and double quotes do the same thing in most situations;" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "function" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(print_lyrics)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "print(print_lyrics)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> The syntax for calling the new function is the same as for built-in functions:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "I'm a lumberjack, and I'm okay.\n", + "I sleep all night and I work all day.\n" + ] + } + ], + "source": [ + "print_lyrics()" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "def repeat_lyrics():\n", + " print_lyrics()\n", + " print_lyrics()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "I'm a lumberjack, and I'm 
okay.\n", + "I sleep all night and I work all day.\n", + "I'm a lumberjack, and I'm okay.\n", + "I sleep all night and I work all day.\n" + ] + } + ], + "source": [ + "repeat_lyrics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.5 Definitions and uses" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> This program contains two function definitions: print_lyrics and repeat_lyrics. Function definitions get executed just like other statements, but the effect is to create function objects.\n", + "\n", + "> You have to create a function before you can run it. In other words, the function definition has to run before the function gets called." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "I'm a lumberjack, and I'm okay.\n", + "I sleep all night and I work all day.\n", + "I'm a lumberjack, and I'm okay.\n", + "I sleep all night and I work all day.\n" + ] + } + ], + "source": [ + "def print_lyrics():\n", + " print(\"I'm a lumberjack, and I'm okay.\")\n", + " print(\"I sleep all night and I work all day.\")\n", + "\n", + "def repeat_lyrics():\n", + " print_lyrics()\n", + " print_lyrics()\n", + "\n", + "repeat_lyrics()" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "ename": "NameError", + "evalue": "name 'repeat_lyrics_new' is not defined", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrepeat_lyrics_new\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m 
\u001b[0;32mdef\u001b[0m \u001b[0mrepeat_lyrics_new\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mprint_lyrics\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint_lyrics\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mNameError\u001b[0m: name 'repeat_lyrics_new' is not defined" + ], + "output_type": "error" + } + ], + "source": [ + "repeat_lyrics_new()\n", + "\n", + "def repeat_lyrics_new():\n", + " print_lyrics()\n", + " print_lyrics()\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.6 Flow of execution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> To ensure that a function is defined before its first use, you have to know the order statements run in, which is called the flow of execution.\n", + "\n", + "> Execution always begins at the first statement of the program. Statements are run one at a time, in order from top to bottom.\n", + "\n", + "> In summary, when you read a program, you don’t always want to read from top to bottom. 
Sometimes it makes more sense if you follow the flow of execution.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\n", + "3\n", + "2\n" + ] + } + ], + "source": [ + "def print_lyrics_1():\n", + " print(\"1\")\n", + "\n", + "def print_lyrics_2():\n", + " print(\"2\") \n", + " \n", + "def print_lyrics_3():\n", + " print(\"3\")\n", + "\n", + "def repeat_lyrics():\n", + " print_lyrics_1()\n", + " print_lyrics_3()\n", + " print_lyrics_2()\n", + "\n", + "repeat_lyrics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.7 Parameters and arguments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Some of the functions we have seen require arguments. For example, when you call math.sin you pass a number as an argument. Some functions take more than one argument: math.pow takes two, the base and the exponent.\n", + "\n", + "> Inside the function, the arguments are assigned to variables called parameters. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "def print_twice(bruce):\n", + " print(bruce)\n", + " print(bruce)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Spam\n", + "Spam\n", + "42\n", + "42\n", + "3.141592653589793\n", + "3.141592653589793\n" + ] + } + ], + "source": [ + "print_twice('Spam')\n", + "print_twice(42)\n", + "print_twice(math.pi)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Spam Spam Spam Spam \n", + "Spam Spam Spam Spam \n" + ] + } + ], + "source": [ + "print_twice('Spam '*4)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "-1.0\n", + "-1.0\n" + ] + } + ], + "source": [ + "print_twice(math.cos(math.pi))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> The argument is evaluated before the function is called, so in the examples the expressions 'Spam '*4 and math.cos(math.pi) are only evaluated once\n", + "\n", + "> The name of the variable we pass as an argument (michael) has nothing to do with the name of the parameter (bruce)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Eric, the half a bee.\n", + "Eric, the half a bee.\n" + ] + } + ], + "source": [ + "michael = 'Eric, the half a bee.'\n", + "print_twice(michael)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From bbf5a4bec72ffae84077b1ba7aa7ec14a0d1f288 Mon Sep 17 00:00:00 2001 From: softhints Date: Wed, 20 Feb 2019 10:09:49 +0200 Subject: [PATCH 17/76] Dataframe_to_json_nested --- notebooks/Dataframe_to_json_nested.ipynb | 1117 ++++++++++++++++++++++ 1 file changed, 1117 insertions(+) create mode 100644 notebooks/Dataframe_to_json_nested.ipynb diff --git a/notebooks/Dataframe_to_json_nested.ipynb b/notebooks/Dataframe_to_json_nested.ipynb new file mode 100644 index 0000000..2f1dee5 --- /dev/null +++ b/notebooks/Dataframe_to_json_nested.ipynb @@ -0,0 +1,1117 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## pandas DataFrame generate n-level hierarchical JSON\n", + "\n", + "* hierarchical data\n", + "* mapping pandas columns\n", + "* Pretty print json and dataframe split\n", + "* generate n-level hierarchical JSON" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " parent_id child_id\n", + "0 3111 4321\n", + "1 2010 3102\n", + "2 3000 4023\n", + "3 1000 2010\n", + "4 4023 5321" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "import json\n", + "df = pd.DataFrame(\n", + " {\n", + " 'parent_id': [3111, 2010, 3000, 1000, 4023, 3011, 3033, 5010, 3011, 3102, 2010, 4023, 2110, 2100, 1000, 5010, 2110, 1000, 5010, 3033],\n", + " 'child_id': [4321, 3102, 4023, 2010, 5321, 4200, 4113, 6525, 4010, 4001, 3011, 5010, 3000, 3033, 2110, 6100, 3111, 2100, 6016, 4311]\n", + " }\n", + ")\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "lst = json.loads(df.to_json(orient='split'))['data']\n", + "\n", + "# Build a directed graph and a list of all names that have no parent\n", + "graph = {name: set() for tup in lst for name in tup}\n", + "has_parent = {name: False for tup in lst for name in tup}\n", + "for parent, child in lst:\n", + " graph[parent].add(child)\n", + " has_parent[child] = True\n", + "\n", + "# All names that have absolutely no parent:\n", + "roots = [name for name, parents in has_parent.items() if not parents]\n", + "\n", + "# traversal of the graph (doesn't care about duplicates and cycles)\n", + "def traverse(hierarchy, graph, names):\n", + " for name in names:\n", + " hierarchy[name] = traverse({}, graph, graph[name])\n", + " return hierarchy\n", + "\n", + "result = traverse({}, graph, roots)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"1000\": {\n", + " \"2010\": {\n", + " \"3011\": {\n", + " \"4200\": {},\n", + " \"4010\": {}\n", + " },\n", + " \"3102\": {\n", + " \"4001\": {}\n", + " }\n", + " },\n", + " \"2100\": {\n", + " \"3033\": {\n", + " \"4113\": {},\n", + " \"4311\": {}\n", + " }\n", + " 
},\n", + " \"2110\": {\n", + " \"3000\": {\n", + " \"4023\": {\n", + " \"5321\": {},\n", + " \"5010\": {\n", + " \"6016\": {},\n", + " \"6100\": {},\n", + " \"6525\": {}\n", + " }\n", + " }\n", + " },\n", + " \"3111\": {\n", + " \"4321\": {}\n", + " }\n", + " }\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "import json\n", + "print(json.dumps(result, indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Column Mapping" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "df.parent_id = df.parent_id.astype('category')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "df.child_id = df.child_id.astype('category')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Int64Index([2010, 2100, 2110, 3000, 3011, 3033, 3102, 3111, 4001, 4010, 4023,\n", + " 4113, 4200, 4311, 4321, 5010, 5321, 6016, 6100, 6525],\n", + " dtype='int64')" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['child_id'].cat.categories" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Int64Index([1000, 2010, 2100, 2110, 3000, 3011, 3033, 3102, 3111, 4023, 5010], dtype='int64')" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['parent_id'].cat.categories" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "df['parent_id_new'] = df.parent_id.map({1000:\"A\",\t2010:\"B\",\t2100:\"C\",\t2110:\"D\",\t3000:\"E\",\t3011:\"F\",\t3033:\"G\",\t3102:\"H\",\t3111:\"I\",\t4023:\"K\",\t5010:\"L\"\n", + "})" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ 
+ "df['child_id_new'] = df.child_id.map({1000:\"A\",\t2010:\"B\",\t2100:\"C\",\t2110:\"D\",\t3000:\"E\",\t3011:\"F\",\t3033:\"G\",\t3102:\"H\",\t3111:\"I\",\t4023:\"K\",\t5010:\"L\",\t4001:\"M\",\t4010:\"N\",\t4113:\"O\",\t4200:\"P\",\t4311:\"Q\",\t4321:\"R\",\t6016:\"S\",\t6525:\"T\",\t6100:\"U\",\t5321:\"V\"\n", + "\n", + "})" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " parent_id child_id parent_id_new child_id_new\n", + "0 3111 4321 I R\n", + "1 2010 3102 B H\n", + "2 3000 4023 E K\n", + "3 1000 2010 A B\n", + "4 4023 5321 K V\n", + "5 3011 4200 F P\n", + "6 3033 4113 G O\n", + "7 5010 6525 L T\n", + "8 3011 4010 F N\n", + "9 3102 4001 H M\n", + "10 2010 3011 B F\n", + "11 4023 5010 K L\n", + "12 2110 3000 D E\n", + "13 2100 3033 C G\n", + "14 1000 2110 A D\n", + "15 5010 6100 L U\n", + "16 2110 3111 D I\n", + "17 1000 2100 A C\n", + "18 5010 6016 L S\n", + "19 3033 4311 G Q" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pretty print json and dataframe split" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"index\": [\n", + " 0,\n", + " 1,\n", + " 2,\n", + " 3,\n", + " 4,\n", + " 5,\n", + " 6,\n", + " 7,\n", + " 8,\n", + " 9,\n", + " 10,\n", + " 11,\n", + " 12,\n", + " 13,\n", + " 14,\n", + " 15,\n", + " 16,\n", + " 17,\n", + " 18,\n", + " 19\n", + " ],\n", + " \"columns\": [\n", + " \"parent_id\",\n", + " \"child_id\",\n", + " \"parent_id_new\",\n", + " \"child_id_new\"\n", + " ],\n", + " \"data\": [\n", + " [\n", + " 3111,\n", + " 4321,\n", + " \"I\",\n", + " \"R\"\n", + " ],\n", + " [\n", + " 2010,\n", + " 3102,\n", + " \"B\",\n", + " \"H\"\n", + " ],\n", + " [\n", + " 3000,\n", + " 4023,\n", + " \"E\",\n", + " \"K\"\n", + " ],\n", + " [\n", + " 1000,\n", + " 2010,\n", + " \"A\",\n", + " \"B\"\n", + " ],\n", + " [\n", + " 4023,\n", + " 5321,\n", + " \"K\",\n", + " \"V\"\n", + " ],\n", + " [\n", + " 3011,\n", + " 4200,\n", + " \"F\",\n", + " \"P\"\n", + " ],\n", + " [\n", + " 3033,\n", + " 4113,\n", + " \"G\",\n", + " \"O\"\n", + " ],\n", + " [\n", + " 5010,\n", + " 6525,\n", + " \"L\",\n", + " \"T\"\n", + " ],\n", + " [\n", + " 
3011,\n", + " 4010,\n", + " \"F\",\n", + " \"N\"\n", + " ],\n", + " [\n", + " 3102,\n", + " 4001,\n", + " \"H\",\n", + " \"M\"\n", + " ],\n", + " [\n", + " 2010,\n", + " 3011,\n", + " \"B\",\n", + " \"F\"\n", + " ],\n", + " [\n", + " 4023,\n", + " 5010,\n", + " \"K\",\n", + " \"L\"\n", + " ],\n", + " [\n", + " 2110,\n", + " 3000,\n", + " \"D\",\n", + " \"E\"\n", + " ],\n", + " [\n", + " 2100,\n", + " 3033,\n", + " \"C\",\n", + " \"G\"\n", + " ],\n", + " [\n", + " 1000,\n", + " 2110,\n", + " \"A\",\n", + " \"D\"\n", + " ],\n", + " [\n", + " 5010,\n", + " 6100,\n", + " \"L\",\n", + " \"U\"\n", + " ],\n", + " [\n", + " 2110,\n", + " 3111,\n", + " \"D\",\n", + " \"I\"\n", + " ],\n", + " [\n", + " 1000,\n", + " 2100,\n", + " \"A\",\n", + " \"C\"\n", + " ],\n", + " [\n", + " 5010,\n", + " 6016,\n", + " \"L\",\n", + " \"S\"\n", + " ],\n", + " [\n", + " 3033,\n", + " 4311,\n", + " \"G\",\n", + " \"Q\"\n", + " ]\n", + " ]\n", + "}\n" + ] + } + ], + "source": [ + "res = df.to_dict(orient='split')\n", + "import json\n", + "print(json.dumps(res, indent=2))" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"parent_id\": {\n", + " \"0\": 3111,\n", + " \"1\": 2010,\n", + " \"2\": 3000,\n", + " \"3\": 1000,\n", + " \"4\": 4023,\n", + " \"5\": 3011,\n", + " \"6\": 3033,\n", + " \"7\": 5010,\n", + " \"8\": 3011,\n", + " \"9\": 3102,\n", + " \"10\": 2010,\n", + " \"11\": 4023,\n", + " \"12\": 2110,\n", + " \"13\": 2100,\n", + " \"14\": 1000,\n", + " \"15\": 5010,\n", + " \"16\": 2110,\n", + " \"17\": 1000,\n", + " \"18\": 5010,\n", + " \"19\": 3033\n", + " },\n", + " \"child_id\": {\n", + " \"0\": 4321,\n", + " \"1\": 3102,\n", + " \"2\": 4023,\n", + " \"3\": 2010,\n", + " \"4\": 5321,\n", + " \"5\": 4200,\n", + " \"6\": 4113,\n", + " \"7\": 6525,\n", + " \"8\": 4010,\n", + " \"9\": 4001,\n", + " \"10\": 3011,\n", + " \"11\": 5010,\n", + " 
\"12\": 3000,\n", + " \"13\": 3033,\n", + " \"14\": 2110,\n", + " \"15\": 6100,\n", + " \"16\": 3111,\n", + " \"17\": 2100,\n", + " \"18\": 6016,\n", + " \"19\": 4311\n", + " },\n", + " \"parent_id_new\": {\n", + " \"0\": \"I\",\n", + " \"1\": \"B\",\n", + " \"2\": \"E\",\n", + " \"3\": \"A\",\n", + " \"4\": \"K\",\n", + " \"5\": \"F\",\n", + " \"6\": \"G\",\n", + " \"7\": \"L\",\n", + " \"8\": \"F\",\n", + " \"9\": \"H\",\n", + " \"10\": \"B\",\n", + " \"11\": \"K\",\n", + " \"12\": \"D\",\n", + " \"13\": \"C\",\n", + " \"14\": \"A\",\n", + " \"15\": \"L\",\n", + " \"16\": \"D\",\n", + " \"17\": \"A\",\n", + " \"18\": \"L\",\n", + " \"19\": \"G\"\n", + " },\n", + " \"child_id_new\": {\n", + " \"0\": \"R\",\n", + " \"1\": \"H\",\n", + " \"2\": \"K\",\n", + " \"3\": \"B\",\n", + " \"4\": \"V\",\n", + " \"5\": \"P\",\n", + " \"6\": \"O\",\n", + " \"7\": \"T\",\n", + " \"8\": \"N\",\n", + " \"9\": \"M\",\n", + " \"10\": \"F\",\n", + " \"11\": \"L\",\n", + " \"12\": \"E\",\n", + " \"13\": \"G\",\n", + " \"14\": \"D\",\n", + " \"15\": \"U\",\n", + " \"16\": \"I\",\n", + " \"17\": \"C\",\n", + " \"18\": \"S\",\n", + " \"19\": \"Q\"\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "res = df.to_dict()\n", + "import json\n", + "print(json.dumps(res, indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Traverse a graph" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "lst = [('Linux','Debian'), ('Linux','Red Hat'), ('Debian','Ubuntu'), ('Debian','Knoppix'), \n", + " ('Ubuntu','Linux Mint'), ('Red Hat','CentOS'), ('Red Hat','Mandrake')]\n", + "\n", + "# Build a directed graph and a list of all names that have no parent\n", + "graph = {name: set() for tup in lst for name in tup}\n", + "has_parent = {name: False for tup in lst for name in tup}\n", + "for parent, child in lst:\n", + " graph[parent].add(child)\n", + " has_parent[child] = True\n", + "\n", + "# All names that have 
absolutely no parent:\n", + "roots = [name for name, parents in has_parent.items() if not parents]\n", + "\n", + "# traversal of the graph (doesn't care about duplicates and cycles)\n", + "def traverse(hierarchy, graph, names):\n", + " for name in names:\n", + " hierarchy[name] = traverse({}, graph, graph[name])\n", + " return hierarchy\n", + "\n", + "nested_json = traverse({}, graph, roots)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"Linux\": {\n", + " \"Debian\": {\n", + " \"Ubuntu\": {\n", + " \"Linux Mint\": {}\n", + " },\n", + " \"Knoppix\": {}\n", + " },\n", + " \"Red Hat\": {\n", + " \"Mandrake\": {},\n", + " \"CentOS\": {}\n", + " }\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "import json\n", + "print(json.dumps(nested_json, indent=2))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'Linux': {'Debian', 'Red Hat'},\n", + " 'Debian': {'Knoppix', 'Ubuntu'},\n", + " 'Red Hat': {'CentOS', 'Mandrake'},\n", + " 'Ubuntu': {'Linux Mint'},\n", + " 'Knoppix': set(),\n", + " 'Linux Mint': set(),\n", + " 'CentOS': set(),\n", + " 'Mandrake': set()}" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Build a directed graph and a list of all names that have no parent\n", + "\n", + "graph = {name: set() for tup in lst for name in tup}\n", + "has_parent = {name: False for tup in lst for name in tup}\n", + "for parent, child in lst:\n", + " graph[parent].add(child)\n", + " has_parent[child] = True\n", + "\n", + "graph " + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'Linux': False,\n", + " 'Debian': True,\n", + " 'Red Hat': True,\n", + " 'Ubuntu': True,\n", + " 'Knoppix': True,\n", + " 'Linux Mint': True,\n", + " 
'CentOS': True,\n", + " 'Mandrake': True}" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "has_parent" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Linux']" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# All names that have absolutely no parent:\n", + "roots = [name for name, parents in has_parent.items() if not parents]\n", + "roots" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   parent_id  child_id
0       3111      4321
1       2010      3102
2       3000      4023
3       1000      2010
4       4023      5321
\n", + "
" + ], + "text/plain": [ + " parent_id child_id\n", + "0 3111 4321\n", + "1 2010 3102\n", + "2 3000 4023\n", + "3 1000 2010\n", + "4 4023 5321" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "import json\n", + "\n", + "df = pd.DataFrame(\n", + " {\n", + " 'parent_id': [3111, 2010, 3000, 1000, 4023, 3011, 3033, 5010, 3011, 3102, 2010, 4023, 2110, 2100, 1000, 5010, 2110, 1000, 5010, 3033],\n", + " 'child_id': [4321, 3102, 4023, 2010, 5321, 4200, 4113, 6525, 4010, 4001, 3011, 5010, 3000, 3033, 2110, 6100, 3111, 2100, 6016, 4311]\n", + " }\n", + ")\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "lst = json.loads(df.to_json(orient='split'))['data']\n", + "\n", + "# Build a directed graph and a list of all names that have no parent\n", + "graph = {name: set() for tup in lst for name in tup}\n", + "has_parent = {name: False for tup in lst for name in tup}\n", + "for parent, child in lst:\n", + " graph[parent].add(child)\n", + " has_parent[child] = True\n", + "\n", + "# All names that have absolutely no parent:\n", + "roots = [name for name, parents in has_parent.items() if not parents]\n", + "\n", + "# traversal of the graph (doesn't care about duplicates and cycles)\n", + "def traverse(hierarchy, graph, names):\n", + " for name in names:\n", + " hierarchy[name] = traverse({}, graph, graph[name])\n", + " return hierarchy\n", + "\n", + "result = traverse({}, graph, roots)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"1000\": {\n", + " \"2010\": {\n", + " \"3011\": {\n", + " \"4200\": {},\n", + " \"4010\": {}\n", + " },\n", + " \"3102\": {\n", + " \"4001\": {}\n", + " }\n", + " },\n", + " \"2100\": {\n", + " \"3033\": {\n", + " \"4113\": {},\n", + " \"4311\": {}\n", + " 
}\n", + " },\n", + " \"2110\": {\n", + " \"3000\": {\n", + " \"4023\": {\n", + " \"5321\": {},\n", + " \"5010\": {\n", + " \"6016\": {},\n", + " \"6100\": {},\n", + " \"6525\": {}\n", + " }\n", + " }\n", + " },\n", + " \"3111\": {\n", + " \"4321\": {}\n", + " }\n", + " }\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "import json\n", + "print(json.dumps(result, indent=2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 96ec89bd76ba9532adc4869902f9b0c228dc4e75 Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 21 Feb 2019 12:06:49 +0200 Subject: [PATCH 18/76] Python_group_or_sort_list_of_lists_by_common_element --- ...sort_list_of_lists_by_common_element.ipynb | 705 ++++++++++++++++++ 1 file changed, 705 insertions(+) create mode 100644 notebooks/Python_group_or_sort_list_of_lists_by_common_element.ipynb diff --git a/notebooks/Python_group_or_sort_list_of_lists_by_common_element.ipynb b/notebooks/Python_group_or_sort_list_of_lists_by_common_element.ipynb new file mode 100644 index 0000000..e6ed72a --- /dev/null +++ b/notebooks/Python_group_or_sort_list_of_lists_by_common_element.ipynb @@ -0,0 +1,705 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Python group or sort list of lists by common element\n", + "\n", + " * Grouping of lists of list by position\n", + " * Grouping of lists of list by key\n", + " * Sort and group flatten 
lists of lists\n", + " * Grouping list of lists different sizes\n", + " \n", + " #### Bonus tips\n", + " \n", + " \n", + " * Sort list of lists elements\n", + " * sort maps by key or value\n", + " * Iterating list over every two elements\n", + " * Iterating list over every N elements" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# equaly sized list of lists \n", + "[[\"Linux\", 0], [\"Windows 7\",1], [\"Ubuntu\",0], [\"Windows 10\",1], [\"MacOS\",2], [\"Linux Mint\",0]]\n", + "\n", + "# Different sized list of lists \n", + "[[\"Linux\", 0, 22], [\"Windows 7\",1 , 5, 6], [\"Ubuntu\",0], [\"Linux Mint\"]]\n", + "\n", + "# flatten\n", + "[\"Linux\", 0, \"Windows 7\",1, \"Ubuntu\",0, \"Windows 10\",1, \"MacOS\",2, \"Linux Mint\",0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Grouping of lists of list by position (size 2)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['Linux', 'Ubuntu', 'Linux Mint'], ['Windows 7', 'Windows 10'], ['MacOS']]" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# equaly sized list of lists \n", + "raw_list = [[\"Linux\", 0], [\"Windows 7\",1], [\"Ubuntu\",0], [\"Windows 10\",1], [\"MacOS\",2], [\"Linux Mint\",0]]\n", + "\n", + "keys = set(map(lambda x:x[1], raw_list))\n", + "new_list = [[y[0] for y in raw_list if y[1]==x] for x in keys]\n", + "\n", + "new_list" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{0: ['Linux', 'Ubuntu', 'Linux Mint'],\n", + " 1: ['Windows 7', 'Windows 10'],\n", + " 2: ['MacOS']}" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\n", + "raw_list = [[\"Linux\", 0], [\"Windows 7\",1], [\"Ubuntu\",0], [\"Windows 10\",1], 
[\"MacOS\",2], [\"Linux Mint\",0]]\n", + "\n", + "keys = set(map(lambda x:x[1], raw_list))\n", + "new_list = {x:[y[0] for y in raw_list if y[1]==x] for x in keys}\n", + "\n", + "new_list" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Grouping of lists of list by position (size 4)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'Ubuntu': [['Xenial Xerus', 0.4], ['Bionic Beaver', 0]],\n", + " 'Linux Mint': [['Rosa', 17.3], ['Sonya', 18.2]]}" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "raw_list = [\n", + " ['Linux Mint', 17, 'Rosa', 17.3], \n", + " ['Linux Mint', 18, 'Sonya', 18.2],\n", + " ['Ubuntu', 16, 'Xenial Xerus', 0.4],\n", + " ['Ubuntu', 18, 'Bionic Beaver', 0]]\n", + "\n", + "keys = set(map(lambda x:x[0], raw_list))\n", + "unsorted_map = {x:[y[2:] for y in raw_list if y[0]==x] for x in keys}\n", + "\n", + "unsorted_map" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### List of list different size" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "raw_list = [\n", + " ['Linux Mint', 17, 'Rosa', 17.3], \n", + " ['Linux Mint', 18, 'Sonya', 18.2],\n", + " ['Ubuntu', 16, 'Xenial Xerus', 0.4],\n", + " ['Ubuntu', 18, 'Bionic Beaver', 0],\n", + " \n", + " ['Windows', 7, 'Home'],\n", + " ['Windows', 7, 'Profesional'],\n", + " ['Windows', 10, 'Ultimate']\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'Ubuntu': [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']],\n", + " 'Linux Mint': [[17, 'Rosa'], [18, 'Sonya']],\n", + " 'Windows': [[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']]}" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "keys = 
set(map(lambda x:x[0], raw_list))\n", + "unsorted_map = {x:[y[1:3] for y in raw_list if y[0]==x] for x in keys}\n", + "unsorted_map" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Sort python map by key or value" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Linux Mint', 'Ubuntu', 'Windows']" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sorted(unsorted_map.keys())" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']],\n", + " [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']],\n", + " [[17, 'Rosa'], [18, 'Sonya']]]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sorted(unsorted_map.values())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Sort list of lists by key" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Linux Mint: [[17, 'Rosa'], [18, 'Sonya']]\n", + "Ubuntu: [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']]\n", + "Windows: [[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']]\n" + ] + } + ], + "source": [ + "for key in sorted(unsorted_map.keys()):\n", + " print (\"%s: %s\" % (key, unsorted_map[key]))" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Windows: [[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']]\n", + "Ubuntu: [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']]\n", + "Linux Mint: [[17, 'Rosa'], [18, 'Sonya']]\n" + ] + } + ], + "source": [ + "for key in sorted(unsorted_map.keys(), reverse=True):\n", + " print (\"%s: %s\" % (key, 
unsorted_map[key]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Sort and group flatten lists of lists" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "os_list = [\n", + " 'Ubuntu 18',\n", + " 'This article informs you about Ubuntu 18.04 release date,',\n", + " 'Released',\n", + " \n", + " 'Ubuntu 20',\n", + " 'The desktop image allows you to try Ubuntu without changing y..',\n", + " 'Not Released',\n", + " \n", + " 'Ubuntu 19',\n", + " 'Ubuntu is an open source software operating system that runs from',\n", + " 'Released',\n", + " \n", + " 'Linux mint 18',\n", + " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", + " 'Released',\n", + " \n", + " 'Linux mint 20',\n", + " 'Suggestion: For Mint 20 to go full Debian',\n", + " 'Not Released',\n", + " \n", + " 'Linux mint 19',\n", + " 'Linux Mint 19 is a long term support release which will be supported until 2023',\n", + " 'Released',\n", + "\n", + " 'Windows 7',\n", + " 'Windows 7 is a personal computer operating system that was ..',\n", + " 'Windows 10',\n", + " 'Windows 10 is a series of personal computer operating systems',\n", + " \"Windows XP\",\n", + " 'Windows XP is old, and Microsoft no longer provides official support']" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1, 2]\n", + "[3, 4]\n", + "[5, 6]\n" + ] + } + ], + "source": [ + "# iterating over every two elements\n", + "test_list = [1, 2, 3, 4, 5, 6]\n", + "\n", + "for i in range(0, len(test_list), 2):\n", + " print (test_list[i:i+2])" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[0, 1, 2]\n", + "[3, 4, 5]\n", + "[6, 7, 8]\n", + "[9]\n" + ] + } + ], + "source": [ + "# iterating over every N elements\n", + 
"test_list = list(range(0, 10))\n", + "\n", + "for i in range(0, len(test_list), 3):\n", + " print (test_list[i:i+3])" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Windows 7',\n", + " 'Windows 7 is a personal computer operating system that was ..',\n", + " 'Windows 10',\n", + " 'Windows 10 is a series of personal computer operating systems',\n", + " 'Windows XP',\n", + " 'Windows XP is old, and Microsoft no longer provides official support']" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list3 = []\n", + "last = 0\n", + "for i in range(0, len(os_list), 3):\n", + " if i+2 < len(os_list) and os_list[i+2] in ['Released', 'Not Released']:\n", + " list3.append(os_list[i:i+3])\n", + " last = i+3\n", + "list2 = os_list[last:]\n", + "list2" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['Ubuntu 18',\n", + " 'This article informs you about Ubuntu 18.04 release date,',\n", + " 'Released'],\n", + " ['Ubuntu 20',\n", + " 'The desktop image allows you to try Ubuntu without changing y..',\n", + " 'Not Released'],\n", + " ['Ubuntu 19',\n", + " 'Ubuntu is an open source software operating system that runs from',\n", + " 'Released'],\n", + " ['Linux mint 18',\n", + " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", + " 'Released'],\n", + " ['Linux mint 20',\n", + " 'Suggestion: For Mint 20 to go full Debian',\n", + " 'Not Released'],\n", + " ['Linux mint 19',\n", + " 'Linux Mint 19 is a long term support release which will be supported until 2023',\n", + " 'Released']]" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list3" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "def 
sortList(working_list, category, category2):\n", + " listx = []\n", + " listy = []\n", + " last_section = 0\n", + " for i in range(0, len(working_list) - 3, 3):\n", + " if working_list[i + 2] == category:\n", + " listy.append(working_list[i])\n", + " listy.append(working_list[i + 1])\n", + " last_section = i + 2\n", + " elif working_list[i + 2] == category2:\n", + " listx.append(working_list[i])\n", + " listx.append(working_list[i + 1])\n", + " last_section = i + 2\n", + "\n", + " if last_section > 0:\n", + " listz = working_list[(last_section + 1):]\n", + " else:\n", + " listz = working_list[(last_section):]\n", + "\n", + " return listx, listy, listz" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Ubuntu 20',\n", + " 'The desktop image allows you to try Ubuntu without changing y..',\n", + " 'Linux mint 20',\n", + " 'Suggestion: For Mint 20 to go full Debian']" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "listx, listy, listz = sortList(os_list, 'Released', 'Not Released')\n", + "listx" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Ubuntu 18',\n", + " 'This article informs you about Ubuntu 18.04 release date,',\n", + " 'Ubuntu 19',\n", + " 'Ubuntu is an open source software operating system that runs from',\n", + " 'Linux mint 18',\n", + " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", + " 'Linux mint 19',\n", + " 'Linux Mint 19 is a long term support release which will be supported until 2023']" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "listy" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Windows 7',\n", + " 'Windows 7 is a personal computer operating 
system that was ..',\n", + " 'Windows 10',\n", + " 'Windows 10 is a series of personal computer operating systems',\n", + " 'Windows XP',\n", + " 'Windows XP is old, and Microsoft no longer provides official support']" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "listz" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Generic solution for flatten list" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['Ubuntu 18',\n", + " 'This article informs you about Ubuntu 18.04 release date,',\n", + " 'Released'],\n", + " ['Ubuntu 20',\n", + " 'The desktop image allows you to try Ubuntu without changing y..',\n", + " 'Not Released'],\n", + " ['Ubuntu 19',\n", + " 'Ubuntu is an open source software operating system that runs from',\n", + " 'Released'],\n", + " ['Linux mint 18',\n", + " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", + " 'Released'],\n", + " ['Linux mint 20',\n", + " 'Suggestion: For Mint 20 to go full Debian',\n", + " 'Not Released'],\n", + " ['Linux mint 19',\n", + " 'Linux Mint 19 is a long term support release which will be supported until 2023',\n", + " 'Released']]" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "os_list = [\n", + " \n", + " \n", + " 'Windows 10',\n", + " 'Windows 10 is a series of personal computer operating systems',\n", + " \"Windows XP\",\n", + " 'Windows XP is old, and Microsoft no longer provides official support',\n", + " \n", + " 'Ubuntu 18',\n", + " 'This article informs you about Ubuntu 18.04 release date,',\n", + " 'Released',\n", + "\n", + " 'Ubuntu 20',\n", + " 'The desktop image allows you to try Ubuntu without changing y..',\n", + " 'Not Released',\n", + "\n", + " 'Windows 7',\n", + " 'Windows 7 is a personal computer operating system that was ..',\n", + "\n", + 
" 'Ubuntu 19',\n", + " 'Ubuntu is an open source software operating system that runs from',\n", + " 'Released',\n", + "\n", + " 'Linux mint 18',\n", + " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", + " 'Released',\n", + "\n", + " 'Linux mint 20',\n", + " 'Suggestion: For Mint 20 to go full Debian',\n", + " 'Not Released',\n", + "\n", + " 'Linux mint 19',\n", + " 'Linux Mint 19 is a long term support release which will be supported until 2023',\n", + " 'Released',\n", + "\n", + "]\n", + "\n", + "list3 = []\n", + "list2 = []\n", + "cur = 0\n", + "\n", + "os_list_tmp = os_list\n", + "\n", + "while cur <= len(os_list_tmp):\n", + " cur = 0\n", + " if cur+2 < len(os_list_tmp) and os_list_tmp[cur+2] in ['Released', 'Not Released']:\n", + " list3.append(os_list_tmp[cur:cur+3])\n", + " cur = cur + 3\n", + " else:\n", + " list2.append(os_list_tmp[cur:cur+2])\n", + " cur = cur + 2\n", + " os_list_tmp = os_list_tmp[cur:]\n", + "list3" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['Windows 10',\n", + " 'Windows 10 is a series of personal computer operating systems'],\n", + " ['Windows XP',\n", + " 'Windows XP is old, and Microsoft no longer provides official support'],\n", + " ['Windows 7',\n", + " 'Windows 7 is a personal computer operating system that was ..']]" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + 
"name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 90051a3b0f0bd26d72a69175cf14b6afcad47238 Mon Sep 17 00:00:00 2001 From: softhints Date: Wed, 27 Feb 2019 20:05:43 +0200 Subject: [PATCH 19/76] Pandas_How_add_new_column_existing_DataFrame --- ...ow_add_new_column_existing_DataFrame.ipynb | 1248 +++++++++++++++++ 1 file changed, 1248 insertions(+) create mode 100644 notebooks/pandas/Pandas_How_add_new_column_existing_DataFrame.ipynb diff --git a/notebooks/pandas/Pandas_How_add_new_column_existing_DataFrame.ipynb b/notebooks/pandas/Pandas_How_add_new_column_existing_DataFrame.ipynb new file mode 100644 index 0000000..aabb4ae --- /dev/null +++ b/notebooks/pandas/Pandas_How_add_new_column_existing_DataFrame.ipynb @@ -0,0 +1,1248 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas How to add new column to existing DataFrame\n", + "\n", + "* add completely new column\n", + "* add new column based on existing column\n", + "* matching the content of the DataFrame\n", + "\n", + "Bonus\n", + "* how to merge/concat DataFrame and Series\n", + "* read csv use converters\n", + "* join list to a DataFrame\n", + "* check dataframe for duplicated data" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def strip_space(text):\n", + " try:\n", + " return text.strip()\n", + " except AttributeError:\n", + " return text\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   director_name      movie_title                                 plot_keywords                                      budget       title_year
0  James Cameron      Avatar                                      avatar|future|marine|native|paraplegic             237000000.0  2009.0
1  Gore Verbinski     Pirates of the Caribbean: At World's End    goddess|marriage ceremony|marriage proposal|pi...  300000000.0  2007.0
2  Sam Mendes         Spectre                                     bomb|espionage|sequel|spy|terrorist                245000000.0  2015.0
3  Christopher Nolan  The Dark Knight Rises                       deception|imprisonment|lawlessness|police offi...  250000000.0  2012.0
4  Doug Walker        Star Wars: Episode VII - The Force Awakens  NaN                                                NaN          NaN
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "0 James Cameron Avatar \n", + "1 Gore Verbinski Pirates of the Caribbean: At World's End \n", + "2 Sam Mendes Spectre \n", + "3 Christopher Nolan The Dark Knight Rises \n", + "4 Doug Walker Star Wars: Episode VII - The Force Awakens \n", + "\n", + " plot_keywords budget title_year \n", + "0 avatar|future|marine|native|paraplegic 237000000.0 2009.0 \n", + "1 goddess|marriage ceremony|marriage proposal|pi... 300000000.0 2007.0 \n", + "2 bomb|espionage|sequel|spy|terrorist 245000000.0 2015.0 \n", + "3 deception|imprisonment|lawlessness|police offi... 250000000.0 2012.0 \n", + "4 NaN NaN NaN " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset#movie_metadata.csv\n", + "\n", + "# read a dataset movies\n", + "import pandas as pd\n", + "movies = pd.read_csv('../csv/movie_metadata.csv', \n", + " usecols=['title_year', 'movie_title', 'director_name', 'plot_keywords', 'budget']\n", + " ,converters = {'movie_title' : strip_space}\n", + " )\n", + "movies.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   director_name  movie_title  plot_keywords                           budget       title_year
0  James Cameron  Avatar       avatar|future|marine|native|paraplegic  237000000.0  2009.0
\n", + "
" + ], + "text/plain": [ + " director_name movie_title plot_keywords \\\n", + "0 James Cameron Avatar  avatar|future|marine|native|paraplegic \n", + "\n", + " budget title_year \n", + "0 237000000.0 2009.0 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.movie_title == 'Avatar ']" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "director_name object\n", + "movie_title object\n", + "plot_keywords object\n", + "budget float64\n", + "title_year float64\n", + "dtype: object" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "movies['movie_title'] = movies.movie_title.str.strip()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5043, 5)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(247, 5)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.movie_title.duplicated(keep=False)].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "movies.drop_duplicates(subset=['movie_title'], keep='first', inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(4916, 5)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies.shape" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "## add completely new column" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   director_name      movie_title                                 plot_keywords                                      budget       title_year  e
0  James Cameron      Avatar                                      avatar|future|marine|native|paraplegic             237000000.0  2009.0      NaN
1  Gore Verbinski     Pirates of the Caribbean: At World's End    goddess|marriage ceremony|marriage proposal|pi...  300000000.0  2007.0      NaN
2  Sam Mendes         Spectre                                     bomb|espionage|sequel|spy|terrorist                245000000.0  2015.0      NaN
3  Christopher Nolan  The Dark Knight Rises                       deception|imprisonment|lawlessness|police offi...  250000000.0  2012.0      NaN
4  Doug Walker        Star Wars: Episode VII - The Force Awakens  NaN                                                NaN          NaN         NaN
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "0 James Cameron Avatar \n", + "1 Gore Verbinski Pirates of the Caribbean: At World's End \n", + "2 Sam Mendes Spectre \n", + "3 Christopher Nolan The Dark Knight Rises \n", + "4 Doug Walker Star Wars: Episode VII - The Force Awakens \n", + "\n", + " plot_keywords budget title_year \\\n", + "0 avatar|future|marine|native|paraplegic 237000000.0 2009.0 \n", + "1 goddess|marriage ceremony|marriage proposal|pi... 300000000.0 2007.0 \n", + "2 bomb|espionage|sequel|spy|terrorist 245000000.0 2015.0 \n", + "3 deception|imprisonment|lawlessness|police offi... 250000000.0 2012.0 \n", + "4 NaN NaN NaN \n", + "\n", + " e \n", + "0 NaN \n", + "1 NaN \n", + "2 NaN \n", + "3 NaN \n", + "4 NaN " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "movies['e'] = np.NaN\n", + "movies.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namemovie_titleplot_keywordsbudgettitle_yearef
0James CameronAvataravatar|future|marine|native|paraplegic237000000.02009.0NaN1
1Gore VerbinskiPirates of the Caribbean: At World's Endgoddess|marriage ceremony|marriage proposal|pi...300000000.02007.0NaN1
2Sam MendesSpectrebomb|espionage|sequel|spy|terrorist245000000.02015.0NaN1
3Christopher NolanThe Dark Knight Risesdeception|imprisonment|lawlessness|police offi...250000000.02012.0NaN1
4Doug WalkerStar Wars: Episode VII - The Force AwakensNaNNaNNaNNaN1
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "0 James Cameron Avatar \n", + "1 Gore Verbinski Pirates of the Caribbean: At World's End \n", + "2 Sam Mendes Spectre \n", + "3 Christopher Nolan The Dark Knight Rises \n", + "4 Doug Walker Star Wars: Episode VII - The Force Awakens \n", + "\n", + " plot_keywords budget title_year \\\n", + "0 avatar|future|marine|native|paraplegic 237000000.0 2009.0 \n", + "1 goddess|marriage ceremony|marriage proposal|pi... 300000000.0 2007.0 \n", + "2 bomb|espionage|sequel|spy|terrorist 245000000.0 2015.0 \n", + "3 deception|imprisonment|lawlessness|police offi... 250000000.0 2012.0 \n", + "4 NaN NaN NaN \n", + "\n", + " e f \n", + "0 NaN 1 \n", + "1 NaN 1 \n", + "2 NaN 1 \n", + "3 NaN 1 \n", + "4 NaN 1 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies['f'] = 1\n", + "movies.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## add new column based on existing column" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namemovie_titleplot_keywordsbudgettitle_yearefcentury
5038Scott SmithSigned Sealed Deliveredfraud|postal worker|prison|theft|trialNaN2013.0NaN1True
5039NaNThe Followingcult|fbi|hideout|prison escape|serial killerNaNNaNNaN1False
5040Benjamin RoberdsA Plague So PleasantNaN1400.02013.0NaN1True
5041Daniel HsiaShanghai CallingNaNNaN2012.0NaN1True
5042Jon GunnMy Date with Drewactress name in title|crush|date|four word tit...1100.02004.0NaN1True
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "5038 Scott Smith Signed Sealed Delivered \n", + "5039 NaN The Following \n", + "5040 Benjamin Roberds A Plague So Pleasant \n", + "5041 Daniel Hsia Shanghai Calling \n", + "5042 Jon Gunn My Date with Drew \n", + "\n", + " plot_keywords budget title_year \\\n", + "5038 fraud|postal worker|prison|theft|trial NaN 2013.0 \n", + "5039 cult|fbi|hideout|prison escape|serial killer NaN NaN \n", + "5040 NaN 1400.0 2013.0 \n", + "5041 NaN NaN 2012.0 \n", + "5042 actress name in title|crush|date|four word tit... 1100.0 2004.0 \n", + "\n", + " e f century \n", + "5038 NaN 1 True \n", + "5039 NaN 1 False \n", + "5040 NaN 1 True \n", + "5041 NaN 1 True \n", + "5042 NaN 1 True " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies['century'] = movies['title_year'] > 2000\n", + "movies.tail()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namemovie_titleplot_keywordsbudgettitle_yearefcentury
5033Shane CarruthPrimerchanging the future|independent film|invention...7000.02004.0NaN121 Century
5034Neill Dela LlanaCavitejihad|mindanao|philippines|security guard|squa...7000.02005.0NaN121 Century
5035Robert RodriguezEl Mariachiassassin|death|guitar|gun|mariachi7000.01992.0NaN120 Century
5036Anthony ValloneThe Mongol Kingjewell|mongol|nostradamus|stepnicka|vallone3250.02005.0NaN121 Century
5037Edward BurnsNewlywedswritten and directed by cast member9000.02011.0NaN121 Century
5038Scott SmithSigned Sealed Deliveredfraud|postal worker|prison|theft|trialNaN2013.0NaN121 Century
5039NaNThe Followingcult|fbi|hideout|prison escape|serial killerNaNNaNNaN120 Century
5040Benjamin RoberdsA Plague So PleasantNaN1400.02013.0NaN121 Century
5041Daniel HsiaShanghai CallingNaNNaN2012.0NaN121 Century
5042Jon GunnMy Date with Drewactress name in title|crush|date|four word tit...1100.02004.0NaN121 Century
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "5033 Shane Carruth Primer \n", + "5034 Neill Dela Llana Cavite \n", + "5035 Robert Rodriguez El Mariachi \n", + "5036 Anthony Vallone The Mongol King \n", + "5037 Edward Burns Newlyweds \n", + "5038 Scott Smith Signed Sealed Delivered \n", + "5039 NaN The Following \n", + "5040 Benjamin Roberds A Plague So Pleasant \n", + "5041 Daniel Hsia Shanghai Calling \n", + "5042 Jon Gunn My Date with Drew \n", + "\n", + " plot_keywords budget title_year \\\n", + "5033 changing the future|independent film|invention... 7000.0 2004.0 \n", + "5034 jihad|mindanao|philippines|security guard|squa... 7000.0 2005.0 \n", + "5035 assassin|death|guitar|gun|mariachi 7000.0 1992.0 \n", + "5036 jewell|mongol|nostradamus|stepnicka|vallone 3250.0 2005.0 \n", + "5037 written and directed by cast member 9000.0 2011.0 \n", + "5038 fraud|postal worker|prison|theft|trial NaN 2013.0 \n", + "5039 cult|fbi|hideout|prison escape|serial killer NaN NaN \n", + "5040 NaN 1400.0 2013.0 \n", + "5041 NaN NaN 2012.0 \n", + "5042 actress name in title|crush|date|four word tit... 
1100.0 2004.0 \n", + "\n", + " e f century \n", + "5033 NaN 1 21 Century \n", + "5034 NaN 1 21 Century \n", + "5035 NaN 1 20 Century \n", + "5036 NaN 1 21 Century \n", + "5037 NaN 1 21 Century \n", + "5038 NaN 1 21 Century \n", + "5039 NaN 1 20 Century \n", + "5040 NaN 1 21 Century \n", + "5041 NaN 1 21 Century \n", + "5042 NaN 1 21 Century " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies['century'] = movies.century.map({True:'21 Century', False:'20 Century'})\n", + "movies.tail(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## matching the content of the DataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#movies = movies.set_index('movie_title')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Avatar True\n", + "Spectre True\n", + "Name: watched, dtype: bool" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "watched = pd.Series([True, True], index=['Avatar', 'Spectre'], name='watched')\n", + "watched" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2,)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "watched.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(4916, 8)" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + 
"/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version\n", + "of pandas will change to not sort by default.\n", + "\n", + "To accept the future behavior, pass 'sort=False'.\n", + "\n", + "To retain the current behavior and silence the warning, pass 'sort=True'.\n", + "\n", + " \"\"\"Entry point for launching an IPython kernel.\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_nameplot_keywordsbudgettitle_yearefcenturywatched
#HorrorTara Subkoffbullying|cyberbullying|girl|internet|throat sl...1500000.02015.0NaN121 CenturyNaN
10 Cloverfield LaneDan Trachtenbergalien|bunker|car crash|kidnapping|minimal cast15000000.02016.0NaN121 CenturyNaN
10 Days in a MadhouseTimothy HinesNaN12000000.02015.0NaN121 CenturyNaN
10 Things I Hate About YouGil Jungerdating|protective father|school|shrew|teen movie16000000.01999.0NaN120 CenturyNaN
10,000 B.C.Christopher BarnardNaNNaNNaNNaN120 CenturyNaN
\n", + "
" + ], + "text/plain": [ + " director_name \\\n", + "#Horror Tara Subkoff \n", + "10 Cloverfield Lane Dan Trachtenberg \n", + "10 Days in a Madhouse Timothy Hines \n", + "10 Things I Hate About You Gil Junger \n", + "10,000 B.C. Christopher Barnard \n", + "\n", + " plot_keywords \\\n", + "#Horror bullying|cyberbullying|girl|internet|throat sl... \n", + "10 Cloverfield Lane alien|bunker|car crash|kidnapping|minimal cast \n", + "10 Days in a Madhouse NaN \n", + "10 Things I Hate About You dating|protective father|school|shrew|teen movie \n", + "10,000 B.C. NaN \n", + "\n", + " budget title_year e f century watched \n", + "#Horror 1500000.0 2015.0 NaN 1 21 Century NaN \n", + "10 Cloverfield Lane 15000000.0 2016.0 NaN 1 21 Century NaN \n", + "10 Days in a Madhouse 12000000.0 2015.0 NaN 1 21 Century NaN \n", + "10 Things I Hate About You 16000000.0 1999.0 NaN 1 20 Century NaN \n", + "10,000 B.C. NaN NaN NaN 1 20 Century NaN " + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_concat = pd.concat([movies.set_index('movie_title'), watched.to_frame()], axis=1)\n", + "df_concat.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True 2\n", + "Name: watched, dtype: int64" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_concat.watched.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_nameplot_keywordsbudgettitle_yearefcenturywatched
AvatarJames Cameronavatar|future|marine|native|paraplegic237000000.02009.0NaN121 CenturyTrue
SpectreSam Mendesbomb|espionage|sequel|spy|terrorist245000000.02015.0NaN121 CenturyTrue
\n", + "
" + ], + "text/plain": [ + " director_name plot_keywords budget \\\n", + "Avatar James Cameron avatar|future|marine|native|paraplegic 237000000.0 \n", + "Spectre Sam Mendes bomb|espionage|sequel|spy|terrorist 245000000.0 \n", + "\n", + " title_year e f century watched \n", + "Avatar 2009.0 NaN 1 21 Century True \n", + "Spectre 2015.0 NaN 1 21 Century True " + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_concat[df_concat.watched == True]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From b076552b15e1ddd2e5071416530d42dc65ec1f9f Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 28 Feb 2019 10:08:18 +0200 Subject: [PATCH 20/76] Python Pandas find and drop duplicate data --- ..._Pandas_find_and_drop_duplicate_data.ipynb | 1603 +++++++++++++++++ 1 file changed, 1603 insertions(+) create mode 100644 notebooks/pandas/Python_Pandas_find_and_drop_duplicate_data.ipynb diff --git a/notebooks/pandas/Python_Pandas_find_and_drop_duplicate_data.ipynb b/notebooks/pandas/Python_Pandas_find_and_drop_duplicate_data.ipynb new file mode 100644 index 0000000..9e9e9be --- /dev/null +++ b/notebooks/pandas/Python_Pandas_find_and_drop_duplicate_data.ipynb @@ -0,0 +1,1603 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Python Pandas identify and drop duplicate data\n", + "\n", + "* identify duplicate rows in Pandas\n", + "* find duplicate values in a column\n", + "* identify duplicate values in several 
columns\n", + "* drop duplicated data in all columns\n", + "* drop duplicated data in several columns\n", + "\n", + "Bonus\n", + "\n", + "* find duplicates in index\n", + "* find duplicate data in a row\n", + "* delete columns with duplicates" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namemovie_titleplot_keywordsbudgettitle_year
0James CameronAvataravatar|future|marine|native|paraplegic237000000.02009.0
1Gore VerbinskiPirates of the Caribbean: At World's Endgoddess|marriage ceremony|marriage proposal|pi...300000000.02007.0
2Sam MendesSpectrebomb|espionage|sequel|spy|terrorist245000000.02015.0
3Christopher NolanThe Dark Knight Risesdeception|imprisonment|lawlessness|police offi...250000000.02012.0
4Doug WalkerStar Wars: Episode VII - The Force AwakensNaNNaNNaN
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "0 James Cameron Avatar \n", + "1 Gore Verbinski Pirates of the Caribbean: At World's End \n", + "2 Sam Mendes Spectre \n", + "3 Christopher Nolan The Dark Knight Rises \n", + "4 Doug Walker Star Wars: Episode VII - The Force Awakens \n", + "\n", + " plot_keywords budget title_year \n", + "0 avatar|future|marine|native|paraplegic 237000000.0 2009.0 \n", + "1 goddess|marriage ceremony|marriage proposal|pi... 300000000.0 2007.0 \n", + "2 bomb|espionage|sequel|spy|terrorist 245000000.0 2015.0 \n", + "3 deception|imprisonment|lawlessness|police offi... 250000000.0 2012.0 \n", + "4 NaN NaN NaN " + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset#movie_metadata.csv\n", + "\n", + "# read a dataset movies\n", + "import pandas as pd\n", + "movies = pd.read_csv('../csv/movie_metadata.csv', \n", + " usecols=['title_year', 'movie_title', 'director_name', 'plot_keywords', 'budget']\n", + " )\n", + "movies['movie_title'] = movies.movie_title.str.strip()\n", + "movies.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## find duplicate rows in Pandas\n", + "\n", + "**subset** : column label or sequence of labels, optional\n", + "Only consider certain columns for identifying duplicates, by default use all of the columns\n", + "\n", + "**keep** : {‘first’, ‘last’, False}, default ‘first’\n", + "* first : Mark duplicates as True except for the first occurrence.\n", + "* last : Mark duplicates as True except for the last occurrence.\n", + "* False : Mark all duplicates as True." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5043, 5)" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(123, 5)" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.duplicated(keep='first')].shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## find duplicate values in a column" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(127, 5)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.movie_title.duplicated(keep='first')].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namemovie_titleplot_keywordsbudgettitle_year
137David YatesThe Legend of Tarzanafrica|capture|jungle|male objectification|tarzan180000000.02016.0
187Bill CondonThe Twilight Saga: Breaking Dawn - Part 2battle|friend|super strength|vampire|vision120000000.02012.0
204Hideaki AnnoGodzilla Resurgenceblood|godzilla|monster|sequelNaN2016.0
303Joe WrightPan1940s|child hero|fantasy world|orphan|referenc...150000000.02015.0
389Josh TrankFantastic Fourbox office flop|critically bashed|portal|telep...120000000.02015.0
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "137 David Yates The Legend of Tarzan \n", + "187 Bill Condon The Twilight Saga: Breaking Dawn - Part 2 \n", + "204 Hideaki Anno Godzilla Resurgence \n", + "303 Joe Wright Pan \n", + "389 Josh Trank Fantastic Four \n", + "\n", + " plot_keywords budget \\\n", + "137 africa|capture|jungle|male objectification|tarzan 180000000.0 \n", + "187 battle|friend|super strength|vampire|vision 120000000.0 \n", + "204 blood|godzilla|monster|sequel NaN \n", + "303 1940s|child hero|fantasy world|orphan|referenc... 150000000.0 \n", + "389 box office flop|critically bashed|portal|telep... 120000000.0 \n", + "\n", + " title_year \n", + "137 2016.0 \n", + "187 2012.0 \n", + "204 2016.0 \n", + "303 2015.0 \n", + "389 2015.0 " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.movie_title.duplicated(keep='first')].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(127, 5)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.movie_title.duplicated(keep='last')].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(247, 5)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.movie_title.duplicated(keep=False)].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namemovie_titleplot_keywordsbudgettitle_year
367Timur BekmambetovBen-HurNaNNaN2016.0
2613Timur BekmambetovBen-Hurchariot race|epic|false accusation|jerusalem|s...100000000.02016.0
3967Timur BekmambetovBen-Hurchariot race|epic|false accusation|jerusalem|s...100000000.02016.0
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "367 Timur Bekmambetov Ben-Hur \n", + "2613 Timur Bekmambetov Ben-Hur \n", + "3967 Timur Bekmambetov Ben-Hur \n", + "\n", + " plot_keywords budget \\\n", + "367 NaN NaN \n", + "2613 chariot race|epic|false accusation|jerusalem|s... 100000000.0 \n", + "3967 chariot race|epic|false accusation|jerusalem|s... 100000000.0 \n", + "\n", + " title_year \n", + "367 2016.0 \n", + "2613 2016.0 \n", + "3967 2016.0 " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.movie_title == 'Ben-Hur']" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namemovie_titleplot_keywordsbudgettitle_year
63David YatesThe Legend of Tarzanafrica|capture|jungle|male objectification|tarzan180000000.02016.0
137David YatesThe Legend of Tarzanafrica|capture|jungle|male objectification|tarzan180000000.02016.0
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "63 David Yates The Legend of Tarzan \n", + "137 David Yates The Legend of Tarzan \n", + "\n", + " plot_keywords budget \\\n", + "63 africa|capture|jungle|male objectification|tarzan 180000000.0 \n", + "137 africa|capture|jungle|male objectification|tarzan 180000000.0 \n", + "\n", + " title_year \n", + "63 2016.0 \n", + "137 2016.0 " + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.movie_title == 'The Legend of Tarzan']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## find duplicate values in several columns" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namemovie_titleplot_keywordsbudgettitle_year
6Sam RaimiSpider-Man 3sandman|spider man|symbiote|venom|villain258000000.02007.0
17Joss WhedonThe Avengersalien invasion|assassin|battle|iron man|soldier220000000.02012.0
25Peter JacksonKing Konganimal name in title|ape abducts a woman|goril...207000000.02005.0
30Sam MendesSkyfallbrawl|childhood home|computer cracker|intellig...200000000.02012.0
33Tim BurtonAlice in Wonderlandalice in wonderland|mistaking reality for drea...200000000.02010.0
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "6 Sam Raimi Spider-Man 3 \n", + "17 Joss Whedon The Avengers \n", + "25 Peter Jackson King Kong \n", + "30 Sam Mendes Skyfall \n", + "33 Tim Burton Alice in Wonderland \n", + "\n", + " plot_keywords budget title_year \n", + "6 sandman|spider man|symbiote|venom|villain 258000000.0 2007.0 \n", + "17 alien invasion|assassin|battle|iron man|soldier 220000000.0 2012.0 \n", + "25 animal name in title|ape abducts a woman|goril... 207000000.0 2005.0 \n", + "30 brawl|childhood home|computer cracker|intellig... 200000000.0 2012.0 \n", + "33 alice in wonderland|mistaking reality for drea... 200000000.0 2010.0 " + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.duplicated(subset=['movie_title', 'title_year'], keep=False)].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namemovie_titleplot_keywordsbudgettitle_year
6Sam RaimiSpider-Man 3sandman|spider man|symbiote|venom|villain258000000.02007.0
17Joss WhedonThe Avengersalien invasion|assassin|battle|iron man|soldier220000000.02012.0
25Peter JacksonKing Konganimal name in title|ape abducts a woman|goril...207000000.02005.0
30Sam MendesSkyfallbrawl|childhood home|computer cracker|intellig...200000000.02012.0
33Tim BurtonAlice in Wonderlandalice in wonderland|mistaking reality for drea...200000000.02010.0
\n", + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "6 Sam Raimi Spider-Man 3 \n", + "17 Joss Whedon The Avengers \n", + "25 Peter Jackson King Kong \n", + "30 Sam Mendes Skyfall \n", + "33 Tim Burton Alice in Wonderland \n", + "\n", + " plot_keywords budget title_year \n", + "6 sandman|spider man|symbiote|venom|villain 258000000.0 2007.0 \n", + "17 alien invasion|assassin|battle|iron man|soldier 220000000.0 2012.0 \n", + "25 animal name in title|ape abducts a woman|goril... 207000000.0 2005.0 \n", + "30 brawl|childhood home|computer cracker|intellig... 200000000.0 2012.0 \n", + "33 alice in wonderland|mistaking reality for drea... 200000000.0 2010.0 " + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies[movies.duplicated(subset=['movie_title', 'director_name', 'budget'], keep=False)].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Drop duplicates" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5043, 5)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "movies.drop_duplicates(keep='first', inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(4920, 5)" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "movies.drop_duplicates(subset=['movie_title', 'director_name'], keep=False, inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/plain": [ + "(4918, 5)" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "movies.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## find duplicate data in a index" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame({\"X\":[\"A\", \"XX\", \"B\", \"C\"], \"Y\":[11,\"XX\",11,12], \"Z\":[\"X\",\"XX\",\"Y\",\"X\"], 0:[0,1,1,2]})\n", + "df.set_index(0, inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
XYZ
0
0A11X
1XXXXXX
1B11Y
2C12X
\n", + "
" + ], + "text/plain": [ + " X Y Z\n", + "0 \n", + "0 A 11 X\n", + "1 XX XX XX\n", + "1 B 11 Y\n", + "2 C 12 X" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
XYZ
0
1XXXXXX
1B11Y
\n", + "
" + ], + "text/plain": [ + " X Y Z\n", + "0 \n", + "1 XX XX XX\n", + "1 B 11 Y" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.index.duplicated(keep=False)]" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "df = df[~df.index.duplicated(keep='last')]" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " X Y Z\n", + "0 \n", + "0 A 11 X\n", + "1 B 11 Y\n", + "2 C 12 X" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## find duplicate data in a row" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame({\"X\":[\"A\", \"XX\", \"B\", \"C\"], \"Y\":[11,\"XX\",11,12], \"Z\":[\"X\",\"XX\",\"Y\",\"X\"]})" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " X Y Z\n", + "0 A 11 X\n", + "1 XX XX XX\n", + "2 B 11 Y\n", + "3 C 12 X" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "RangeIndex(start=0, stop=4, step=1)" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "indexes = df.index\n", + "indexes" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "df = df.T" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3\n", + "X A XX B C\n", + "Y 11 XX 11 12\n", + "Z X XX Y X" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0, 4)" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.duplicated(keep='first')].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "X True\n", + "Y True\n", + "Z False\n", + "Name: 1, dtype: bool" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[1].duplicated(keep='last')" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "X False\n", + "Y True\n", + "Z True\n", + "Name: 1, dtype: bool" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[1].duplicated(keep='first')" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "X True\n", + "Y True\n", + "Z True\n", + "Name: 1, dtype: bool" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[1].duplicated(keep=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[1].duplicated(keep=False).sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 36, + "metadata": {}, + 
"output_type": "execute_result" + } + ], + "source": [ + "df.shape[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0\n", + "3\n", + "0\n", + "0\n" + ] + } + ], + "source": [ + "for i in indexes:\n", + " print(df[i].duplicated(keep=False).sum())\n", + " if df[i].duplicated(keep=False).sum() == df.shape[0]:\n", + " df.drop(i, inplace=True, axis=1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 2 3\n", + "X A B C\n", + "Y 11 11 12\n", + "Z X Y X" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " X Y Z\n", + "0 A 11 X\n", + "2 B 11 Y\n", + "3 C 12 X" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.T" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 2357a4b7f8e6ab394edfef8dc098a22b4c358efd Mon Sep 17 00:00:00 2001 From: softhints Date: Fri, 1 Mar 2019 12:27:47 +0200 Subject: [PATCH 21/76] Questions_and_Answers_1_Improve_OCR_and_tabula_range --- ...swers_1_Improve_OCR_and_tabula_range.ipynb | 513 ++++++++++++++++++ 1 file changed, 513 insertions(+) create mode 100644 notebooks/Q&A/Questions_and_Answers_1_Improve_OCR_and_tabula_range.ipynb diff --git a/notebooks/Q&A/Questions_and_Answers_1_Improve_OCR_and_tabula_range.ipynb b/notebooks/Q&A/Questions_and_Answers_1_Improve_OCR_and_tabula_range.ipynb new file mode 100644 index 0000000..27d4422 --- /dev/null +++ b/notebooks/Q&A/Questions_and_Answers_1_Improve_OCR_and_tabula_range.ipynb @@ -0,0 +1,513 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Questions and Answers 2 Improve OCR and tabula range" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Question 1\n", + "\n", + "#### Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2\n", + "\n", + "https://youtu.be/702lkQbZx50\n", + "\n", + "![Question 1](../images/Selection_177.png)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/plain": [ + "(29, 4)" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from tabula import read_pdf\n", + "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=3)\n", + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(69, 5)" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# specify page range 1 to 3 page\n", + "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages='1-3')\n", + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " BREADS & CEREALS Portion size * per 100 grams (3.5 oz) \\\n", + "0 Bagel ( 1 average ) 140 cals (45g) 310 cals \n", + "1 Biscuit digestives 86 cals (per biscuit) 480 cals \n", + "2 Jaffa cake 48 cals (per biscuit) 370 cals \n", + "3 Bread white (thick slice) 96 cals (1 slice 40g) 240 cals \n", + "4 Bread wholemeal (thick) 88 cals (1 slice 40g) 220 cals \n", + "\n", + " Unnamed: 3 energy content \n", + "0 NaN Medium \n", + "1 NaN High \n", + "2 NaN Med-High \n", + "3 NaN Medium \n", + "4 NaN Low-med " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " BREADS & CEREALS Portion size * per 100 grams (3.5 oz) Unnamed: 3 \\\n", + "64 Sausage pork fried 250 cals 320 cals High \n", + "65 Sausage pork grilled 220 cals 280 cals Med-High \n", + "66 Sausage roll 290 cals 480 cals High \n", + "67 Scampi fried in oil 400 cals 340 cals High \n", + "68 Steak & kidney pie 400 cals 350 cals High \n", + "\n", + " energy content \n", + "64 NaN \n", + "65 NaN \n", + "66 NaN \n", + "67 NaN \n", + "68 NaN " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.tail()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(69, 5)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# create page range 1 to 3 page\n", + "pages=(str(1)+'-'+str(3))\n", + "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=pages)\n", + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(69, 5)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# list all possible pages\n", + "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=[1,2,3])\n", + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(69, 5)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# list all possible pages using range\n", + "pages = list(range(1, 4))\n", + "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=pages)\n", + "df.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Question 2\n", + 
"\n", + "#### python extract text from image or pdf\n", + "\n", + "https://youtu.be/PK-GvWWQ03g\n", + "\n", + "![Question ](../images/Selection_178.png)\n", + "\n", + "python extract text from image or pdf\n", + "\n", + "https://blog.softhints.com/python-extract-text-from-image-or-pdf/\n", + "\n", + "Improve OCR Accuracy With Advanced Image Preprocessing\n", + "\n", + "https://docparser.com/blog/improve-ocr-accuracy/" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Question ](../images/Selection_174.png)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import pytesseract" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Java\n", + "\n", + "Python\n", + "\n", + "public class JavaPyramid1 {\n", + "public static void main(String[] args) {\n", + "for(int i=1; i<=5; i++) {\n", + "for(int j=0; j Date: Sat, 9 Mar 2019 11:54:39 +0200 Subject: [PATCH 22/76] Map the headers to a column with pandas? 
--- ..._the_headers_to_a_column_with_pandas.ipynb | 1071 +++++++++++++++++ 1 file changed, 1071 insertions(+) create mode 100644 notebooks/pandas/map_the_headers_to_a_column_with_pandas.ipynb diff --git a/notebooks/pandas/map_the_headers_to_a_column_with_pandas.ipynb b/notebooks/pandas/map_the_headers_to_a_column_with_pandas.ipynb new file mode 100644 index 0000000..5086ed9 --- /dev/null +++ b/notebooks/pandas/map_the_headers_to_a_column_with_pandas.ipynb @@ -0,0 +1,1071 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Map the headers to a column with pandas?\n", + "\n", + "Data set: Stack Over Flow 2018 insights\n", + "\n", + "* https://insights.stackoverflow.com/survey\n", + "* https://insights.stackoverflow.com/survey/2018#technology\n", + "\n", + "Topics\n", + "\n", + "* map a headers based on a value to a new column\n", + "\n", + "Bonus\n", + "\n", + "* pandas dot method - matrix multiplication\n", + "* understand np.where\n", + "* map single column of dataframe\n", + "* map all columns of a dataframe\n", + "* map and NaN\n", + "* check all distinct values in dataframe\n", + "* Optimize big data frames:\n", + " * Columns have mixed types. Specify dtype option on import or set low_memory=False.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "pd.set_option('display.max_colwidth', -1)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(98855, 129)\n" + ] + } + ], + "source": [ + "# read the data frame and see the data insight\n", + "df = pd.read_csv(\"../csv/stackoverflow/developer_survey_2018/survey_results_public.csv\", low_memory=False)\n", + "print(df.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Respondent Hobby OpenSource Country Student \\\n", + "0 1 Yes No Kenya No \n", + "1 3 Yes Yes United Kingdom No \n", + "2 4 Yes Yes United States No \n", + "3 5 No No United States No \n", + "4 7 Yes No South Africa Yes, part-time \n", + "\n", + " Employment FormalEducation \\\n", + "0 Employed part-time Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Employed full-time Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "2 Employed full-time Associate degree \n", + "3 Employed full-time Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "4 Employed full-time Some college/university study without earning a degree \n", + "\n", + " UndergradMajor \\\n", + "0 Mathematics or statistics \n", + "1 A natural science (ex. biology, chemistry, physics) \n", + "2 Computer science, computer engineering, or software engineering \n", + "3 Computer science, computer engineering, or software engineering \n", + "4 Computer science, computer engineering, or software engineering \n", + "\n", + " CompanySize \\\n", + "0 20 to 99 employees \n", + "1 10,000 or more employees \n", + "2 20 to 99 employees \n", + "3 100 to 499 employees \n", + "4 10,000 or more employees \n", + "\n", + " DevType \\\n", + "0 Full-stack developer \n", + "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", + "2 Engineering manager;Full-stack developer \n", + "3 Full-stack developer \n", + "4 Data or business analyst;Desktop or enterprise applications developer;Game or graphics developer;QA or test developer;Student \n", + "\n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", + "2 ... NaN NaN NaN \n", + "3 ... I don't typically exercise Male Straight or heterosexual \n", + "4 ... 3 - 4 times per week Male Straight or heterosexual \n", + "\n", + " EducationParents \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) 
\n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "2 NaN \n", + "3 Some college/university study without earning a degree \n", + "4 Some college/university study without earning a degree \n", + "\n", + " RaceEthnicity Age Dependents MilitaryUS \\\n", + "0 Black or of African descent 25 - 34 years old Yes NaN \n", + "1 White or of European descent 35 - 44 years old Yes NaN \n", + "2 NaN NaN NaN NaN \n", + "3 White or of European descent 35 - 44 years old No No \n", + "4 White or of European descent 18 - 24 years old Yes NaN \n", + "\n", + " SurveyTooLong SurveyEasy \n", + "0 The survey was an appropriate length Very easy \n", + "1 The survey was an appropriate length Somewhat easy \n", + "2 NaN NaN \n", + "3 The survey was an appropriate length Somewhat easy \n", + "4 The survey was an appropriate length Somewhat easy \n", + "\n", + "[5 rows x 129 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# examples\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Hobby OpenSource Student\n", + "0 Yes No No \n", + "1 Yes Yes No \n", + "2 Yes Yes No \n", + "3 No No No \n", + "4 Yes No Yes, part-time" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# create new data frame with 3 columns\n", + "columns = ['Hobby', 'OpenSource', 'Student']\n", + "df_answers = df[columns]\n", + "df_answers.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.0\n", + "1 0.0\n", + "2 0.0\n", + "3 0.0\n", + "4 NaN \n", + "Name: Student, dtype: float64" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# map single column of dataframe\n", + "df_answers.Student.map( {'Yes':1, 'No':0}).head()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['No', 'Yes, part-time', nan, 'Yes, full-time'], dtype=object)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# check all distinct values in dataframe\n", + "df_answers.Student.unique()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Hobby OpenSource Student\n", + "0 1 0 0 \n", + "1 1 1 0 \n", + "2 1 1 0 \n", + "3 0 0 0 \n", + "4 1 0 0 " + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# map all columns of a dataframe\n", + "import numpy as np\n", + "new_values = {'Yes':1, 'No':0, 'Yes, part-time':0, 'Yes, full-time':0, np.NaN:0}\n", + "\n", + "df_answers = df_answers.apply(lambda x: x.map( new_values ))\n", + "\n", + "df_answers.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Hobby OpenSource Student answer\n", + "0 1 0 0 Hobby \n", + "1 1 1 0 HobbyOpenSource\n", + "2 1 1 0 HobbyOpenSource\n", + "3 0 0 0 \n", + "4 1 0 0 Hobby " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# map headers to columns way 1\n", + "df_answers['answer'] = np.where(df_answers, df_answers.columns, '').sum(axis=1)\n", + "df_answers.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([4, 5, 6]),)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a = np.array([1, 2,3, 4, 9, 7, 8, 6])\n", + "np.where(a > 6)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([9, 7, 8])" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a = np.array([1, 2,3, 4, 9, 7, 8, 6])\n", + "a[np.where(a > 6)]" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 8],\n", + " [3, 4]])" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.where([[True, False], [True, True]],\n", + " [[1, 2], [3, 4]],\n", + " [[9, 8], [7, 6]])" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[9, 8],\n", + " [7, 6]])" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.where([[False, False], [False, False]],\n", + " [[1, 2], [3, 4]],\n", + " [[9, 8], [7, 6]])" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 2],\n", + " [3, 4]])" + ] + 
}, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.where([[True, True], [True, True]],\n", + " [[1, 2], [3, 4]],\n", + " [[9, 8], [7, 6]])" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Hobby OpenSource Student\n", + "0 1 0 0 \n", + "1 1 1 0 \n", + "2 1 1 0 \n", + "3 0 0 0 \n", + "4 1 0 0 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_answers.drop('answer', axis=1, inplace=True)\n", + "df_answers.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Hobby OpenSource Student answer\n", + "0 1 0 0 Hobby \n", + "1 1 1 0 HobbyOpenSource\n", + "2 1 1 0 HobbyOpenSource\n", + "3 0 0 0 \n", + "4 1 0 0 Hobby " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# map headers to columns way 2\n", + "df_answers.assign(answer=df_answers.dot(df_answers.columns)).head()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1\n", + "0 1 2\n", + "1 4 5" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a = pd.DataFrame([[1, 2], \n", + " [4, 5]])\n", + "b = pd.DataFrame([[1, 0], \n", + " [0, 1]])\n", + "a.dot(b)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1\n", + "0 2 0\n", + "1 8 0" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a = pd.DataFrame([[1, 2], \n", + " [4, 5]])\n", + "b = pd.DataFrame([[2, 0], \n", + " [0, 0]])\n", + "a.dot(b)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 773351d7cd6103fd7d8fad04e53a8dd5a8efaa67 Mon Sep 17 00:00:00 2001 From: softhints Date: Sat, 16 Mar 2019 12:06:12 +0200 Subject: [PATCH 23/76] Pandas_count_values_in_a_column_of_type_list --- ...ount_values_in_a_column_of_type_list.ipynb | 2040 +++++++++++++++++ 1 file changed, 2040 insertions(+) create mode 100644 notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb diff --git a/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb b/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb new file mode 100644 index 0000000..91e9752 --- /dev/null +++ b/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb @@ -0,0 +1,2040 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pandas count values in a column of type list?\n", + "\n", + "Data set: Stack Over Flow 2018 insights\n", + "\n", + "* https://insights.stackoverflow.com/survey\n", + "* https://insights.stackoverflow.com/survey/2018#technology\n", + "\n", + "Topics\n", + "\n", + "* expand list column\n", + "* value_counts for list column\n", + "\n", + "Bonus\n", + "\n", + "* combine head and tail \n", + "* slicing 
iloc with range\n", + "* value_counts on all columns\n", + "* sum per column\n", + "* do a sum of several columns\n", + "* sum all columns with iteration\n", + "* be careful when you chain operations with pandas" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "pd.set_option('display.max_colwidth', None)  # None = no truncation (pandas >= 1.0; older versions used -1)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(98855, 129)\n" + ] + } + ], + "source": [ + "# read the survey results into a data frame and check its shape\n", + "df = pd.read_csv(\"../csv/stackoverflow/developer_survey_2018/survey_results_public.csv\", low_memory=False)\n", + "print(df.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Respondent Hobby OpenSource Country Student Employment \\\n", + "0 1 Yes No Kenya No Employed part-time \n", + "1 3 Yes Yes United Kingdom No Employed full-time \n", + "\n", + " FormalEducation \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "\n", + " UndergradMajor \\\n", + "0 Mathematics or statistics \n", + "1 A natural science (ex. biology, chemistry, physics) \n", + "\n", + " CompanySize \\\n", + "0 20 to 99 employees \n", + "1 10,000 or more employees \n", + "\n", + " DevType \\\n", + "0 Full-stack developer \n", + "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", + "\n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", + "\n", + " EducationParents RaceEthnicity \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) White or of European descent \n", + "\n", + " Age Dependents MilitaryUS \\\n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", + "\n", + " SurveyTooLong SurveyEasy \n", + "0 The survey was an appropriate length Very easy \n", + "1 The survey was an appropriate length Somewhat easy \n", + "\n", + "[2 rows x 129 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Respondent Hobby OpenSource Country Student \\\n", + "0 1 Yes No Kenya No \n", + "1 3 Yes Yes United Kingdom No \n", + "98853 101544 Yes No Russian Federation No \n", + "98854 101548 Yes Yes Cambodia NaN \n", + "\n", + " Employment \\\n", + "0 Employed part-time \n", + "1 Employed full-time \n", + "98853 Independent contractor, freelancer, or self-employed \n", + "98854 NaN \n", + "\n", + " FormalEducation \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "98853 Some college/university study without earning a degree \n", + "98854 NaN \n", + "\n", + " UndergradMajor \\\n", + "0 Mathematics or statistics \n", + "1 A natural science (ex. biology, chemistry, physics) \n", + "98853 NaN \n", + "98854 NaN \n", + "\n", + " CompanySize \\\n", + "0 20 to 99 employees \n", + "1 10,000 or more employees \n", + "98853 NaN \n", + "98854 NaN \n", + "\n", + " DevType \\\n", + "0 Full-stack developer \n", + "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", + "98853 NaN \n", + "98854 NaN \n", + "\n", + " ... Exercise Gender \\\n", + "0 ... 3 - 4 times per week Male \n", + "1 ... Daily or almost every day Male \n", + "98853 ... NaN NaN \n", + "98854 ... NaN NaN \n", + "\n", + " SexualOrientation EducationParents \\\n", + "0 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) 
\n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", + "\n", + " RaceEthnicity Age Dependents MilitaryUS \\\n", + "0 Black or of African descent 25 - 34 years old Yes NaN \n", + "1 White or of European descent 35 - 44 years old Yes NaN \n", + "98853 NaN NaN NaN NaN \n", + "98854 NaN NaN NaN NaN \n", + "\n", + " SurveyTooLong SurveyEasy \n", + "0 The survey was an appropriate length Very easy \n", + "1 The survey was an appropriate length Somewhat easy \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", + "\n", + "[4 rows x 129 columns]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# combine head and tail variant 1 (DataFrame.append was removed in pandas 2.0, so use pd.concat)\n", + "rows = 2\n", + "pd.concat([df.head(rows), df.tail(rows)])" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Respondent Hobby OpenSource Country Student \\\n", + "0 1 Yes No Kenya No \n", + "1 3 Yes Yes United Kingdom No \n", + "98853 101544 Yes No Russian Federation No \n", + "98854 101548 Yes Yes Cambodia NaN \n", + "\n", + " Employment \\\n", + "0 Employed part-time \n", + "1 Employed full-time \n", + "98853 Independent contractor, freelancer, or self-employed \n", + "98854 NaN \n", + "\n", + " FormalEducation \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "98853 Some college/university study without earning a degree \n", + "98854 NaN \n", + "\n", + " UndergradMajor \\\n", + "0 Mathematics or statistics \n", + "1 A natural science (ex. biology, chemistry, physics) \n", + "98853 NaN \n", + "98854 NaN \n", + "\n", + " CompanySize \\\n", + "0 20 to 99 employees \n", + "1 10,000 or more employees \n", + "98853 NaN \n", + "98854 NaN \n", + "\n", + " DevType \\\n", + "0 Full-stack developer \n", + "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", + "98853 NaN \n", + "98854 NaN \n", + "\n", + " ... Exercise Gender \\\n", + "0 ... 3 - 4 times per week Male \n", + "1 ... Daily or almost every day Male \n", + "98853 ... NaN NaN \n", + "98854 ... NaN NaN \n", + "\n", + " SexualOrientation EducationParents \\\n", + "0 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) 
\n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", + "\n", + " RaceEthnicity Age Dependents MilitaryUS \\\n", + "0 Black or of African descent 25 - 34 years old Yes NaN \n", + "1 White or of European descent 35 - 44 years old Yes NaN \n", + "98853 NaN NaN NaN NaN \n", + "98854 NaN NaN NaN NaN \n", + "\n", + " SurveyTooLong SurveyEasy \n", + "0 The survey was an appropriate length Very easy \n", + "1 The survey was an appropriate length Somewhat easy \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", + "\n", + "[4 rows x 129 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# combine head and tail variant 2\n", + "# ranges with iloc (built from plain ranges; pd.np was removed in pandas 2.0)\n", + "rows = 2\n", + "df.iloc[[*range(rows), *range(-rows, 0)]]" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 JavaScript;Python;HTML;CSS \n", + "1 JavaScript;Python;Bash/Shell \n", + "2 NaN \n", + "3 C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell\n", + "4 C;C++;Java;Matlab;R;SQL;Bash/Shell \n", + "98850 NaN \n", + "98851 NaN \n", + "98852 NaN \n", + "98853 NaN \n", + "98854 NaN \n", + "Name: LanguageWorkedWith, dtype: object" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# show the first and last rows of column LanguageWorkedWith\n", + "rows = 5\n", + "df.LanguageWorkedWith.iloc[[*range(rows), *range(-rows, 0)]]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "C#;JavaScript;SQL;HTML;CSS 1347\n", + "JavaScript;PHP;SQL;HTML;CSS 1235\n", + "Java 1030\n", + "JavaScript;HTML;CSS 881 \n", + "C#;JavaScript;SQL;TypeScript;HTML;CSS 828 \n", + "C;Go;Hack;Java;JavaScript;Perl;PHP;Python;SQL;TypeScript;HTML;CSS;Bash/Shell 1 \n", + "C;C++;Java;JavaScript;PHP;SQL;VBA;Visual Basic 6;HTML;CSS 1 \n", + "Assembly;C;C++;Java;JavaScript;Matlab;PHP;Python;R;SQL;TypeScript;Visual Basic 
6;HTML;CSS 1 \n", + "C;C++;Java;JavaScript;Matlab;PHP;Python;Ruby;SQL;HTML;CSS 1 \n", + "Java;JavaScript;PHP;Scala;SQL;Kotlin;HTML;CSS;Bash/Shell 1 \n", + "Name: LanguageWorkedWith, dtype: int64" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# value counts for the same column: each distinct ';'-joined combination is counted as one value\n", + "df.LanguageWorkedWith.value_counts().iloc[[*range(rows), *range(-rows, 0)]]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(98855, 38)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# split the column on the ';' separator into one column per position\n", + "df_lang = df.LanguageWorkedWith.str.split(';', expand=True)\n", + "df_lang.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(78334, 38)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_lang = df_lang.dropna(how='all')\n", + "df_lang.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 \\\n", + "0 JavaScript Python HTML CSS None None None \n", + "1 JavaScript Python Bash/Shell None None None None \n", + "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", + "4 C C++ Java Matlab R SQL Bash/Shell \n", + "5 Java JavaScript Python TypeScript HTML CSS None \n", + "\n", + " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", + "0 None None None ... None None None None None None None None \n", + "1 None None None ... None None None None None None None None \n", + "3 None None None ... None None None None None None None None \n", + "4 None None None ... None None None None None None None None \n", + "5 None None None ... None None None None None None None None \n", + "\n", + " 36 37 \n", + "0 None None \n", + "1 None None \n", + "3 None None \n", + "4 None None \n", + "5 None None \n", + "\n", + "[5 rows x 38 columns]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_lang.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 7 \\\n", + "Assembly 5760.0 NaN NaN NaN NaN NaN NaN NaN \n", + "Bash/Shell 29.0 465.0 1221.0 1929.0 2882.0 4442.0 4844.0 4269.0 \n", + "C 13335.0 4707.0 NaN NaN NaN NaN NaN NaN \n", + "C# 16969.0 4321.0 3990.0 1674.0 NaN NaN NaN NaN \n", + "C++ 7042.0 9275.0 3555.0 NaN NaN NaN NaN NaN \n", + "\n", + " 8 9 ... 28 29 30 31 32 33 34 35 36 \\\n", + "Assembly NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "Bash/Shell 3311.0 2562.0 ... 3.0 1.0 2.0 2.0 NaN 1.0 NaN NaN 2.0 \n", + "C NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C# NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C++ NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "\n", + " 37 \n", + "Assembly NaN \n", + "Bash/Shell 35.0 \n", + "C NaN \n", + "C# NaN \n", + "C++ NaN \n", + "\n", + "[5 rows x 38 columns]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# get languages as count / numbers\n", + "# how to use value counts for the whole dataframe\n", + "df_lang_num = df_lang.fillna(0).apply(pd.Series.value_counts)\n", + "df_lang_num.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 \\\n", + "Assembly 0.073531 NaN NaN NaN NaN NaN \n", + "Bash/Shell 0.000370 0.005936 0.015587 0.024625 0.036791 0.056706 \n", + "C 0.170233 0.060089 NaN NaN NaN NaN \n", + "C# 0.216624 0.055161 0.050936 0.021370 NaN NaN \n", + "C++ 0.089897 0.118403 0.045383 NaN NaN NaN \n", + "\n", + " 6 7 8 9 ... 28 \\\n", + "Assembly NaN NaN NaN NaN ... NaN \n", + "Bash/Shell 0.061838 0.054497 0.042268 0.032706 ... 0.000038 \n", + "C NaN NaN NaN NaN ... NaN \n", + "C# NaN NaN NaN NaN ... NaN \n", + "C++ NaN NaN NaN NaN ... NaN \n", + "\n", + " 29 30 31 32 33 34 35 36 \\\n", + "Assembly NaN NaN NaN NaN NaN NaN NaN NaN \n", + "Bash/Shell 0.000013 0.000026 0.000026 NaN 0.000013 NaN NaN 0.000026 \n", + "C NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C# NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C++ NaN NaN NaN NaN NaN NaN NaN NaN \n", + "\n", + " 37 \n", + "Assembly NaN \n", + "Bash/Shell 0.000447 \n", + "C NaN \n", + "C# NaN \n", + "C++ NaN \n", + "\n", + "[5 rows x 38 columns]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# get languages as percentage / ratio\n", + "# value counts, parameters and lambda\n", + "df_lang_per = df_lang.fillna(0).apply(lambda x: pd.value_counts(x, normalize=True))\n", + "df_lang_per.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "value_counts() missing 1 required positional argument: 'self'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# why for value counts and parameters you need lambda\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 
2\u001b[0;31m \u001b[0mdf_lang_per\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf_lang\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfillna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSeries\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: value_counts() missing 1 required positional argument: 'self'" + ] + } + ], + "source": [ + "# why for value counts and parameters you need lambda\n", + "df_lang_per = df_lang.fillna(0).apply(pd.Series.value_counts(normalize=True))" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 31.800036\n", + "JavaScript 0.698113 \n", + "HTML 0.684607 \n", + "CSS 0.650790 \n", + "SQL 0.570250 \n", + "Java 0.453456 \n", + "Bash/Shell 0.397937 \n", + "Python 0.387558 \n", + "C# 0.344091 \n", + "PHP 0.307287 \n", + "Name: total, dtype: float64" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# getting the percentage of use for each language\n", + "df_lang_per['total'] = df_lang_per.sum(axis=1)\n", + "df_lang_per.sort_values('total', ascending=False)['total'].head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2491024.0\n", + "JavaScript 54686.0 \n", + "HTML 53628.0 \n", + "CSS 50979.0 \n", + "SQL 44670.0 \n", + "Name: total, dtype: float64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# getting the number of use for each language\n", + "df_lang_num['total'] = 
df_lang_num.sum(axis=1)\n", + "df_lang_num.sort_values('total', ascending=False)['total'].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 \\\n", + "0 JavaScript Python HTML CSS None None None \n", + "1 JavaScript Python Bash/Shell None None None None \n", + "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", + "4 C C++ Java Matlab R SQL Bash/Shell \n", + "5 Java JavaScript Python TypeScript HTML CSS None \n", + "\n", + " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", + "0 None None None ... None None None None None None None None \n", + "1 None None None ... None None None None None None None None \n", + "3 None None None ... None None None None None None None None \n", + "4 None None None ... None None None None None None None None \n", + "5 None None None ... None None None None None None None None \n", + "\n", + " 36 37 \n", + "0 None None \n", + "1 None None \n", + "3 None None \n", + "4 None None \n", + "5 None None \n", + "\n", + "[5 rows x 38 columns]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_lang.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "C# 16969\n", + "C 13335\n", + "JavaScript 12150\n", + "Java 12087\n", + "C++ 7042 \n", + "Name: 0, dtype: int64" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# get value counts for first column\n", + "df_lang[0].value_counts().head()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "JavaScript 19532\n", + "Java 10175\n", + "C++ 9275 \n", + "PHP 6450 \n", + "C 4707 \n", + "Name: 1, dtype: int64" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# get value counts for second column\n", + "df_lang[1].value_counts().head()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + 
"output_type": "stream", + "text": [ + "/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py:7441: RuntimeWarning: '<' not supported between instances of 'str' and 'float', sort order is undefined for incomparable objects\n", + " return_indexers=True)\n", + "/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py:7441: RuntimeWarning: '<' not supported between instances of 'float' and 'str', sort order is undefined for incomparable objects\n", + " return_indexers=True)\n" + ] + }, + { + "data": { + "text/plain": [ + "JavaScript 48938.0\n", + "Java 32991.0\n", + "C# 26954.0\n", + "SQL 24727.0\n", + "Python 19063.0\n", + "dtype: float64" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# do a sum of several columns\n", + "df_comb_col = df_lang[0].value_counts(dropna=False) + df_lang[1].value_counts(dropna=False) + df_lang[2].value_counts(dropna=False)+ df_lang[3].value_counts(dropna=False)\n", + "df_comb_col.sort_values(ascending=False).head()" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "df_comb = pd.DataFrame()\n", + "lang_index = []\n", + "df_lang.fillna(0, inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
|            | total |
| JavaScript | 54686 |
| HTML       | 53628 |
| CSS        | 50979 |
| SQL        | 44670 |
| Java       | 35521 |
| Rust       | 1857  |
| Kotlin     | 3508  |
| Cobol      | 590   |
| Ocaml      | 470   |
| CSS        | 50979 |
\n", + "
" + ], + "text/plain": [ + " total\n", + "JavaScript 54686\n", + "HTML 53628\n", + "CSS 50979\n", + "SQL 44670\n", + "Java 35521\n", + "Rust 1857 \n", + "Kotlin 3508 \n", + "Cobol 590 \n", + "Ocaml 470 \n", + "CSS 50979" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sum all columns in dataframe with iteration\n", + "for col in df_lang.columns:\n", + " if col == 0:\n", + " df_comb['total'] = df_lang[col].fillna(0).value_counts()\n", + " lang_index = df_lang[col].value_counts().index\n", + " else:\n", + " col_ser = df_lang[col].fillna(0).value_counts()\n", + " col_ser = col_ser.reindex(lang_index, fill_value=0)\n", + " df_comb['total'] = df_comb['total'] + col_ser\n", + "df_comb.sort_values('total', ascending=False).head(rows).append(df_comb.tail(rows))\n" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
|            | total |
| JavaScript | 54686 |
| HTML       | 53628 |
| CSS        | 50979 |
| SQL        | 44670 |
| Java       | 35521 |
| Erlang     | 886   |
| Cobol      | 590   |
| Ocaml      | 470   |
| Julia      | 430   |
| Hack       | 254   |
\n", + "
" + ], + "text/plain": [ + " total\n", + "JavaScript 54686\n", + "HTML 53628\n", + "CSS 50979\n", + "SQL 44670\n", + "Java 35521\n", + "Erlang 886 \n", + "Cobol 590 \n", + "Ocaml 470 \n", + "Julia 430 \n", + "Hack 254 " + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_comb = df_comb.sort_values('total', ascending=False)\n", + "df_comb.head(rows).append(df_comb.tail(rows))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 0966389d26393b488322a5b7effdbfa2835edd2e Mon Sep 17 00:00:00 2001 From: softhints Date: Sat, 16 Mar 2019 12:06:35 +0200 Subject: [PATCH 24/76] urllib --- test.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/test.py b/test.py index 75d9766..e31890f 100644 --- a/test.py +++ b/test.py @@ -1 +1,5 @@ -print('hello world') +import urllib.parse + +f = 'Pandas count values in a column of type list' +ff = urllib.parse.quote_plus(f) +print(ff.replace('+', '_')) \ No newline at end of file From 777dd63c8fc31914b26a35769518c9f90f29c38d Mon Sep 17 00:00:00 2001 From: softhints Date: Sun, 17 Mar 2019 11:16:00 +0200 Subject: [PATCH 25/76] Chapter 4 Case study: interface design --- ...apter_4__Case_study_interface_design.ipynb | 618 ++++++++++++++++++ 1 file changed, 618 insertions(+) create mode 100644 notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb diff 
--git a/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb b/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb new file mode 100644 index 0000000..853b283 --- /dev/null +++ b/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb @@ -0,0 +1,618 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Chapter 4 Case study: interface design\n", + "\n", + "> This chapter presents a case study that demonstrates a process for designing functions that work together.\n", + "\n", + "\n", + "\n", + "* The turtle module\n", + "* Simple repetition\n", + "* Exercises\n", + "* **Encapsulation**\n", + "* **Generalization**\n", + "* **Interface design**\n", + "* **Refactoring**\n", + "* **A development plan**\n", + "* **docstring**\n", + "* Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4.1 The turtle module" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'0.23.4'" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas\n", + "pandas.__version__" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import turtle" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> The turtle module (with a lowercase ’t’) provides a function called Turtle (with an uppercase ’T’) that creates a Turtle object, which we assign to a variable named bob. 
Printing bob displays something like:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "# mypolygon.py\n", + "import turtle\n", + "bob = turtle.Turtle()\n", + "print(bob)\n", + "turtle.mainloop()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# draw a right angle\n", + "import turtle\n", + "bob = turtle.Turtle()\n", + "bob.fd(100)\n", + "bob.lt(90)\n", + "bob.fd(100)\n", + "turtle.mainloop()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# A method is similar to a function, but it uses slightly different syntax. \n", + "import turtle\n", + "bob = turtle.Turtle()\n", + "bob.fd(100)\n", + "bob.lt(90)\n", + "bob.fd(100)\n", + "turtle.mainloop()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* A **function** is a piece of code that is called by name. It can be passed data to operate on (i.e. the parameters) and can optionally return data (the return value). All data that is passed to a function is explicitly passed.

\n", + "\n", + "* A **method** is a piece of code that is called by a name **that is associated with an object**. In most respects it is identical to a function except for two key differences:\n", + " * A method is implicitly passed the object on which it was called.\n", + " * A method is able to operate on data that is contained within the class" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4.2 Simple repetition" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "# square\n", + "import turtle\n", + "bob3 = turtle.Turtle()\n", + "\n", + "bob3.fd(100)\n", + "bob3.lt(90)\n", + "\n", + "bob3.fd(100)\n", + "bob3.lt(90)\n", + "\n", + "bob3.fd(100)\n", + "bob3.lt(90)\n", + "\n", + "bob3.fd(100)\n", + "\n", + "turtle.mainloop()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> A for statement is also called a loop because the flow of execution runs through the body and then loops back to the top" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Hello!\n", + "Hello!\n", + "Hello!\n", + "Hello!\n" + ] + } + ], + "source": [ + "for i in range(4):\n", + " print('Hello!')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# square \n", + "import turtle\n", + "bob = turtle.Turtle()\n", + "for i in range(4):\n", + " bob.fd(100)\n", + " bob.lt(90)\n", + "\n", + "turtle.done()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Did you notice the difference between both programs?\n", + "\n", + "**The art of cognitive blindspots | Kyle Eschen**\n", + "\n", + "https://www.youtube.com/watch?reload=9&v=OOG65rSM5fA" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "## 4.3 Exercises\n", + "\n", + "1. Write a function called square that takes a parameter named t, which is a turtle. It should use the turtle to draw a square.\n", + "Write a function call that passes bob as an argument to square, and then run the program again.

\n", + "\n", + "2. Add another parameter, named length, to square. Modify the body so length of the sides is length, and then modify the function call to provide a second argument. Run the program again. Test your program with a range of values for length.

\n", + "\n", + "3. Make a copy of square and change the name to polygon. Add another parameter named n and modify the body so it draws an n-sided regular polygon. Hint: The exterior angles of an n-sided regular polygon are 360/n degrees.

\n", + "\n", + "4. Write a function called circle that takes a turtle, t, and radius, r, as parameters and that draws an approximate circle by calling polygon with an appropriate length and number of sides. Test your function with a range of values of r.\n", + "Hint: figure out the circumference of the circle and make sure that length * n = circumference.

\n", + "\n", + "5. Make a more general version of circle called arc that takes an additional parameter angle, which determines what fraction of a circle to draw. angle is in units of degrees, so when angle=360, arc should draw a complete circle." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4.4 Encapsulation\n", + "\n", + "> Wrapping a piece of code up in a function is called encapsulation. \n", + "\n", + "The major advantages: \n", + "* code re-use\n", + "* shorter programs (it is more concise to call a function twice than to copy and paste the body)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# square \n", + "import turtle\n", + "\n", + "def square(t):\n", + " for i in range(4):\n", + " t.fd(100)\n", + " t.lt(90)\n", + "\n", + "bob = turtle.Turtle()\n", + "square(bob)\n", + "turtle.done()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> The innermost statements, fd and lt are indented twice to show that they are inside the for loop, which is inside the function definition. The next line, square(bob), is flush with the left margin, which indicates the end of both the for loop and the function definition." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Inside the function, t refers to the same turtle bob, so t.lt(90) has the same effect as bob.lt(90). In that case, why not call the parameter bob? " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4.5 Generalization\n", + "\n", + "> Adding a parameter to a function is called generalization because it makes the function more general: in the previous version, the square is always the same size; in this version it can be any size." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# add a length parameter to square. 
\n", + "import turtle\n", + "\n", + "def square(t, length):\n", + " for i in range(4):\n", + " t.fd(length)\n", + " t.lt(90)\n", + "\n", + "\n", + "\n", + "bob = turtle.Turtle()\n", + "square(bob, 100)\n", + "\n", + "turtle.done()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Instead of drawing squares, polygon draws regular polygons with any number of sides.\n", + "import turtle\n", + "\n", + "def polygon(t, n, length):\n", + " angle = 360 / n\n", + " for i in range(n):\n", + " t.fd(length)\n", + " t.lt(angle)\n", + "\n", + "bob = turtle.Turtle()\n", + "polygon(bob, 21, 70)\n", + "\n", + "turtle.done()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> When a function has more than a few numeric arguments, it is easy to forget what they are, or what order they should be in. In that case it is often a good idea to include the names of the parameters in the argument list:\n", + "\n", + "```python\n", + "polygon(bob, n=7, length=70)```\n", + "\n", + "> These are called keyword arguments because they include the parameter names as “keywords” (not to be confused with Python keywords like while and def)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4.6 Interface design\n", + "\n", + "> The interface of a function is a summary of how it is used: \n", + "\n", + "* what are the parameters? \n", + "* What does the function do? \n", + "* And what is the return value? \n", + "\n", + "> An interface is “clean” if it allows the caller to do what they want without dealing with unnecessary details.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The next step is to write circle, which takes a radius, r, as a parameter. 
\n", + "import turtle\n", + "import math\n", + "\n", + "def polygon(t, n, length):\n", + " angle = 360 / n\n", + " for i in range(n):\n", + " t.fd(length)\n", + " t.lt(angle)\n", + "\n", + "def circle(t, r):\n", + " circumference = 2 * math.pi * r\n", + " n = 50\n", + " length = circumference / n\n", + " polygon(t, n, length)\n", + "\n", + "bob = turtle.Turtle()\n", + "circle(bob, 75)\n", + "\n", + "turtle.done()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# In the previous version n was a constant; here n is derived from the circumference so each segment is about 3 units long.\n", + "import turtle\n", + "import math\n", + "\n", + "def polygon(t, n, length):\n", + " angle = 360 / n\n", + " for i in range(n):\n", + " t.fd(length)\n", + " t.lt(angle)\n", + "\n", + "def circle(t, r):\n", + " circumference = 2 * math.pi * r\n", + " n = int(circumference / 3) + 3\n", + " length = circumference / n\n", + " polygon(t, n, length)\n", + "\n", + "bob = turtle.Turtle()\n", + "circle(bob, 75)\n", + "\n", + "turtle.done()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4.7 Refactoring\n", + "\n", + "> This process—rearranging a program to improve interfaces and facilitate code re-use—is called refactoring. In this case, we noticed that there was similar code in arc and polygon, so we “factored it out” into polyline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "# copy of polygon and transform it into arc\n", + "import turtle\n", + "import math\n", + "\n", + "def arc(t, r, angle):\n", + " arc_length = 2 * math.pi * r * angle / 360\n", + " n = int(arc_length / 3) + 1\n", + " step_length = arc_length / n\n", + " step_angle = angle / n\n", + " \n", + " for i in range(n):\n", + " t.fd(step_length)\n", + " t.lt(step_angle)\n", + "\n", + "bob = turtle.Turtle()\n", + "arc(bob, 100, 180)\n", + "\n", + "turtle.done()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# general function polyline\n", + "# rewrite polygon and arc to use polyline\n", + "\n", + "import turtle\n", + "import math\n", + "\n", + "def polyline(t, n, length, angle):\n", + " for i in range(n):\n", + " t.fd(length)\n", + " t.lt(angle)\n", + "\n", + "def polygon(t, n, length):\n", + " angle = 360.0 / n\n", + " polyline(t, n, length, angle)\n", + "\n", + "def arc(t, r, angle):\n", + " arc_length = 2 * math.pi * r * angle / 360\n", + " n = int(arc_length / 3) + 1\n", + " step_length = arc_length / n\n", + " step_angle = float(angle) / n\n", + " polyline(t, n, step_length, step_angle)\n", + " \n", + "def circle(t, r):\n", + " arc(t, r, 360)\n", + "\n", + "bob = turtle.Turtle()\n", + "arc(bob, 100, 180)\n", + "\n", + "turtle.done()\n", + "\n", + "import os\n", + "os._exit(00)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4.8 A development plan\n", + "\n", + "1. Start by writing a small program with no function definitions.

\n", + "2. Once you get the program working, identify a coherent piece of it, encapsulate the piece in a function and give it a name.

\n", + "3. Generalize the function by adding appropriate parameters.

\n", + "4. Repeat steps 1–3 until you have a set of working functions. Copy and paste working code to avoid retyping (and re-debugging).

\n", + "5. Look for opportunities to improve the program by refactoring. For example, if you have similar code in several places, consider factoring it into an appropriately general function.

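A minimal sketch of the plan above, assuming a hypothetical `RecordingTurtle` stand-in (not part of the `turtle` module) that records positions instead of opening a window, so the result can be checked without a display:

```python
import math


class RecordingTurtle:
    # Hypothetical stand-in for turtle.Turtle: it only tracks position and heading.
    def __init__(self):
        self.x = 0.0
        self.y = 0.0
        self.heading = 0.0  # degrees, counter-clockwise from the x axis
        self.path = [(0.0, 0.0)]

    def fd(self, distance):
        # Move forward along the current heading and remember the new point.
        rad = math.radians(self.heading)
        self.x += distance * math.cos(rad)
        self.y += distance * math.sin(rad)
        self.path.append((self.x, self.y))

    def lt(self, angle):
        # Turn left by the given angle in degrees.
        self.heading = (self.heading + angle) % 360


# Steps 2 and 3: the working loop is wrapped in a function (encapsulation)
# and given the parameters n, length and angle (generalization).
def polyline(t, n, length, angle):
    for _ in range(n):
        t.fd(length)
        t.lt(angle)


# Step 5: polygon reuses polyline instead of repeating the loop (refactoring).
def polygon(t, n, length):
    polyline(t, n, length, 360 / n)


bob = RecordingTurtle()
polygon(bob, 4, 100)
print(len(bob.path))  # 5 points: the start plus one per side
```

Swapping `RecordingTurtle()` for `turtle.Turtle()` should draw the same square on screen, since `polyline` only relies on the `fd` and `lt` methods.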
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4.9 docstring\n", + "\n", + "> A docstring is a string at the beginning of a function that explains the interface (“doc” is short for “documentation”)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "polyline\n", + "square\n" + ] + } + ], + "source": [ + "import turtle\n", + "\n", + "def polyline():\n", + " \"\"\"Draws n line segments with the given length and\n", + " angle (in degrees) between them. t is a turtle.\n", + " \"\"\" \n", + " print('polyline')\n", + " #for i in range(n):\n", + " # t.fd(length)\n", + " # t.lt(angle)\n", + " \n", + "def square():\n", + " print('square')\n", + " \n", + "polyline() \n", + "square()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4.10 Debugging\n", + "\n", + "> If the preconditions are satisfied and the postconditions are not, the bug is in the function. If your pre- and postconditions are clear, they can help with debugging." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From e827b237b5afcfaba011a0cbb961534ecd27c5b0 Mon Sep 17 00:00:00 2001 From: softhints Date: Mon, 18 Mar 2019 10:57:47 +0200 Subject: [PATCH 26/76] How_to_Optimize_and_Speed_Up_Pandas --- .../How_to_Optimize_and_Speed_Up_Pandas.ipynb | 2373 +++++++++++++++++ 1 file changed, 2373 insertions(+) create mode 100644 notebooks/pandas/How_to_Optimize_and_Speed_Up_Pandas.ipynb diff --git a/notebooks/pandas/How_to_Optimize_and_Speed_Up_Pandas.ipynb b/notebooks/pandas/How_to_Optimize_and_Speed_Up_Pandas.ipynb new file mode 100644 index 0000000..7325f23 --- /dev/null +++ b/notebooks/pandas/How_to_Optimize_and_Speed_Up_Pandas.ipynb @@ -0,0 +1,2373 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3 Simple ways to optimize pandas\n", + "\n", + "1. **Optimize datatypes**\n", + "2. **Use built-in functions**\n", + "3. **Search for smart alternative**\n", + "4. **Do tests** " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Bonus tips:\n", + "1. **Use NumPy arrays/matrix**\n", + "\n", + "```python \n", + "# Convert the frame to its Numpy-array representation. Deprecated since version 0.23.0\n", + "numpy_matrix = df.as_matrix() \n", + "\n", + "#Return a Numpy representation of the DataFrame.\n", + "df.values\n", + "\n", + "# Convert the DataFrame to a NumPy array\n", + "df.to_numpy() \n", + "```\n", + "2. **Optimize data when you read it**\n", + "```python \n", + "pandas.read_csv('foo.csv', dtype={'a': 'int'})\n", + "```\n", + "3. 
**Convert dates to Datetime**\n", + "\n", + "```python \n", + "df['start_date'] = pd.to_datetime(df['start_date'])\n", + "```\n", + "\n", + "4. **Loop over pandas data in a smart way (iterrows, itertuples, zip)**\n", + "\n", + "https://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas\n", + "\n", + "```python\n", + "for i,r in t.iterrows(): # 0.5639059543609619\n", + "for ir in t.itertuples(): # 0.017839908599853516\n", + "for r in zip(t['a'], t['b']): # 0.005645036697387695\n", + "``` " + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(98855, 129)\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "pd.set_option(\"display.max_columns\", None) # or 1000\n", + "pd.set_option(\"display.max_rows\", None) # or 1000\n", + "pd.set_option(\"display.max_colwidth\", 500) # or 199\n", + "pd.set_option(\"display.expand_frame_repr\", True) # or False\n", + "\n", + "# read the data frame and check its shape\n", + "df = pd.read_csv(\"../csv/stackoverflow/developer_survey_2018/survey_results_public.csv\", low_memory=False)\n", + "print(df.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Respondent int64\n", + "Hobby object\n", + "OpenSource object\n", + "Country object\n", + "Student object\n", + "dtype: object" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
|  | Respondent | Hobby | OpenSource | Country | Student | Employment | FormalEducation | UndergradMajor | CompanySize | DevType | YearsCoding | YearsCodingProf | JobSatisfaction | CareerSatisfaction | HopeFiveYears | JobSearchStatus | LastNewJob | AssessJob1 | AssessJob2 | AssessJob3 | AssessJob4 | AssessJob5 | AssessJob6 | AssessJob7 | AssessJob8 | AssessJob9 | AssessJob10 | AssessBenefits1 | AssessBenefits2 | AssessBenefits3 | AssessBenefits4 | AssessBenefits5 | AssessBenefits6 | AssessBenefits7 | AssessBenefits8 | AssessBenefits9 | AssessBenefits10 | AssessBenefits11 | JobContactPriorities1 | JobContactPriorities2 | JobContactPriorities3 | JobContactPriorities4 | JobContactPriorities5 | JobEmailPriorities1 | JobEmailPriorities2 | JobEmailPriorities3 | JobEmailPriorities4 | JobEmailPriorities5 | JobEmailPriorities6 | JobEmailPriorities7 | UpdateCV | Currency | Salary | SalaryType | ConvertedSalary | CurrencySymbol | CommunicationTools | TimeFullyProductive | EducationTypes | SelfTaughtTypes | TimeAfterBootcamp | HackathonReasons | AgreeDisagree1 | AgreeDisagree2 | AgreeDisagree3 | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | FrameworkWorkedWith | FrameworkDesireNextYear | IDE | OperatingSystem | NumberMonitors | Methodology | VersionControl | CheckInCode | AdBlocker | AdBlockerDisable | AdBlockerReasons | AdsAgreeDisagree1 | AdsAgreeDisagree2 | AdsAgreeDisagree3 | AdsActions | AdsPriorities1 | AdsPriorities2 | AdsPriorities3 | AdsPriorities4 | AdsPriorities5 | AdsPriorities6 | AdsPriorities7 | AIDangerous | AIInteresting | AIResponsible | AIFuture | EthicsChoice | EthicsReport | EthicsResponsible | EthicalImplications | StackOverflowRecommend | StackOverflowVisit | StackOverflowHasAccount | StackOverflowParticipate | StackOverflowJobs | StackOverflowDevStory | StackOverflowJobsRecommend | StackOverflowConsiderMember | HypotheticalTools1 | HypotheticalTools2 | HypotheticalTools3 | HypotheticalTools4 | HypotheticalTools5 | WakeTime | HoursComputer | HoursOutside | SkipMeals | ErgonomicDevices | Exercise | Gender | SexualOrientation | EducationParents | RaceEthnicity | Age | Dependents | MilitaryUS | SurveyTooLong | SurveyEasy |
| 0 | 1 | Yes | No | Kenya | No | Employed part-time | Bachelor’s degree (BA, BS, B.Eng., etc.) | Mathematics or statistics | 20 to 99 employees | Full-stack developer | 3-5 years | 3-5 years | Extremely satisfied | Extremely satisfied | Working as a founder or co-founder of my own company | I’m not actively looking, but I am open to new opportunities | Less than a year ago | 10.0 | 7.0 | 8.0 | 1.0 | 2.0 | 5.0 | 3.0 | 4.0 | 9.0 | 6.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 | 1.0 | 4.0 | 2.0 | 5.0 | 5.0 | 6.0 | 7.0 | 2.0 | 1.0 | 4.0 | 3.0 | My job status or other personal status changed | NaN | NaN | Monthly | NaN | KES | Slack | One to three months | Taught yourself a new language, framework, or tool without taking a formal course;Participated in a hackathon | The official documentation and/or standards for the technology;A book or e-book from O’Reilly, Apress, or a similar publisher;Questions & answers on Stack Overflow;Online developer communities other than Stack Overflow (ex. forums, listservs, IRC channels, etc.) | NaN | To build my professional network | Strongly agree | Strongly agree | Neither Agree nor Disagree | JavaScript;Python;HTML;CSS | JavaScript;Python;HTML;CSS | Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/Aurora;Microsoft Azure (Tables, CosmosDB, SQL, etc) | Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/Aurora;Microsoft Azure (Tables, CosmosDB, SQL, etc) | AWS;Azure;Linux;Firebase | AWS;Azure;Linux;Firebase | Django;React | Django;React | Komodo;Vim;Visual Studio Code | Linux-based | 1 | Agile;Scrum | Git | Multiple times per day | Yes | No | NaN | Strongly agree | Strongly agree | Strongly agree | Saw an online advertisement and then researched it (without clicking on the ad);Stopped going to a website because of their advertising | 1.0 | 5.0 | 4.0 | 7.0 | 2.0 | 6.0 | 3.0 | Artificial intelligence surpassing human intelligence (\"the singularity\") | Algorithms making important decisions | The developers or the people creating the AI | I'm excited about the possibilities more than worried about the dangers. | No | Yes, and publicly | Upper management at the company/organization | Yes | 10 (Very Likely) | Multiple times per day | Yes | I have never participated in Q&A on Stack Overflow | No, I knew that Stack Overflow had a jobs board but have never used or visited it | Yes | NaN | Yes | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Between 5:00 - 6:00 AM | 9 - 12 hours | 1 - 2 hours | Never | Standing desk | 3 - 4 times per week | Male | Straight or heterosexual | Bachelor’s degree (BA, BS, B.Eng., etc.) | Black or of African descent | 25 - 34 years old | Yes | NaN | The survey was an appropriate length | Very easy |
13YesYesUnited KingdomNoEmployed full-timeBachelor’s degree (BA, BS, B.Eng., etc.)A natural science (ex. biology, chemistry, physics)10,000 or more employeesDatabase administrator;DevOps specialist;Full-stack developer;System administrator30 or more years18-20 yearsModerately dissatisfiedNeither satisfied nor dissatisfiedWorking in a different or more specialized technical role than the one I'm in nowI am actively looking for a jobMore than 4 years ago1.07.010.08.02.05.04.03.06.09.01.05.03.07.010.04.011.09.06.02.08.03.01.05.02.04.01.03.04.05.02.06.07.0I saw an employer’s advertisementBritish pounds sterling (£)51000Yearly70841.0GBPConfluence;Office / productivity suite (Microsoft Office, Google Suite, etc.);Slack;Other wiki tool (Github, Google Sites, proprietary software, etc.)One to three monthsTaught yourself a new language, framework, or tool without taking a formal course;Contributed to open source softwareThe official documentation and/or standards for the technology;Questions & answers on Stack OverflowNaNNaNAgreeAgreeNeither Agree nor DisagreeJavaScript;Python;Bash/ShellGo;PythonRedis;PostgreSQL;MemcachedPostgreSQLLinuxLinuxDjangoReactIPython / Jupyter;Sublime Text;VimLinux-based2NaNGit;SubversionA few times per weekYesYesThe website I was visiting asked me to disable itSomewhat agreeNeither agree nor disagreeNeither agree nor disagreeNaN3.05.01.04.06.07.02.0Increasing automation of jobsIncreasing automation of jobsThe developers or the people creating the AII'm excited about the possibilities more than worried about the dangers.Depends on what it isDepends on what it isUpper management at the company/organizationYes10 (Very Likely)A few times per month or weeklyYesA few times per month or weeklyYesNo, I have one but it's out of date7YesA little bit interestedA little bit interestedA little bit interestedA little bit interestedA little bit interestedBetween 6:01 - 7:00 AM5 - 8 hours30 - 59 minutesNeverErgonomic keyboard or mouseDaily or almost every 
dayMaleStraight or heterosexualBachelor’s degree (BA, BS, B.Eng., etc.)White or of European descent35 - 44 years oldYesNaNThe survey was an appropriate lengthSomewhat easy
\n", + "
" + ], + "text/plain": [ + " Respondent Hobby OpenSource Country Student Employment \\\n", + "0 1 Yes No Kenya No Employed part-time \n", + "1 3 Yes Yes United Kingdom No Employed full-time \n", + "\n", + " FormalEducation \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "\n", + " UndergradMajor \\\n", + "0 Mathematics or statistics \n", + "1 A natural science (ex. biology, chemistry, physics) \n", + "\n", + " CompanySize \\\n", + "0 20 to 99 employees \n", + "1 10,000 or more employees \n", + "\n", + " DevType \\\n", + "0 Full-stack developer \n", + "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", + "\n", + " YearsCoding YearsCodingProf JobSatisfaction \\\n", + "0 3-5 years 3-5 years Extremely satisfied \n", + "1 30 or more years 18-20 years Moderately dissatisfied \n", + "\n", + " CareerSatisfaction \\\n", + "0 Extremely satisfied \n", + "1 Neither satisfied nor dissatisfied \n", + "\n", + " HopeFiveYears \\\n", + "0 Working as a founder or co-founder of my own company \n", + "1 Working in a different or more specialized technical role than the one I'm in now \n", + "\n", + " JobSearchStatus \\\n", + "0 I’m not actively looking, but I am open to new opportunities \n", + "1 I am actively looking for a job \n", + "\n", + " LastNewJob AssessJob1 AssessJob2 AssessJob3 AssessJob4 \\\n", + "0 Less than a year ago 10.0 7.0 8.0 1.0 \n", + "1 More than 4 years ago 1.0 7.0 10.0 8.0 \n", + "\n", + " AssessJob5 AssessJob6 AssessJob7 AssessJob8 AssessJob9 AssessJob10 \\\n", + "0 2.0 5.0 3.0 4.0 9.0 6.0 \n", + "1 2.0 5.0 4.0 3.0 6.0 9.0 \n", + "\n", + " AssessBenefits1 AssessBenefits2 AssessBenefits3 AssessBenefits4 \\\n", + "0 NaN NaN NaN NaN \n", + "1 1.0 5.0 3.0 7.0 \n", + "\n", + " AssessBenefits5 AssessBenefits6 AssessBenefits7 AssessBenefits8 \\\n", + "0 NaN NaN NaN NaN \n", + "1 10.0 4.0 11.0 9.0 \n", + "\n", + " AssessBenefits9 AssessBenefits10 AssessBenefits11 
JobContactPriorities1 \\\n", + "0 NaN NaN NaN 3.0 \n", + "1 6.0 2.0 8.0 3.0 \n", + "\n", + " JobContactPriorities2 JobContactPriorities3 JobContactPriorities4 \\\n", + "0 1.0 4.0 2.0 \n", + "1 1.0 5.0 2.0 \n", + "\n", + " JobContactPriorities5 JobEmailPriorities1 JobEmailPriorities2 \\\n", + "0 5.0 5.0 6.0 \n", + "1 4.0 1.0 3.0 \n", + "\n", + " JobEmailPriorities3 JobEmailPriorities4 JobEmailPriorities5 \\\n", + "0 7.0 2.0 1.0 \n", + "1 4.0 5.0 2.0 \n", + "\n", + " JobEmailPriorities6 JobEmailPriorities7 \\\n", + "0 4.0 3.0 \n", + "1 6.0 7.0 \n", + "\n", + " UpdateCV \\\n", + "0 My job status or other personal status changed \n", + "1 I saw an employer’s advertisement \n", + "\n", + " Currency Salary SalaryType ConvertedSalary \\\n", + "0 NaN NaN Monthly NaN \n", + "1 British pounds sterling (£) 51000 Yearly 70841.0 \n", + "\n", + " CurrencySymbol \\\n", + "0 KES \n", + "1 GBP \n", + "\n", + " CommunicationTools \\\n", + "0 Slack \n", + "1 Confluence;Office / productivity suite (Microsoft Office, Google Suite, etc.);Slack;Other wiki tool (Github, Google Sites, proprietary software, etc.) \n", + "\n", + " TimeFullyProductive \\\n", + "0 One to three months \n", + "1 One to three months \n", + "\n", + " EducationTypes \\\n", + "0 Taught yourself a new language, framework, or tool without taking a formal course;Participated in a hackathon \n", + "1 Taught yourself a new language, framework, or tool without taking a formal course;Contributed to open source software \n", + "\n", + " SelfTaughtTypes \\\n", + "0 The official documentation and/or standards for the technology;A book or e-book from O’Reilly, Apress, or a similar publisher;Questions & answers on Stack Overflow;Online developer communities other than Stack Overflow (ex. forums, listservs, IRC channels, etc.) 
\n", + "1 The official documentation and/or standards for the technology;Questions & answers on Stack Overflow \n", + "\n", + " TimeAfterBootcamp HackathonReasons AgreeDisagree1 \\\n", + "0 NaN To build my professional network Strongly agree \n", + "1 NaN NaN Agree \n", + "\n", + " AgreeDisagree2 AgreeDisagree3 LanguageWorkedWith \\\n", + "0 Strongly agree Neither Agree nor Disagree JavaScript;Python;HTML;CSS \n", + "1 Agree Neither Agree nor Disagree JavaScript;Python;Bash/Shell \n", + "\n", + " LanguageDesireNextYear \\\n", + "0 JavaScript;Python;HTML;CSS \n", + "1 Go;Python \n", + "\n", + " DatabaseWorkedWith \\\n", + "0 Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/Aurora;Microsoft Azure (Tables, CosmosDB, SQL, etc) \n", + "1 Redis;PostgreSQL;Memcached \n", + "\n", + " DatabaseDesireNextYear \\\n", + "0 Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/Aurora;Microsoft Azure (Tables, CosmosDB, SQL, etc) \n", + "1 PostgreSQL \n", + "\n", + " PlatformWorkedWith PlatformDesireNextYear FrameworkWorkedWith \\\n", + "0 AWS;Azure;Linux;Firebase AWS;Azure;Linux;Firebase Django;React \n", + "1 Linux Linux Django \n", + "\n", + " FrameworkDesireNextYear IDE OperatingSystem \\\n", + "0 Django;React Komodo;Vim;Visual Studio Code Linux-based \n", + "1 React IPython / Jupyter;Sublime Text;Vim Linux-based \n", + "\n", + " NumberMonitors Methodology VersionControl CheckInCode \\\n", + "0 1 Agile;Scrum Git Multiple times per day \n", + "1 2 NaN Git;Subversion A few times per week \n", + "\n", + " AdBlocker AdBlockerDisable \\\n", + "0 Yes No \n", + "1 Yes Yes \n", + "\n", + " AdBlockerReasons AdsAgreeDisagree1 \\\n", + "0 NaN Strongly agree \n", + "1 The website I was visiting asked me to disable it Somewhat agree \n", + "\n", + " AdsAgreeDisagree2 AdsAgreeDisagree3 \\\n", + "0 Strongly agree Strongly agree \n", + "1 Neither agree nor disagree Neither agree nor disagree \n", + "\n", + " AdsActions \\\n", + "0 Saw an online advertisement and then researched it (without clicking on the 
ad);Stopped going to a website because of their advertising \n", + "1 NaN \n", + "\n", + " AdsPriorities1 AdsPriorities2 AdsPriorities3 AdsPriorities4 \\\n", + "0 1.0 5.0 4.0 7.0 \n", + "1 3.0 5.0 1.0 4.0 \n", + "\n", + " AdsPriorities5 AdsPriorities6 AdsPriorities7 \\\n", + "0 2.0 6.0 3.0 \n", + "1 6.0 7.0 2.0 \n", + "\n", + " AIDangerous \\\n", + "0 Artificial intelligence surpassing human intelligence (\"the singularity\") \n", + "1 Increasing automation of jobs \n", + "\n", + " AIInteresting \\\n", + "0 Algorithms making important decisions \n", + "1 Increasing automation of jobs \n", + "\n", + " AIResponsible \\\n", + "0 The developers or the people creating the AI \n", + "1 The developers or the people creating the AI \n", + "\n", + " AIFuture \\\n", + "0 I'm excited about the possibilities more than worried about the dangers. \n", + "1 I'm excited about the possibilities more than worried about the dangers. \n", + "\n", + " EthicsChoice EthicsReport \\\n", + "0 No Yes, and publicly \n", + "1 Depends on what it is Depends on what it is \n", + "\n", + " EthicsResponsible EthicalImplications \\\n", + "0 Upper management at the company/organization Yes \n", + "1 Upper management at the company/organization Yes \n", + "\n", + " StackOverflowRecommend StackOverflowVisit \\\n", + "0 10 (Very Likely) Multiple times per day \n", + "1 10 (Very Likely) A few times per month or weekly \n", + "\n", + " StackOverflowHasAccount StackOverflowParticipate \\\n", + "0 Yes I have never participated in Q&A on Stack Overflow \n", + "1 Yes A few times per month or weekly \n", + "\n", + " StackOverflowJobs \\\n", + "0 No, I knew that Stack Overflow had a jobs board but have never used or visited it \n", + "1 Yes \n", + "\n", + " StackOverflowDevStory StackOverflowJobsRecommend \\\n", + "0 Yes NaN \n", + "1 No, I have one but it's out of date 7 \n", + "\n", + " StackOverflowConsiderMember HypotheticalTools1 \\\n", + "0 Yes Extremely interested \n", + "1 Yes A little bit interested 
\n", + "\n", + " HypotheticalTools2 HypotheticalTools3 HypotheticalTools4 \\\n", + "0 Extremely interested Extremely interested Extremely interested \n", + "1 A little bit interested A little bit interested A little bit interested \n", + "\n", + " HypotheticalTools5 WakeTime HoursComputer \\\n", + "0 Extremely interested Between 5:00 - 6:00 AM 9 - 12 hours \n", + "1 A little bit interested Between 6:01 - 7:00 AM 5 - 8 hours \n", + "\n", + " HoursOutside SkipMeals ErgonomicDevices \\\n", + "0 1 - 2 hours Never Standing desk \n", + "1 30 - 59 minutes Never Ergonomic keyboard or mouse \n", + "\n", + " Exercise Gender SexualOrientation \\\n", + "0 3 - 4 times per week Male Straight or heterosexual \n", + "1 Daily or almost every day Male Straight or heterosexual \n", + "\n", + " EducationParents RaceEthnicity \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) White or of European descent \n", + "\n", + " Age Dependents MilitaryUS \\\n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", + "\n", + " SurveyTooLong SurveyEasy \n", + "0 The survey was an appropriate length Very easy \n", + "1 The survey was an appropriate length Somewhat easy " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['No', 'Yes'], dtype=object)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.OpenSource.unique()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['No', 'Yes, part-time', nan, 'Yes, full-time'], dtype=object)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"df.Student.unique()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "No 70399\n", + "Yes, full-time 18394\n", + "Yes, part-time 6108\n", + "NaN 3954\n", + "Name: Student, dtype: int64" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.Student.value_counts(dropna=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Optimize datatypes\n", + "\n", + "* downcast numeric columns when the wider type is not needed\n", + "* use categories" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.75 MB\n", + "0.38 MB\n" + ] + } + ], + "source": [ + "# optimize storage for dataframe with numbers\n", + "\n", + "gl_int = df.select_dtypes(include=[\"int\"])\n", + "converted_int = gl_int.apply(pd.to_numeric, downcast=\"unsigned\")\n", + "\n", + "\n", + "def mem_usage(pandas_obj):\n", + " if isinstance(pandas_obj, pd.DataFrame):\n", + " usage_b = pandas_obj.memory_usage(deep=True).sum()\n", + " else: # we assume if not a df it's a series\n", + " usage_b = pandas_obj.memory_usage(deep=True)\n", + " usage_mb = usage_b / 1024 ** 2 # convert bytes to megabytes\n", + " return \"{:03.2f} MB\".format(usage_mb)\n", + "\n", + "print(mem_usage(gl_int))\n", + "print(mem_usage(converted_int))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "622.07 MB\n", + "572.40 MB\n" + ] + } + ], + "source": [ + "# convert columns to categorical\n", + "\n", + "def as_categorical(df):\n", + " df['CompanySize'] = df.CompanySize.astype('category')\n", + " df['Country'] = df.Country.astype('category')\n", + " df['Hobby'] = df.Hobby.astype('category')\n", + " df['YearsCoding'] = df.YearsCoding.astype('category')\n", + " df['Employment'] = 
df.Employment.astype('category')\n", + " df['LastNewJob'] = df.LastNewJob.astype('category')\n", + " df['JobSatisfaction'] = df.JobSatisfaction.astype('category')\n", + " df['CareerSatisfaction'] = df.CareerSatisfaction.astype('category') \n", + "\n", + "print(mem_usage(df))\n", + "as_categorical(df)\n", + "print(mem_usage(df))" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "572.40 MB\n", + "566.89 MB\n" + ] + } + ], + "source": [ + "print(mem_usage(df))\n", + "df['OpenSource'] = df.OpenSource.astype('category')\n", + "print(mem_usage(df))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Use built-in functions" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 5765285 function calls (5689280 primitive calls) in 3.477 seconds\n", + "\n", + " Ordered by: standard name\n", + "\n", + " ncalls tottime percall cumtime percall filename:lineno(function)\n", + " 7999 0.005 0.000 0.008 0.000 :416(parent)\n", + " 192010 0.093 0.000 0.210 0.000 :997(_handle_fromlist)\n", + " 1 0.000 0.000 3.446 3.446 :5(before)\n", + " 1 0.019 0.019 3.445 3.445 :6()\n", + " 1 0.030 0.030 3.477 3.477 :1()\n", + " 4001 0.003 0.000 0.006 0.000 __init__.py:205(iteritems)\n", + " 1 0.000 0.000 0.012 0.012 _decorators.py:136(wrapper)\n", + " 9 0.000 0.000 0.001 0.000 _methods.py:26(_amax)\n", + " 9 0.000 0.000 0.001 0.000 _methods.py:30(_amin)\n", + " 8003 0.004 0.000 0.042 0.000 _methods.py:42(_any)\n", + " 1 0.000 0.000 0.000 0.000 _methods.py:45(_all)\n", + " 1 0.000 0.000 0.000 0.000 _validators.py:114(_check_for_invalid_keys)\n", + " 1 0.000 0.000 0.000 0.000 _validators.py:130(validate_kwargs)\n", + " 1 0.000 0.000 0.000 0.000 _validators.py:32(_check_for_default_values)\n", + " 10 0.000 0.000 0.000 0.000 _weakrefset.py:70(__contains__)\n", 
+ " 10 0.000 0.000 0.000 0.000 abc.py:180(__instancecheck__)\n", + " 1 0.000 0.000 0.000 0.000 algorithms.py:141(_reconstruct_data)\n", + " 14 0.000 0.000 0.000 0.000 algorithms.py:1421(_get_take_nd_function)\n", + " 9 0.000 0.000 0.003 0.000 algorithms.py:1454(take)\n", + " 14 0.000 0.000 0.098 0.007 algorithms.py:1548(take_nd)\n", + " 1 0.000 0.000 0.000 0.000 algorithms.py:172(_ensure_arraylike)\n", + " 1 0.000 0.000 0.002 0.002 algorithms.py:224(_get_data_algo)\n", + " 1 0.000 0.000 0.008 0.008 algorithms.py:449(_factorize_array)\n", + " 2 0.000 0.000 0.000 0.000 algorithms.py:48(_ensure_data)\n", + " 1 0.000 0.000 0.012 0.012 algorithms.py:576(factorize)\n", + " 1 0.000 0.000 0.000 0.000 base.py:1329(_get_names)\n", + " 3998 0.008 0.000 0.015 0.000 base.py:1695(_convert_slice_indexer)\n", + " 3 0.000 0.000 0.000 0.000 base.py:2033(__contains__)\n", + " 3998 0.010 0.000 0.221 0.000 base.py:2067(__getitem__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:2179(take)\n", + " 1 0.000 0.000 0.000 0.000 base.py:2445(equals)\n", + " 4000 0.042 0.000 0.174 0.000 base.py:255(__new__)\n", + " 11994 0.005 0.000 0.006 0.000 base.py:4132(_validate_indexer)\n", + " 4001 0.009 0.000 0.019 0.000 base.py:473(_simple_new)\n", + " 8000 0.003 0.000 0.004 0.000 base.py:4914(_ensure_index)\n", + " 3999 0.012 0.000 0.194 0.000 base.py:520(_shallow_copy_with_infer)\n", + " 84052 0.051 0.000 0.093 0.000 base.py:61(is_dtype)\n", + " 1 0.000 0.000 0.000 0.000 base.py:615(is_)\n", + " 4001 0.002 0.000 0.002 0.000 base.py:635(_reset_identity)\n", + " 71988 0.023 0.000 0.032 0.000 base.py:641(__len__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:662(dtype)\n", + " 6 0.000 0.000 0.000 0.000 base.py:672(values)\n", + " 3 0.000 0.000 0.000 0.000 base.py:677(_values)\n", + " 2 0.000 0.000 0.000 0.000 base.py:711(get_values)\n", + " 1 0.000 0.000 0.000 0.000 base.py:893(tolist)\n", + " 3999 0.002 0.000 0.003 0.000 base.py:904(_coerce_to_ndarray)\n", + " 1 0.000 0.000 0.000 0.000 
base.py:912(__iter__)\n", + " 3999 0.003 0.000 0.006 0.000 base.py:920(_get_attributes_dict)\n", + " 3999 0.002 0.000 0.003 0.000 base.py:922()\n", + " 13 0.000 0.000 0.000 0.000 cast.py:257(maybe_promote)\n", + " 35991 0.020 0.000 0.060 0.000 cast.py:600(coerce_indexer_dtype)\n", + " 35991 0.022 0.000 0.082 0.000 categorical.py:147(_maybe_to_categorical)\n", + " 9 0.000 0.000 0.004 0.000 categorical.py:1774(take_nd)\n", + " 18 0.000 0.000 0.000 0.000 categorical.py:1841(__len__)\n", + " 35982 0.111 0.000 1.120 0.000 categorical.py:1943(__getitem__)\n", + " 35991 0.073 0.000 0.932 0.000 categorical.py:267(__init__)\n", + " 35991 0.023 0.000 0.036 0.000 categorical.py:381(categories)\n", + " 35991 0.017 0.000 0.027 0.000 categorical.py:420(ordered)\n", + " 179982 0.026 0.000 0.026 0.000 categorical.py:425(dtype)\n", + " 35991 0.006 0.000 0.006 0.000 categorical.py:434(_constructor)\n", + " 4001 0.003 0.000 0.021 0.000 common.py:1043(is_datetime64_any_dtype)\n", + " 3 0.000 0.000 0.000 0.000 common.py:1170(is_datetime_or_timedelta_dtype)\n", + " 14 0.000 0.000 0.000 0.000 common.py:122(is_sparse)\n", + " 1 0.000 0.000 0.000 0.000 common.py:1294(is_datetimelike_v_numeric)\n", + " 3 0.000 0.000 0.000 0.000 common.py:1405(needs_i8_conversion)\n", + " 6 0.000 0.000 0.000 0.000 common.py:1527(is_float_dtype)\n", + " 6 0.000 0.000 0.000 0.000 common.py:1578(is_bool_dtype)\n", + " 43 0.000 0.000 0.000 0.000 common.py:1688(is_extension_array_dtype)\n", + " 1 0.000 0.000 0.000 0.000 common.py:1717(is_complex_dtype)\n", + " 8006 0.006 0.000 0.008 0.000 common.py:1784(_get_dtype)\n", + "12027/12026 0.012 0.000 0.019 0.000 common.py:1835(_get_dtype_type)\n", + " 35991 0.020 0.000 0.122 0.000 common.py:195(is_categorical)\n", + " 41 0.000 0.000 0.000 0.000 common.py:227(is_datetimetz)\n", + " 1 0.000 0.000 0.000 0.000 common.py:301(_asarray_tuplesafe)\n", + " 4003 0.004 0.000 0.013 0.000 common.py:332(is_datetime64_dtype)\n", + " 35982 0.019 0.000 0.023 0.000 
common.py:359(is_null_slice)\n", + " 4048 0.002 0.000 0.005 0.000 common.py:369(is_datetime64tz_dtype)\n", + " 3999 0.002 0.000 0.003 0.000 common.py:395(_apply_if_callable)\n", + " 4003 0.003 0.000 0.011 0.000 common.py:407(is_timedelta64_dtype)\n", + " 6 0.000 0.000 0.000 0.000 common.py:444(is_period_dtype)\n", + " 8015 0.003 0.000 0.012 0.000 common.py:477(is_interval_dtype)\n", + " 43995 0.020 0.000 0.054 0.000 common.py:513(is_categorical_dtype)\n", + " 2 0.000 0.000 0.000 0.000 common.py:546(is_string_dtype)\n", + " 1 0.000 0.000 0.000 0.000 common.py:647(is_datetimelike)\n", + " 4002 0.003 0.000 0.011 0.000 common.py:692(is_dtype_equal)\n", + " 4004 0.005 0.000 0.009 0.000 common.py:858(is_signed_integer_dtype)\n", + " 5 0.000 0.000 0.000 0.000 common.py:89(is_object_dtype)\n", + " 5 0.000 0.000 0.000 0.000 common.py:907(is_unsigned_integer_dtype)\n", + " 71982 0.032 0.000 0.542 0.000 dtypes.py:137(__init__)\n", + " 71982 0.062 0.000 0.511 0.000 dtypes.py:156(_finalize)\n", + " 1 0.000 0.000 0.000 0.000 dtypes.py:266(construct_from_string)\n", + " 71982 0.064 0.000 0.148 0.000 dtypes.py:278(validate_ordered)\n", + " 71982 0.094 0.000 0.301 0.000 dtypes.py:298(validate_categories)\n", + " 36009 0.006 0.000 0.006 0.000 dtypes.py:30(__unicode__)\n", + " 36009 0.014 0.000 0.019 0.000 dtypes.py:33(__str__)\n", + " 35991 0.063 0.000 0.392 0.000 dtypes.py:331(update_dtype)\n", + " 71982 0.013 0.000 0.013 0.000 dtypes.py:363(categories)\n", + " 71982 0.010 0.000 0.010 0.000 dtypes.py:370(ordered)\n", + " 3 0.000 0.000 0.000 0.000 dtypes.py:401(__new__)\n", + " 3 0.000 0.000 0.000 0.000 dtypes.py:459(construct_from_string)\n", + " 6 0.000 0.000 0.000 0.000 dtypes.py:584(is_dtype)\n", + " 4015 0.005 0.000 0.009 0.000 dtypes.py:707(is_dtype)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:2664(__getitem__)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:2690(_getitem_column)\n", + " 3999 0.001 0.000 0.001 0.000 frame.py:320(_constructor)\n", + " 3999 0.013 0.000 0.026 0.000 
frame.py:334(__init__)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:538(axes)\n", + " 3998 0.003 0.000 0.006 0.000 frame.py:844(__len__)\n", + " 3999 0.006 0.000 0.006 0.000 generic.py:124(__init__)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:1264(_check_label_or_level_ambiguity)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:1294()\n", + " 1 0.000 0.000 0.000 0.000 generic.py:1520(__contains__)\n", + " 3999 0.004 0.000 0.005 0.000 generic.py:178(_init_mgr)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2484(_get_item_cache)\n", + " 3998 0.019 0.000 3.216 0.001 generic.py:2583(_slice)\n", + " 3999 0.005 0.000 0.009 0.000 generic.py:2603(_set_is_copy)\n", + " 1 0.000 0.000 0.101 0.101 generic.py:2783(_take)\n", + " 4002 0.004 0.000 0.005 0.000 generic.py:364(_get_axis_number)\n", + " 4001 0.004 0.000 0.007 0.000 generic.py:377(_get_axis_name)\n", + " 4001 0.003 0.000 0.011 0.000 generic.py:390(_get_axis)\n", + " 3999 0.003 0.000 0.008 0.000 generic.py:394(_get_block_manager_axis)\n", + " 3999 0.003 0.000 0.004 0.000 generic.py:4345(__finalize__)\n", + " 4001 0.004 0.000 0.004 0.000 generic.py:4378(__setattr__)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:438(_info_axis)\n", + " 2 0.000 0.000 0.001 0.001 generic.py:4423(_protect_consolidate)\n", + " 2 0.000 0.000 0.001 0.001 generic.py:4433(_consolidate_inplace)\n", + " 2 0.000 0.000 0.001 0.001 generic.py:4436(f)\n", + " 3999 0.002 0.000 0.004 0.000 generic.py:458(ndim)\n", + " 1 0.000 0.000 0.001 0.001 generic.py:6592(groupby)\n", + " 324113 0.097 0.000 0.163 0.000 generic.py:7(_check)\n", + " 1 0.000 0.000 0.001 0.001 groupby.py:2143(groupby)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:2196(__init__)\n", + " 3999 0.003 0.000 3.418 0.001 groupby.py:2217(get_iterator)\n", + " 1 0.000 0.000 0.012 0.012 groupby.py:2231(_get_splitter)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:2235(_get_group_keys)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:2295(levels)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:2297()\n", + " 1 
0.000 0.000 0.012 0.012 groupby.py:2333(group_info)\n", + " 1 0.000 0.000 0.012 0.012 groupby.py:2350(_get_compressed_labels)\n", + " 1 0.000 0.000 0.012 0.012 groupby.py:2351()\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:2939(__init__)\n", + " 2 0.000 0.000 0.012 0.006 groupby.py:3067(labels)\n", + " 2 0.000 0.000 0.000 0.000 groupby.py:3089(group_index)\n", + " 1 0.000 0.000 0.012 0.012 groupby.py:3095(_make_labels)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:3114(_get_grouper)\n", + " 2 0.000 0.000 0.000 0.000 groupby.py:3228()\n", + " 2 0.000 0.000 0.000 0.000 groupby.py:3229()\n", + " 2 0.000 0.000 0.000 0.000 groupby.py:3230()\n", + " 2 0.000 0.000 0.000 0.000 groupby.py:3235()\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:3258(is_in_axis)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:3268(is_in_obj)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:3327(_is_label_like)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:3332(_convert_grouper)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:5021(__init__)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:5028(slabels)\n", + " 1 0.000 0.000 0.001 0.001 groupby.py:5033(sort_idx)\n", + " 3999 0.008 0.000 3.402 0.001 groupby.py:5038(__iter__)\n", + " 1 0.000 0.000 0.102 0.102 groupby.py:5057(_get_sorted_data)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:5075(__init__)\n", + " 3998 0.008 0.000 3.292 0.001 groupby.py:5092(_chop)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:5120(get_splitter)\n", + " 1 0.000 0.000 0.001 0.001 groupby.py:567(__init__)\n", + " 1 0.000 0.000 0.000 0.000 groupby.py:881(__iter__)\n", + " 3998 0.008 0.000 3.284 0.001 indexing.py:1463(__getitem__)\n", + " 3998 0.003 0.000 3.220 0.001 indexing.py:147(_slice)\n", + " 3998 0.007 0.000 3.270 0.001 indexing.py:2040(_get_slice_axis)\n", + " 3998 0.003 0.000 3.274 0.001 indexing.py:2075(_getitem_axis)\n", + " 1 0.000 0.000 0.000 0.000 indexing.py:2441(maybe_convert_indices)\n", + " 9 0.000 0.000 0.002 0.000 indexing.py:2484(validate_indices)\n", + " 3998 0.001 0.000 
0.001 0.000 indexing.py:2564(need_slice)\n", + " 3998 0.010 0.000 0.042 0.000 indexing.py:263(_convert_slice_indexer)\n", + " 10 0.000 0.000 0.000 0.000 inference.py:251(is_list_like)\n", + " 10 0.000 0.000 0.000 0.000 inference.py:287(is_array_like)\n", + " 1 0.000 0.000 0.000 0.000 inference.py:415(is_hashable)\n", + " 47988 0.052 0.000 0.108 0.000 internals.py:116(__init__)\n", + " 3 0.000 0.000 0.095 0.032 internals.py:1237(take_nd)\n", + " 47988 0.013 0.000 0.013 0.000 internals.py:127(_check_ndim)\n", + " 95976 0.034 0.000 0.070 0.000 internals.py:166(_consolidate_key)\n", + " 35991 0.041 0.000 0.112 0.000 internals.py:1723(__init__)\n", + " 18 0.000 0.000 0.000 0.000 internals.py:1745(shape)\n", + " 35991 0.037 0.000 0.217 0.000 internals.py:1864(__init__)\n", + " 35991 0.013 0.000 0.068 0.000 internals.py:1868(_maybe_coerce_values)\n", + " 9 0.000 0.000 0.000 0.000 internals.py:1891(fill_value)\n", + " 9 0.000 0.000 0.004 0.000 internals.py:1947(take_nd)\n", + " 35982 0.062 0.000 1.213 0.000 internals.py:1976(_slice)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:203(internal_values)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:229(fill_value)\n", + " 3999 0.006 0.000 0.024 0.000 internals.py:2298(__init__)\n", + " 156015 0.026 0.000 0.026 0.000 internals.py:233(mgr_locs)\n", + " 47988 0.018 0.000 0.024 0.000 internals.py:237(mgr_locs)\n", + " 35991 0.066 0.000 0.403 0.000 internals.py:2552(__init__)\n", + " 47988 0.033 0.000 0.535 0.000 internals.py:269(make_block_same_class)\n", + " 11994 0.009 0.000 0.009 0.000 internals.py:310(_slice)\n", + " 47988 0.043 0.000 0.502 0.000 internals.py:3191(make_block)\n", + " 4000 0.026 0.000 0.488 0.000 internals.py:3265(__init__)\n", + " 4000 0.004 0.000 0.008 0.000 internals.py:3266()\n", + " 47976 0.135 0.000 1.962 0.000 internals.py:328(getitem_block)\n", + " 16001 0.065 0.000 0.100 0.000 internals.py:3307(shape)\n", + " 48003 0.011 0.000 0.035 0.000 internals.py:3309()\n", + " 55998 0.013 0.000 0.017 0.000 
internals.py:3311(ndim)\n", + " 7999 0.250 0.000 0.530 0.000 internals.py:3363(_rebuild_blknos_and_blklocs)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3384(_get_items)\n", + " 6 0.000 0.000 0.000 0.000 internals.py:348(shape)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3488(_verify_integrity)\n", + " 26 0.000 0.000 0.000 0.000 internals.py:3490()\n", + " 143992 0.043 0.000 0.059 0.000 internals.py:352(dtype)\n", + " 48012 0.026 0.000 0.116 0.000 internals.py:356(ftype)\n", + " 4003 0.001 0.000 0.001 0.000 internals.py:3776(is_consolidated)\n", + " 4001 0.010 0.000 0.141 0.000 internals.py:3784(_consolidate_check)\n", + " 4001 0.013 0.000 0.130 0.000 internals.py:3785()\n", + " 3998 0.025 0.000 3.150 0.001 internals.py:3869(get_slice)\n", + " 3998 0.020 0.000 1.982 0.000 internals.py:3879()\n", + " 2 0.000 0.000 0.001 0.001 internals.py:4085(consolidate)\n", + " 4001 0.010 0.000 0.433 0.000 internals.py:4101(_consolidate_inplace)\n", + " 1 0.000 0.000 0.100 0.100 internals.py:4388(reindex_indexer)\n", + " 1 0.000 0.000 0.100 0.100 internals.py:4423()\n", + " 1 0.000 0.000 0.101 0.101 internals.py:4518(take)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:4684(_block)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4718(dtype)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4745(internal_values)\n", + " 3999 0.037 0.000 0.192 0.000 internals.py:5057(_consolidate)\n", + " 95976 0.025 0.000 0.095 0.000 internals.py:5063()\n", + " 15996 0.005 0.000 0.007 0.000 internals.py:5074(_merge_blocks)\n", + " 15996 0.022 0.000 0.036 0.000 internals.py:5101(_extend_blocks)\n", + " 9 0.000 0.000 0.000 0.000 missing.py:112(_isna_new)\n", + " 9 0.000 0.000 0.000 0.000 missing.py:32(isna)\n", + " 1 0.000 0.000 0.000 0.000 missing.py:376(array_equivalent)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:2389(array_equal)\n", + " 4000 0.011 0.000 0.046 0.000 numeric.py:35(__new__)\n", + " 51 0.000 0.000 0.000 0.000 numeric.py:433(asarray)\n", + " 1 0.000 0.000 0.000 0.000 
numeric.py:504(asanyarray)\n", + " 7996 0.032 0.000 0.056 0.000 numeric.py:630(require)\n", + " 3999 0.007 0.000 0.202 0.000 numeric.py:64(_shallow_copy)\n", + " 15992 0.007 0.000 0.009 0.000 numeric.py:701()\n", + " 1 0.000 0.000 0.000 0.000 range.py:169(_data)\n", + " 1 0.000 0.000 0.000 0.000 range.py:173(_int64index)\n", + " 1 0.000 0.000 0.000 0.000 range.py:260(_shallow_copy)\n", + " 1 0.000 0.000 0.000 0.000 range.py:315(equals)\n", + " 8 0.000 0.000 0.000 0.000 range.py:481(__len__)\n", + " 1 0.000 0.000 0.000 0.000 series.py:412(dtype)\n", + " 1 0.000 0.000 0.000 0.000 series.py:465(_values)\n", + " 1 0.000 0.000 0.001 0.001 sorting.py:321(get_group_index_sorter)\n", + " 4001 0.001 0.000 0.001 0.000 {built-in method __new__ of type object at 0x9cff80}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method builtins.all}\n", + " 3 0.000 0.000 0.000 0.000 {built-in method builtins.any}\n", + " 4000 0.001 0.000 0.001 0.000 {built-in method builtins.callable}\n", + " 1 0.000 0.000 3.477 3.477 {built-in method builtins.exec}\n", + " 416168 0.096 0.000 0.096 0.000 {built-in method builtins.getattr}\n", + " 276050 0.115 0.000 0.115 0.000 {built-in method builtins.hasattr}\n", + " 4 0.000 0.000 0.000 0.000 {built-in method builtins.hash}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method builtins.id}\n", + " 956308 0.220 0.000 0.383 0.000 {built-in method builtins.isinstance}\n", + " 24077 0.005 0.000 0.005 0.000 {built-in method builtins.issubclass}\n", + " 4002 0.002 0.000 0.002 0.000 {built-in method builtins.iter}\n", + "424055/348051 0.085 0.000 0.111 0.000 {built-in method builtins.len}\n", + " 8 0.000 0.000 0.000 0.000 {built-in method builtins.max}\n", + " 3998 0.002 0.000 0.002 0.000 {built-in method builtins.min}\n", + " 3999 0.017 0.000 0.068 0.000 {built-in method builtins.sorted}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method builtins.sum}\n", + " 95990 0.067 0.000 0.067 0.000 {built-in method numpy.core.multiarray.arange}\n", + " 8048 0.016 0.000 
0.016 0.000 {built-in method numpy.core.multiarray.array}\n", + " 16012 0.047 0.000 0.047 0.000 {built-in method numpy.core.multiarray.empty}\n", + " 3999 0.001 0.000 0.001 0.000 {built-in method pandas._libs.algos.ensure_int16}\n", + " 17 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_int64}\n", + " 31992 0.006 0.000 0.006 0.000 {built-in method pandas._libs.algos.ensure_int8}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_object}\n", + " 3 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_platform_int}\n", + " 71992 0.009 0.000 0.009 0.000 {built-in method pandas._libs.lib.is_bool}\n", + " 13 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_float}\n", + " 12008 0.002 0.000 0.002 0.000 {built-in method pandas._libs.lib.is_integer}\n", + " 4007 0.009 0.000 0.009 0.000 {built-in method pandas._libs.lib.is_scalar}\n", + " 9 0.000 0.000 0.000 0.000 {built-in method pandas._libs.missing.checknull}\n", + " 1 0.000 0.000 0.000 0.000 {method 'all' of 'numpy.ndarray' objects}\n", + " 8003 0.005 0.000 0.047 0.000 {method 'any' of 'numpy.ndarray' objects}\n", + " 47990 0.005 0.000 0.005 0.000 {method 'append' of 'list' objects}\n", + " 2 0.003 0.001 0.003 0.001 {method 'argsort' of 'numpy.ndarray' objects}\n", + " 11 0.000 0.000 0.000 0.000 {method 'astype' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'copy' of 'dict' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}\n", + " 15998 0.010 0.000 0.010 0.000 {method 'fill' of 'numpy.ndarray' objects}\n", + " 48012 0.049 0.000 0.068 0.000 {method 'format' of 'str' objects}\n", + " 8019 0.002 0.000 0.002 0.000 {method 'get' of 'dict' objects}\n", + " 1 0.006 0.006 0.006 0.006 {method 'get_labels' of 'pandas._libs.hashtable.PyObjectHashTable' objects}\n", + " 8000 0.002 0.000 0.002 0.000 {method 'items' of 'dict' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'lower' of 'str' 
objects}\n", + " 9 0.000 0.000 0.001 0.000 {method 'max' of 'numpy.ndarray' objects}\n", + " 9 0.000 0.000 0.001 0.000 {method 'min' of 'numpy.ndarray' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects}\n", + " 8022 0.040 0.000 0.040 0.000 {method 'reduce' of 'numpy.ufunc' objects}\n", + " 7999 0.002 0.000 0.002 0.000 {method 'rpartition' of 'str' objects}\n", + " 3 0.000 0.000 0.000 0.000 {method 'search' of '_sre.SRE_Pattern' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects}\n", + " 5 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'take' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'to_array' of 'pandas._libs.hashtable.ObjectVector' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'tolist' of 'numpy.ndarray' objects}\n", + " 3999 0.001 0.000 0.001 0.000 {method 'update' of 'dict' objects}\n", + " 7996 0.001 0.000 0.001 0.000 {method 'upper' of 'str' objects}\n", + " 6 0.000 0.000 0.000 0.000 {method 'view' of 'numpy.ndarray' objects}\n", + " 1 0.001 0.001 0.001 0.001 {pandas._libs.algos.groupsort_indexer}\n", + " 1 0.000 0.000 0.000 0.000 {pandas._libs.algos.take_1d_int16_int16}\n", + " 2 0.001 0.000 0.001 0.000 {pandas._libs.algos.take_1d_int64_int64}\n", + " 8 0.001 0.000 0.001 0.000 {pandas._libs.algos.take_1d_int8_int8}\n", + " 1 0.000 0.000 0.000 0.000 {pandas._libs.algos.take_2d_axis0_int64_int64}\n", + " 1 0.007 0.007 0.007 0.007 {pandas._libs.algos.take_2d_axis1_float64_float64}\n", + " 1 0.072 0.072 0.072 0.072 {pandas._libs.algos.take_2d_axis1_object_object}\n", + " 1 0.000 0.000 0.000 0.000 {pandas._libs.lib.generate_slices}\n", + " 2 0.002 0.001 0.002 0.001 {pandas._libs.lib.infer_dtype}\n", + " 2 0.000 0.000 0.000 0.000 {pandas._libs.lib.values_from_object}\n", + "\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 3003 function calls (2973 primitive calls) in 0.114 
seconds\n", + "\n", + " Ordered by: standard name\n", + "\n", + " ncalls tottime percall cumtime percall filename:lineno(function)\n", + " 2 0.000 0.000 0.000 0.000 :416(parent)\n", + " 94 0.000 0.000 0.000 0.000 :997(_handle_fromlist)\n", + " 1 0.000 0.000 0.094 0.094 :9(after)\n", + " 1 0.020 0.020 0.114 0.114 :1()\n", + " 1 0.000 0.000 0.000 0.000 __init__.py:205(iteritems)\n", + " 9 0.000 0.000 0.009 0.001 _methods.py:26(_amax)\n", + " 9 0.000 0.000 0.005 0.001 _methods.py:30(_amin)\n", + " 5 0.000 0.000 0.000 0.000 _methods.py:42(_any)\n", + " 10 0.000 0.000 0.000 0.000 _weakrefset.py:70(__contains__)\n", + " 10 0.000 0.000 0.000 0.000 abc.py:180(__instancecheck__)\n", + " 12 0.000 0.000 0.000 0.000 algorithms.py:1421(_get_take_nd_function)\n", + " 9 0.000 0.000 0.017 0.002 algorithms.py:1454(take)\n", + " 12 0.001 0.000 0.073 0.006 algorithms.py:1548(take_nd)\n", + " 1 0.000 0.000 0.000 0.000 algorithms.py:48(_ensure_data)\n", + " 1 0.000 0.000 0.003 0.003 algorithms.py:774(duplicated)\n", + " 1 0.000 0.000 0.003 0.003 base.py:1245(duplicated)\n", + " 2 0.000 0.000 0.000 0.000 base.py:2033(__contains__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:2179(take)\n", + " 1 0.000 0.000 0.000 0.000 base.py:2445(equals)\n", + " 1 0.000 0.000 0.000 0.000 base.py:255(__new__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:473(_simple_new)\n", + " 3 0.000 0.000 0.000 0.000 base.py:4914(_ensure_index)\n", + " 1 0.000 0.000 0.000 0.000 base.py:520(_shallow_copy_with_infer)\n", + " 70 0.000 0.000 0.000 0.000 base.py:61(is_dtype)\n", + " 1 0.000 0.000 0.000 0.000 base.py:615(is_)\n", + " 1 0.000 0.000 0.000 0.000 base.py:635(_reset_identity)\n", + " 17 0.000 0.000 0.000 0.000 base.py:641(__len__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:662(dtype)\n", + " 3 0.000 0.000 0.000 0.000 base.py:672(values)\n", + " 2 0.000 0.000 0.000 0.000 base.py:711(get_values)\n", + " 1 0.000 0.000 0.000 0.000 base.py:904(_coerce_to_ndarray)\n", + " 1 0.000 0.000 0.000 0.000 
base.py:920(_get_attributes_dict)\n", + " 1 0.000 0.000 0.000 0.000 base.py:922()\n", + " 12 0.000 0.000 0.001 0.000 cast.py:257(maybe_promote)\n", + " 9 0.000 0.000 0.000 0.000 cast.py:600(coerce_indexer_dtype)\n", + " 1 0.000 0.000 0.000 0.000 cast.py:853(maybe_castable)\n", + " 9 0.000 0.000 0.000 0.000 categorical.py:147(_maybe_to_categorical)\n", + " 9 0.000 0.000 0.018 0.002 categorical.py:1774(take_nd)\n", + " 9 0.000 0.000 0.000 0.000 categorical.py:1841(__len__)\n", + " 9 0.000 0.000 0.001 0.000 categorical.py:267(__init__)\n", + " 9 0.000 0.000 0.000 0.000 categorical.py:381(categories)\n", + " 9 0.000 0.000 0.000 0.000 categorical.py:420(ordered)\n", + " 36 0.000 0.000 0.000 0.000 categorical.py:425(dtype)\n", + " 9 0.000 0.000 0.000 0.000 categorical.py:434(_constructor)\n", + " 1 0.000 0.000 0.000 0.000 common.py:100(is_bool_indexer)\n", + " 1 0.000 0.000 0.000 0.000 common.py:1043(is_datetime64_any_dtype)\n", + " 14 0.000 0.000 0.000 0.000 common.py:122(is_sparse)\n", + " 2 0.000 0.000 0.000 0.000 common.py:1527(is_float_dtype)\n", + " 2 0.000 0.000 0.000 0.000 common.py:1578(is_bool_dtype)\n", + " 36 0.000 0.000 0.000 0.000 common.py:1688(is_extension_array_dtype)\n", + " 8 0.000 0.000 0.000 0.000 common.py:1784(_get_dtype)\n", + " 9 0.000 0.000 0.000 0.000 common.py:1835(_get_dtype_type)\n", + " 9 0.000 0.000 0.000 0.000 common.py:195(is_categorical)\n", + " 37 0.000 0.000 0.000 0.000 common.py:227(is_datetimetz)\n", + " 1 0.000 0.000 0.000 0.000 common.py:332(is_datetime64_dtype)\n", + " 38 0.000 0.000 0.000 0.000 common.py:369(is_datetime64tz_dtype)\n", + " 2 0.000 0.000 0.000 0.000 common.py:395(_apply_if_callable)\n", + " 1 0.000 0.000 0.000 0.000 common.py:407(is_timedelta64_dtype)\n", + " 14 0.000 0.000 0.000 0.000 common.py:477(is_interval_dtype)\n", + " 11 0.000 0.000 0.000 0.000 common.py:513(is_categorical_dtype)\n", + " 4 0.000 0.000 0.000 0.000 common.py:692(is_dtype_equal)\n", + " 3 0.000 0.000 0.000 0.000 
common.py:858(is_signed_integer_dtype)\n", + " 3 0.000 0.000 0.000 0.000 common.py:89(is_object_dtype)\n", + " 2 0.000 0.000 0.000 0.000 common.py:907(is_unsigned_integer_dtype)\n", + " 18 0.000 0.000 0.000 0.000 dtypes.py:137(__init__)\n", + " 18 0.000 0.000 0.000 0.000 dtypes.py:156(_finalize)\n", + " 18 0.000 0.000 0.000 0.000 dtypes.py:278(validate_ordered)\n", + " 18 0.000 0.000 0.000 0.000 dtypes.py:298(validate_categories)\n", + " 9 0.000 0.000 0.000 0.000 dtypes.py:30(__unicode__)\n", + " 9 0.000 0.000 0.000 0.000 dtypes.py:33(__str__)\n", + " 9 0.000 0.000 0.000 0.000 dtypes.py:331(update_dtype)\n", + " 18 0.000 0.000 0.000 0.000 dtypes.py:363(categories)\n", + " 18 0.000 0.000 0.000 0.000 dtypes.py:370(ordered)\n", + " 1 0.000 0.000 0.000 0.000 dtypes.py:401(__new__)\n", + " 1 0.000 0.000 0.000 0.000 dtypes.py:459(construct_from_string)\n", + " 13 0.000 0.000 0.000 0.000 dtypes.py:707(is_dtype)\n", + " 2 0.000 0.000 0.091 0.045 frame.py:2664(__getitem__)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:2690(_getitem_column)\n", + " 1 0.000 0.000 0.091 0.091 frame.py:2707(_getitem_array)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:320(_constructor)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:334(__init__)\n", + " 2 0.000 0.000 0.000 0.000 generic.py:124(__init__)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:1490(__hash__)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:178(_init_mgr)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2484(_get_item_cache)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2603(_set_is_copy)\n", + " 1 0.000 0.000 0.090 0.090 generic.py:2783(_take)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:364(_get_axis_number)\n", + " 2 0.000 0.000 0.000 0.000 generic.py:377(_get_axis_name)\n", + " 2 0.000 0.000 0.000 0.000 generic.py:390(_get_axis)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:394(_get_block_manager_axis)\n", + " 2 0.000 0.000 0.000 0.000 generic.py:4345(__finalize__)\n", + " 5 0.000 0.000 0.000 0.000 generic.py:4362(__getattr__)\n", + " 3 
0.000 0.000 0.000 0.000 generic.py:4378(__setattr__)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4423(_protect_consolidate)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4433(_consolidate_inplace)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4436(f)\n", + " 246 0.000 0.000 0.000 0.000 generic.py:7(_check)\n", + " 1 0.000 0.000 0.000 0.000 indexing.py:2321(convert_to_index_sliceable)\n", + " 1 0.000 0.000 0.000 0.000 indexing.py:2345(check_bool_indexer)\n", + " 1 0.000 0.000 0.001 0.001 indexing.py:2441(maybe_convert_indices)\n", + " 9 0.000 0.000 0.014 0.002 indexing.py:2484(validate_indices)\n", + " 10 0.000 0.000 0.000 0.000 inference.py:251(is_list_like)\n", + " 9 0.000 0.000 0.000 0.000 inference.py:287(is_array_like)\n", + " 1 0.000 0.000 0.000 0.000 inference.py:415(is_hashable)\n", + " 13 0.000 0.000 0.000 0.000 internals.py:116(__init__)\n", + " 3 0.000 0.000 0.070 0.023 internals.py:1237(take_nd)\n", + " 13 0.000 0.000 0.000 0.000 internals.py:127(_check_ndim)\n", + " 9 0.000 0.000 0.000 0.000 internals.py:1723(__init__)\n", + " 9 0.000 0.000 0.000 0.000 internals.py:1745(shape)\n", + " 9 0.000 0.000 0.000 0.000 internals.py:1864(__init__)\n", + " 9 0.000 0.000 0.000 0.000 internals.py:1868(_maybe_coerce_values)\n", + " 9 0.000 0.000 0.000 0.000 internals.py:1891(fill_value)\n", + " 9 0.000 0.000 0.018 0.002 internals.py:1947(take_nd)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:222(to_dense)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:229(fill_value)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:2298(__init__)\n", + " 49 0.000 0.000 0.000 0.000 internals.py:233(mgr_locs)\n", + " 13 0.000 0.000 0.000 0.000 internals.py:237(mgr_locs)\n", + " 9 0.000 0.000 0.000 0.000 internals.py:2552(__init__)\n", + " 12 0.000 0.000 0.000 0.000 internals.py:269(make_block_same_class)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3148(get_block_type)\n", + " 13 0.000 0.000 0.000 0.000 internals.py:3191(make_block)\n", + " 1 0.000 0.000 0.000 0.000 
internals.py:3265(__init__)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3266()\n", + " 4 0.000 0.000 0.000 0.000 internals.py:3307(shape)\n", + " 12 0.000 0.000 0.000 0.000 internals.py:3309()\n", + " 13 0.000 0.000 0.000 0.000 internals.py:3311(ndim)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3363(_rebuild_blknos_and_blklocs)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3384(_get_items)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3473(__len__)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:348(shape)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3488(_verify_integrity)\n", + " 13 0.000 0.000 0.000 0.000 internals.py:3490()\n", + " 22 0.000 0.000 0.000 0.000 internals.py:352(dtype)\n", + " 12 0.000 0.000 0.000 0.000 internals.py:356(ftype)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:3776(is_consolidated)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3784(_consolidate_check)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3785()\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4085(consolidate)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:4101(_consolidate_inplace)\n", + " 1 0.000 0.000 0.089 0.089 internals.py:4388(reindex_indexer)\n", + " 1 0.000 0.000 0.089 0.089 internals.py:4423()\n", + " 1 0.000 0.000 0.090 0.090 internals.py:4518(take)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4639(__init__)\n", + " 9 0.000 0.000 0.000 0.000 internals.py:4684(_block)\n", + " 7 0.000 0.000 0.000 0.000 internals.py:4718(dtype)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:4752(get_values)\n", + " 9 0.000 0.000 0.000 0.000 missing.py:112(_isna_new)\n", + " 9 0.000 0.000 0.000 0.000 missing.py:32(isna)\n", + " 1 0.000 0.000 0.000 0.000 missing.py:376(array_equivalent)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:110(is_all_dates)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:35(__new__)\n", + " 43 0.000 0.000 0.000 0.000 numeric.py:433(asarray)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:504(asanyarray)\n", + " 1 0.000 0.000 0.000 0.000 
numeric.py:64(_shallow_copy)\n", + " 1 0.000 0.000 0.000 0.000 range.py:260(_shallow_copy)\n", + " 2 0.000 0.000 0.000 0.000 range.py:315(equals)\n", + " 10 0.000 0.000 0.000 0.000 range.py:481(__len__)\n", + " 1 0.000 0.000 0.003 0.003 series.py:1577(duplicated)\n", + " 1 0.000 0.000 0.000 0.000 series.py:166(__init__)\n", + " 1 0.000 0.000 0.000 0.000 series.py:349(_constructor)\n", + " 1 0.000 0.000 0.000 0.000 series.py:365(_set_axis)\n", + " 1 0.000 0.000 0.000 0.000 series.py:391(_set_subtyp)\n", + " 2 0.000 0.000 0.000 0.000 series.py:401(name)\n", + " 1 0.000 0.000 0.000 0.000 series.py:4019(_sanitize_array)\n", + " 1 0.000 0.000 0.000 0.000 series.py:4036(_try_cast)\n", + " 2 0.000 0.000 0.000 0.000 series.py:405(name)\n", + " 7 0.000 0.000 0.000 0.000 series.py:412(dtype)\n", + " 2 0.000 0.000 0.000 0.000 series.py:476(get_values)\n", + " 1 0.000 0.000 0.000 0.000 series.py:562(__len__)\n", + " 2 0.000 0.000 0.000 0.000 series.py:637(__array__)\n", + " 1 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x9cff80}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method builtins.callable}\n", + " 1 0.000 0.000 0.114 0.114 {built-in method builtins.exec}\n", + " 323 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}\n", + " 160 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}\n", + " 3 0.000 0.000 0.000 0.000 {built-in method builtins.hash}\n", + " 523 0.000 0.000 0.001 0.000 {built-in method builtins.isinstance}\n", + " 66 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method builtins.iter}\n", + " 164/136 0.000 0.000 0.000 0.000 {built-in method builtins.len}\n", + " 10 0.000 0.000 0.000 0.000 {built-in method builtins.max}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method builtins.sum}\n", + " 12 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.arange}\n", + " 46/44 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.array}\n", + " 14 0.015 
0.001 0.015 0.001 {built-in method numpy.core.multiarray.empty}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_int16}\n", + " 12 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_int64}\n", + " 8 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_int8}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_object}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_platform_int}\n", + " 27 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_bool}\n", + " 12 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_float}\n", + " 10 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_integer}\n", + " 9 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_scalar}\n", + " 9 0.000 0.000 0.000 0.000 {built-in method pandas._libs.missing.checknull}\n", + " 5 0.000 0.000 0.000 0.000 {method 'any' of 'numpy.ndarray' objects}\n", + " 9 0.000 0.000 0.000 0.000 {method 'astype' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'fill' of 'numpy.ndarray' objects}\n", + " 14 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}\n", + " 16 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}\n", + " 9 0.000 0.000 0.009 0.001 {method 'max' of 'numpy.ndarray' objects}\n", + " 9 0.000 0.000 0.005 0.001 {method 'min' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'nonzero' of 'numpy.ndarray' objects}\n", + " 23 0.014 0.001 0.014 0.001 {method 'reduce' of 'numpy.ufunc' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'search' of '_sre.SRE_Pattern' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 
'take' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}\n", + " 5 0.000 0.000 0.000 0.000 {method 'view' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {pandas._libs.algos.take_1d_int16_int16}\n", + " 8 0.001 0.000 0.001 0.000 {pandas._libs.algos.take_1d_int8_int8}\n", + " 1 0.000 0.000 0.000 0.000 {pandas._libs.algos.take_2d_axis0_int64_int64}\n", + " 1 0.007 0.007 0.007 0.007 {pandas._libs.algos.take_2d_axis1_float64_float64}\n", + " 1 0.047 0.047 0.047 0.047 {pandas._libs.algos.take_2d_axis1_object_object}\n", + " 1 0.003 0.003 0.003 0.003 {pandas._libs.hashtable.duplicated_object}\n", + " 2 0.000 0.000 0.000 0.000 {pandas._libs.lib.values_from_object}\n", + "\n", + "\n" + ] + } + ], + "source": [ + "# optimize performance by using built-in functions\n", + "\n", + "import cProfile\n", + "\n", + "def before(df):\n", + " duplicated_data = [group for _, group in df.groupby('Salary') if len(group) > 1]\n", + "\n", + "\n", + "def after(df):\n", + " duplicated_data = df[df['Salary'].duplicated(keep=False)]\n", + "\n", + "\n", + "cProfile.run(\"before(df)\")\n", + "cProfile.run(\"after(df)\")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2.06 s ± 15.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + } + ], + "source": [ + "%timeit before(df)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "91.7 ms ± 1.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "%timeit after(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. 
Search for smart alternative" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 24422831 function calls (23928513 primitive calls) in 13.653 seconds\n", + "\n", + " Ordered by: standard name\n", + "\n", + " ncalls tottime percall cumtime percall filename:lineno(function)\n", + " 6 0.000 0.000 0.000 0.000 :416(parent)\n", + " 296623 0.156 0.000 0.348 0.000 :997(_handle_fromlist)\n", + " 1 0.049 0.049 13.653 13.653 :3(before)\n", + " 98855 0.221 0.000 4.778 0.000 :4()\n", + " 1 0.000 0.000 13.653 13.653 :1()\n", + " 1 0.002 0.002 0.002 0.002 __init__.py:124(lrange)\n", + " 4 0.000 0.000 0.000 0.000 __init__.py:205(iteritems)\n", + " 1 0.000 0.000 0.000 0.000 _decorators.py:136(wrapper)\n", + " 1 0.000 0.000 0.000 0.000 _methods.py:42(_any)\n", + " 2 0.000 0.000 0.000 0.000 _methods.py:45(_all)\n", + " 197721 0.097 0.000 0.097 0.000 _weakrefset.py:70(__contains__)\n", + " 197719 0.118 0.000 0.215 0.000 abc.py:180(__instancecheck__)\n", + " 10 0.000 0.000 0.000 0.000 algorithms.py:1421(_get_take_nd_function)\n", + " 10 0.000 0.000 0.004 0.000 algorithms.py:1548(take_nd)\n", + " 1 0.000 0.000 13.603 13.603 apply.py:105(get_result)\n", + " 1 0.000 0.000 0.000 0.000 apply.py:14(frame_apply)\n", + " 1 0.001 0.001 13.603 13.603 apply.py:219(apply_standard)\n", + " 1 0.235 0.235 13.551 13.551 apply.py:253(apply_series_generator)\n", + " 1 0.000 0.000 0.050 0.050 apply.py:293(wrap_results)\n", + " 1 0.000 0.000 0.000 0.000 apply.py:34(__init__)\n", + " 1 0.000 0.000 0.195 0.195 apply.py:363(series_generator)\n", + " 98856 0.197 0.000 8.310 0.000 apply.py:366()\n", + " 1 0.000 0.000 0.000 0.000 apply.py:370(result_index)\n", + " 1 0.000 0.000 0.000 0.000 apply.py:374(result_columns)\n", + " 98857 0.056 0.000 0.056 0.000 apply.py:85(columns)\n", + " 2 0.000 0.000 0.000 0.000 apply.py:89(index)\n", + " 1 0.000 0.000 0.194 0.194 apply.py:93(values)\n", + " 1 0.000 0.000 
0.000 0.000 apply.py:97(dtypes)\n", + " 197710 0.099 0.000 0.367 0.000 base.py:1590(is_object)\n", + " 197710 0.209 0.000 0.611 0.000 base.py:1647(_convert_scalar_indexer)\n", + " 197712 0.161 0.000 0.196 0.000 base.py:2033(__contains__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:2053(contains)\n", + " 2 0.000 0.000 0.000 0.000 base.py:2067(__getitem__)\n", + " 197710 0.145 0.000 0.708 0.000 base.py:2101(_can_hold_identifiers_and_holds_name)\n", + " 5/3 0.000 0.000 0.006 0.002 base.py:255(__new__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:3071(get_loc)\n", + " 197710 0.540 0.000 2.931 0.000 base.py:3090(get_value)\n", + " 1 0.000 0.000 0.000 0.000 base.py:4117(_maybe_cast_indexer)\n", + " 1 0.000 0.000 0.000 0.000 base.py:4355(insert)\n", + " 4 0.000 0.000 0.000 0.000 base.py:444()\n", + " 3 0.000 0.000 0.000 0.000 base.py:473(_simple_new)\n", + " 98862 0.036 0.000 0.056 0.000 base.py:4914(_ensure_index)\n", + " 1 0.000 0.000 0.000 0.000 base.py:520(_shallow_copy_with_infer)\n", + " 395751 0.201 0.000 0.307 0.000 base.py:61(is_dtype)\n", + " 3 0.000 0.000 0.000 0.000 base.py:635(_reset_identity)\n", + " 494294 0.153 0.000 0.206 0.000 base.py:641(__len__)\n", + " 3 0.000 0.000 0.000 0.000 base.py:647(__array__)\n", + " 17 0.000 0.000 0.000 0.000 base.py:672(values)\n", + " 7 0.000 0.000 0.000 0.000 base.py:677(_values)\n", + " 1 0.000 0.000 0.000 0.000 base.py:789(_ndarray_values)\n", + " 1 0.002 0.002 0.003 0.003 base.py:850(_try_convert_to_int_index)\n", + " 2 0.000 0.000 0.000 0.000 base.py:893(tolist)\n", + " 1 0.000 0.000 0.000 0.000 base.py:904(_coerce_to_ndarray)\n", + " 3 0.000 0.000 0.002 0.001 base.py:912(__iter__)\n", + " 2 0.000 0.000 0.000 0.000 base.py:920(_get_attributes_dict)\n", + " 2 0.000 0.000 0.000 0.000 base.py:922()\n", + " 1 0.000 0.000 0.000 0.000 base.py:936(_coerce_scalar_to_index)\n", + " 1 0.000 0.000 0.001 0.001 cast.py:1093(find_common_type)\n", + " 2 0.000 0.000 0.001 0.000 cast.py:1118()\n", + " 2 0.000 0.000 0.000 0.000 
cast.py:1121()\n", + " 2 0.001 0.001 0.001 0.001 cast.py:1207(construct_1d_object_array_from_listlike)\n", + " 98856 0.046 0.000 0.074 0.000 cast.py:1232(construct_1d_ndarray_preserving_na)\n", + " 9 0.000 0.000 0.000 0.000 cast.py:257(maybe_promote)\n", + " 1 0.000 0.000 0.003 0.003 cast.py:44(maybe_convert_platform)\n", + " 98858 0.095 0.000 0.095 0.000 cast.py:853(maybe_castable)\n", + " 98855 0.320 0.000 1.222 0.000 cast.py:867(maybe_infer_to_datetimelike)\n", + " 98857 0.354 0.000 1.810 0.000 cast.py:971(maybe_cast_to_datetime)\n", + " 9 0.000 0.000 0.004 0.000 categorical.py:1248(__array__)\n", + " 9 0.000 0.000 0.000 0.000 categorical.py:381(categories)\n", + " 27 0.000 0.000 0.000 0.000 categorical.py:425(dtype)\n", + " 4 0.000 0.000 0.000 0.000 common.py:1043(is_datetime64_any_dtype)\n", + " 2 0.000 0.000 0.000 0.000 common.py:1170(is_datetime_or_timedelta_dtype)\n", + " 197849 0.064 0.000 0.467 0.000 common.py:122(is_sparse)\n", + " 1 0.000 0.000 0.000 0.000 common.py:1405(needs_i8_conversion)\n", + " 5 0.000 0.000 0.000 0.000 common.py:1527(is_float_dtype)\n", + " 4 0.000 0.000 0.000 0.000 common.py:1578(is_bool_dtype)\n", + " 98988 0.093 0.000 0.899 0.000 common.py:1629(is_extension_type)\n", + " 98891 0.126 0.000 0.485 0.000 common.py:1688(is_extension_array_dtype)\n", + " 5 0.000 0.000 0.000 0.000 common.py:1784(_get_dtype)\n", + " 296604 0.217 0.000 0.320 0.000 common.py:1835(_get_dtype_type)\n", + " 197844 0.100 0.000 0.586 0.000 common.py:195(is_categorical)\n", + " 2 0.000 0.000 0.000 0.000 common.py:1965(pandas_dtype)\n", + " 197869 0.094 0.000 0.523 0.000 common.py:227(is_datetimetz)\n", + " 5 0.000 0.000 0.001 0.000 common.py:301(_asarray_tuplesafe)\n", + " 8 0.000 0.000 0.000 0.000 common.py:332(is_datetime64_dtype)\n", + " 197877 0.086 0.000 0.235 0.000 common.py:369(is_datetime64tz_dtype)\n", + " 197711 0.058 0.000 0.079 0.000 common.py:395(_apply_if_callable)\n", + " 7 0.000 0.000 0.000 0.000 common.py:407(is_timedelta64_dtype)\n", + " 1 
0.000 0.000 0.000 0.000 common.py:444(is_period_dtype)\n", + " 21 0.000 0.000 0.000 0.000 common.py:477(is_interval_dtype)\n", + " 197858 0.096 0.000 0.254 0.000 common.py:513(is_categorical_dtype)\n", + " 1 0.000 0.000 0.000 0.000 common.py:546(is_string_dtype)\n", + " 2 0.000 0.000 0.000 0.000 common.py:647(is_datetimelike)\n", + " 2 0.000 0.000 0.001 0.000 common.py:692(is_dtype_equal)\n", + " 2 0.000 0.000 0.000 0.000 common.py:811(is_integer_dtype)\n", + " 3 0.000 0.000 0.000 0.000 common.py:858(is_signed_integer_dtype)\n", + " 395424 0.206 0.000 0.568 0.000 common.py:89(is_object_dtype)\n", + " 4 0.000 0.000 0.000 0.000 common.py:907(is_unsigned_integer_dtype)\n", + " 1 0.000 0.000 0.000 0.000 common.py:995(is_int_or_datetime_dtype)\n", + " 2 0.000 0.000 0.001 0.000 dtypes.py:172(__hash__)\n", + " 1 0.000 0.000 0.001 0.001 dtypes.py:183(__eq__)\n", + " 2 0.000 0.000 0.001 0.000 dtypes.py:227(_hash_categories)\n", + " 2 0.000 0.000 0.000 0.000 dtypes.py:241()\n", + " 4 0.000 0.000 0.000 0.000 dtypes.py:266(construct_from_string)\n", + " 16 0.000 0.000 0.000 0.000 dtypes.py:363(categories)\n", + " 5 0.000 0.000 0.000 0.000 dtypes.py:370(ordered)\n", + " 1 0.000 0.000 0.000 0.000 dtypes.py:584(is_dtype)\n", + " 2 0.000 0.000 0.000 0.000 dtypes.py:675(construct_from_string)\n", + " 18 0.000 0.000 0.000 0.000 dtypes.py:707(is_dtype)\n", + " 1 0.000 0.000 0.001 0.001 frame.py:3105(__setitem__)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3165(_ensure_valid_index)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3182(_set_item)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3324(_sanitize_column)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3344(reindexer)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:555(shape)\n", + " 1 0.000 0.000 13.603 13.603 frame.py:5837(apply)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:844(__len__)\n", + " 2 0.000 0.000 0.000 0.000 fromnumeric.py:1471(ravel)\n", + " 1 0.000 0.000 0.000 0.000 function.py:38(__call__)\n", + " 2 0.000 0.000 0.000 0.000 
function_base.py:4476(append)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:1141(__invert__)\n", + " 98861 0.118 0.000 0.118 0.000 generic.py:124(__init__)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:164(_validate_dtype)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2577(_clear_item_cache)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2599(_set_item)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2633(_check_setitem_copy)\n", + " 2 0.000 0.000 0.000 0.000 generic.py:364(_get_axis_number)\n", + " 3 0.000 0.000 0.000 0.000 generic.py:4345(__finalize__)\n", + " 296571 0.425 0.000 4.647 0.000 generic.py:4362(__getattr__)\n", + " 98863 0.204 0.000 0.531 0.000 generic.py:4378(__setattr__)\n", + " 197711 0.076 0.000 0.136 0.000 generic.py:438(_info_axis)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4423(_protect_consolidate)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4433(_consolidate_inplace)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4436(f)\n", + " 1 0.000 0.000 0.194 0.194 generic.py:4563(values)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4765(dtypes)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4890(astype)\n", + " 1483499 0.447 0.000 0.971 0.000 generic.py:7(_check)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:9675(logical_func)\n", + " 2 0.000 0.000 0.000 0.000 hashing.py:23(_combine_hash_arrays)\n", + " 2 0.000 0.000 0.000 0.000 hashing.py:230(hash_array)\n", + " 1 0.000 0.000 0.000 0.000 indexing.py:2321(convert_to_index_sliceable)\n", + " 4 0.000 0.000 0.000 0.000 inference.py:119(is_iterator)\n", + " 197719 0.099 0.000 0.464 0.000 inference.py:251(is_list_like)\n", + " 1 0.000 0.000 0.000 0.000 inference.py:364(is_dict_like)\n", + " 98855 0.039 0.000 0.059 0.000 inference.py:415(is_hashable)\n", + " 1 0.000 0.000 0.000 0.000 inference.py:447(is_sequence)\n", + " 1 0.000 0.000 0.000 0.000 inspect.py:73(isclass)\n", + " 98861 0.189 0.000 0.387 0.000 internals.py:116(__init__)\n", + " 98861 0.039 0.000 0.039 0.000 internals.py:127(_check_ndim)\n", + " 1 0.000 
0.000 0.000 0.000 internals.py:184(is_categorical_astype)\n", + " 9 0.000 0.000 0.005 0.001 internals.py:1937(get_values)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:199(external_values)\n", + " 197712 0.033 0.000 0.033 0.000 internals.py:203(internal_values)\n", + " 3 0.000 0.000 0.084 0.028 internals.py:213(get_values)\n", + " 197711 0.082 0.000 0.171 0.000 internals.py:222(to_dense)\n", + " 98857 0.151 0.000 0.553 0.000 internals.py:2298(__init__)\n", + " 98874 0.019 0.000 0.019 0.000 internals.py:233(mgr_locs)\n", + " 98861 0.086 0.000 0.102 0.000 internals.py:237(mgr_locs)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:269(make_block_same_class)\n", + " 98860 0.315 0.000 1.621 0.000 internals.py:3148(get_block_type)\n", + " 98861 0.138 0.000 2.312 0.000 internals.py:3191(make_block)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:3307(shape)\n", + " 9 0.000 0.000 0.000 0.000 internals.py:3309()\n", + " 3 0.000 0.000 0.000 0.000 internals.py:3311(ndim)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3315(set_axis)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3351(_is_single_block)\n", + " 6 0.000 0.000 0.000 0.000 internals.py:3384(_get_items)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3404(get_dtypes)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3405()\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3473(__len__)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3500(apply)\n", + " 197736 0.049 0.000 0.049 0.000 internals.py:352(dtype)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3561()\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3713(astype)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3776(is_consolidated)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3789(is_mixed_type)\n", + " 1 0.000 0.000 0.194 0.194 internals.py:3922(as_array)\n", + " 1 0.074 0.074 0.194 0.194 internals.py:3953(_interleave)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4085(consolidate)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4101(_consolidate_inplace)\n", + " 1 
0.000 0.000 0.000 0.000 internals.py:4208(set)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4323(insert)\n", + " 98860 0.211 0.000 2.637 0.000 internals.py:4639(__init__)\n", + " 593135 0.118 0.000 0.118 0.000 internals.py:4684(_block)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4709(index)\n", + " 197711 0.114 0.000 0.203 0.000 internals.py:4718(dtype)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4742(external_values)\n", + " 197712 0.128 0.000 0.209 0.000 internals.py:4745(internal_values)\n", + " 197711 0.173 0.000 0.430 0.000 internals.py:4752(get_values)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:4774(_consolidate_inplace)\n", + " 1 0.000 0.000 0.001 0.001 internals.py:5044(_interleaved_dtype)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:5048()\n", + " 1 0.000 0.000 0.000 0.000 internals.py:5101(_extend_blocks)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:573(astype)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:577(_astype)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:5880(_fast_count_smallints)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:774(copy)\n", + " 2 0.000 0.000 0.000 0.000 missing.py:112(_isna_new)\n", + " 1 0.000 0.000 0.000 0.000 missing.py:189(_isna_ndarraylike)\n", + " 2 0.000 0.000 0.000 0.000 missing.py:32(isna)\n", + " 1 0.000 0.000 0.000 0.000 nanops.py:179(_get_fill_value)\n", + " 1 0.000 0.000 0.000 0.000 nanops.py:202(_get_values)\n", + " 1 0.000 0.000 0.000 0.000 nanops.py:256(_na_ok_dtype)\n", + " 1 0.000 0.000 0.000 0.000 nanops.py:260(_view_if_needed)\n", + " 1 0.000 0.000 0.000 0.000 nanops.py:318(nanany)\n", + " 5 0.000 0.000 0.000 0.000 numeric.py:110(is_all_dates)\n", + " 4 0.000 0.000 0.000 0.000 numeric.py:2491(seterr)\n", + " 4 0.000 0.000 0.000 0.000 numeric.py:2592(geterr)\n", + " 2 0.000 0.000 0.000 0.000 numeric.py:2887(__init__)\n", + " 2 0.000 0.000 0.000 0.000 numeric.py:2891(__enter__)\n", + " 2 0.000 0.000 0.000 0.000 numeric.py:2896(__exit__)\n", + " 1 0.000 0.000 0.000 0.000 
numeric.py:35(__new__)\n", + " 27/18 0.000 0.000 0.006 0.000 numeric.py:433(asarray)\n", + " 5 0.000 0.000 0.000 0.000 numeric.py:504(asanyarray)\n", + " 2 0.000 0.000 0.000 0.000 numeric.py:94(zeros_like)\n", + " 4 0.000 0.000 0.000 0.000 numerictypes.py:619(issubclass_)\n", + " 2 0.000 0.000 0.000 0.000 numerictypes.py:687(issubdtype)\n", + " 1 0.000 0.000 0.002 0.002 range.py:257(tolist)\n", + " 1 0.000 0.000 0.000 0.000 range.py:315(equals)\n", + " 12 0.000 0.000 0.000 0.000 range.py:481(__len__)\n", + "98861/98860 0.629 0.000 8.107 0.000 series.py:166(__init__)\n", + " 1 0.040 0.040 0.049 0.049 series.py:284(_init_dict)\n", + " 1 0.000 0.000 0.001 0.001 series.py:3069(apply)\n", + " 1 0.000 0.000 0.000 0.000 series.py:3203(_reduce)\n", + " 3 0.000 0.000 0.000 0.000 series.py:349(_constructor)\n", + " 98862 0.102 0.000 0.143 0.000 series.py:365(_set_axis)\n", + " 98862 0.041 0.000 0.041 0.000 series.py:391(_set_subtyp)\n", + " 197719 0.124 0.000 0.214 0.000 series.py:401(name)\n", + " 98859 0.290 0.000 3.551 0.000 series.py:4019(_sanitize_array)\n", + " 98858 0.204 0.000 3.098 0.000 series.py:4036(_try_cast)\n", + " 98864 0.076 0.000 0.135 0.000 series.py:405(name)\n", + " 197711 0.097 0.000 0.301 0.000 series.py:412(dtype)\n", + " 1 0.000 0.000 0.000 0.000 series.py:432(values)\n", + " 197712 0.093 0.000 0.302 0.000 series.py:465(_values)\n", + " 197711 0.080 0.000 0.510 0.000 series.py:476(get_values)\n", + " 1 0.000 0.000 0.000 0.000 series.py:562(__len__)\n", + " 1 0.000 0.000 0.000 0.000 series.py:643(__array_wrap__)\n", + " 197710 0.340 0.000 3.379 0.000 series.py:764(__getitem__)\n", + " 1 0.000 0.000 0.000 0.000 shape_base.py:63(atleast_2d)\n", + " 3 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x9cff80}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method _operator.inv}\n", + " 4 0.000 0.000 0.001 0.000 {built-in method builtins.all}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method builtins.any}\n", + " 197711 0.021 0.000 0.021 
0.000 {built-in method builtins.callable}\n", + " 1 0.000 0.000 13.653 13.653 {built-in method builtins.exec}\n", + " 2571249 0.852 0.000 1.154 0.000 {built-in method builtins.getattr}\n", + " 395544 0.182 0.000 0.182 0.000 {built-in method builtins.hasattr}\n", + " 296570 0.055 0.000 0.056 0.000 {built-in method builtins.hash}\n", + " 4449733 1.121 0.000 2.308 0.000 {built-in method builtins.isinstance}\n", + " 1087532 0.139 0.000 0.139 0.000 {built-in method builtins.issubclass}\n", + " 10 0.000 0.000 0.000 0.000 {built-in method builtins.iter}\n", + "1482938/988641 0.308 0.000 0.461 0.000 {built-in method builtins.len}\n", + " 12 0.000 0.000 0.000 0.000 {built-in method builtins.max}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method builtins.next}\n", + "395457/395448 0.116 0.000 0.121 0.000 {built-in method numpy.core.multiarray.array}\n", + " 3 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.concatenate}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.copyto}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.empty_like}\n", + " 14 0.032 0.002 0.032 0.002 {built-in method numpy.core.multiarray.empty}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.putmask}\n", + " 3 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.zeros}\n", + " 8 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.geterrobj}\n", + " 4 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.seterrobj}\n", + " 10 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_int64}\n", + " 98855 0.016 0.000 0.016 0.000 {built-in method pandas._libs.algos.ensure_object}\n", + " 98855 0.043 0.000 0.043 0.000 {built-in method pandas._libs.lib.infer_datetimelike_array}\n", + " 197720 0.029 0.000 0.029 0.000 {built-in method pandas._libs.lib.is_float}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_integer}\n", + " 197717 0.029 0.000 0.029 0.000 {built-in method 
pandas._libs.lib.is_scalar}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method pandas._libs.missing.checknull}\n", + " 2 0.000 0.000 0.000 0.000 {method 'all' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'any' of 'numpy.ndarray' objects}\n", + " 98857 0.012 0.000 0.012 0.000 {method 'append' of 'list' objects}\n", + " 4 0.085 0.021 0.085 0.021 {method 'astype' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'clear' of 'dict' objects}\n", + " 3 0.000 0.000 0.000 0.000 {method 'copy' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}\n", + " 12 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'get_loc' of 'pandas._libs.index.IndexEngine' objects}\n", + " 197710 0.232 0.000 0.232 0.000 {method 'get_value' of 'pandas._libs.index.IndexEngine' objects}\n", + " 4 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}\n", + " 3 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}\n", + " 5 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects}\n", + " 9 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}\n", + " 6 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'tolist' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'transpose' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}\n", + " 197731 0.089 0.000 0.089 0.000 {method 'view' of 'numpy.ndarray' objects}\n", + " 10 0.003 0.000 0.003 0.000 {pandas._libs.algos.take_1d_object_object}\n", + " 2 0.000 0.000 0.000 0.000 {pandas._libs.hashing.hash_object_array}\n", + " 2 0.001 0.000 0.001 0.000 {pandas._libs.lib.infer_dtype}\n", + " 1 0.000 0.000 0.001 0.001 
{pandas._libs.lib.map_infer}\n", + " 1 0.002 0.002 0.002 0.002 {pandas._libs.lib.maybe_convert_objects}\n", + " 395422 0.206 0.000 0.715 0.000 {pandas._libs.lib.values_from_object}\n", + "\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 1088861 function calls (1088853 primitive calls) in 0.353 seconds\n", + "\n", + " Ordered by: standard name\n", + "\n", + " ncalls tottime percall cumtime percall filename:lineno(function)\n", + " 18 0.000 0.000 0.000 0.000 :997(_handle_fromlist)\n", + " 1 0.005 0.005 0.353 0.353 :7(after)\n", + " 1 0.000 0.000 0.353 0.353 :1()\n", + " 2 0.000 0.000 0.000 0.000 __init__.py:205(iteritems)\n", + " 3 0.000 0.000 0.000 0.000 _methods.py:42(_any)\n", + " 98862 0.024 0.000 0.024 0.000 _weakrefset.py:70(__contains__)\n", + " 98861 0.030 0.000 0.054 0.000 abc.py:180(__instancecheck__)\n", + " 1 0.000 0.000 0.000 0.000 accessor.py:129(__get__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:142(_freeze)\n", + " 3 0.000 0.000 0.000 0.000 base.py:147(__setattr__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:1569(is_unique)\n", + " 1 0.000 0.000 0.000 0.000 base.py:1935(_engine)\n", + " 1 0.000 0.000 0.000 0.000 base.py:1938()\n", + " 3 0.000 0.000 0.000 0.000 base.py:2033(__contains__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:2053(contains)\n", + " 2 0.000 0.000 0.000 0.000 base.py:2067(__getitem__)\n", + " 1 0.000 0.000 0.000 0.000 base.py:2465(identical)\n", + " 2 0.000 0.000 0.000 0.000 base.py:2470()\n", + " 5 0.000 0.000 0.000 0.000 base.py:3071(get_loc)\n", + " 14 0.000 0.000 0.000 0.000 base.py:4914(_ensure_index)\n", + " 22 0.000 0.000 0.000 0.000 base.py:61(is_dtype)\n", + " 2 0.000 0.000 0.000 0.000 base.py:635(_reset_identity)\n", + " 2 0.000 0.000 0.000 0.000 base.py:641(__len__)\n", + " 2 0.000 0.000 0.000 0.000 base.py:672(values)\n", + " 1 0.000 0.000 0.000 0.000 base.py:677(_values)\n", + " 1 0.000 0.000 0.000 0.000 base.py:789(_ndarray_values)\n", + " 1 0.000 0.000 0.000 0.000 
base.py:924(view)\n", + " 1 0.000 0.000 0.000 0.000 cast.py:1232(construct_1d_ndarray_preserving_na)\n", + " 2 0.000 0.000 0.000 0.000 cast.py:853(maybe_castable)\n", + " 2 0.000 0.000 0.000 0.000 cast.py:867(maybe_infer_to_datetimelike)\n", + " 2 0.000 0.000 0.000 0.000 cast.py:971(maybe_cast_to_datetime)\n", + " 2 0.000 0.000 0.000 0.000 common.py:1170(is_datetime_or_timedelta_dtype)\n", + " 9 0.000 0.000 0.000 0.000 common.py:122(is_sparse)\n", + " 1 0.000 0.000 0.000 0.000 common.py:123(_default_index)\n", + " 1 0.000 0.000 0.000 0.000 common.py:1405(needs_i8_conversion)\n", + " 1 0.000 0.000 0.000 0.000 common.py:1490(is_string_like_dtype)\n", + " 1 0.000 0.000 0.000 0.000 common.py:154(_all_none)\n", + " 1 0.000 0.000 0.000 0.000 common.py:1578(is_bool_dtype)\n", + " 3 0.000 0.000 0.000 0.000 common.py:1629(is_extension_type)\n", + " 8 0.000 0.000 0.000 0.000 common.py:1688(is_extension_array_dtype)\n", + " 3 0.000 0.000 0.000 0.000 common.py:1784(_get_dtype)\n", + " 11 0.000 0.000 0.000 0.000 common.py:1835(_get_dtype_type)\n", + " 5 0.000 0.000 0.000 0.000 common.py:195(is_categorical)\n", + " 8 0.000 0.000 0.000 0.000 common.py:227(is_datetimetz)\n", + " 10 0.000 0.000 0.000 0.000 common.py:369(is_datetime64tz_dtype)\n", + " 3 0.000 0.000 0.000 0.000 common.py:395(_apply_if_callable)\n", + " 2 0.000 0.000 0.000 0.000 common.py:444(is_period_dtype)\n", + " 2 0.000 0.000 0.000 0.000 common.py:477(is_interval_dtype)\n", + " 9 0.000 0.000 0.000 0.000 common.py:513(is_categorical_dtype)\n", + " 2 0.000 0.000 0.000 0.000 common.py:546(is_string_dtype)\n", + " 1 0.000 0.000 0.000 0.000 common.py:811(is_integer_dtype)\n", + " 9 0.000 0.000 0.000 0.000 common.py:89(is_object_dtype)\n", + " 1 0.000 0.000 0.000 0.000 common.py:995(is_int_or_datetime_dtype)\n", + " 2 0.000 0.000 0.000 0.000 concat.py:105(_get_sliced_frame_result_type)\n", + " 2 0.000 0.000 0.000 0.000 dtypes.py:584(is_dtype)\n", + " 2 0.000 0.000 0.000 0.000 dtypes.py:707(is_dtype)\n", + " 2 0.000 
0.000 0.001 0.000 frame.py:2664(__getitem__)\n", + " 2 0.000 0.000 0.000 0.000 frame.py:2690(_getitem_column)\n", + " 2 0.000 0.000 0.000 0.000 frame.py:3093(_box_item_values)\n", + " 2 0.000 0.000 0.000 0.000 frame.py:3100(_box_col_values)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3105(__setitem__)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3165(_ensure_valid_index)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3182(_set_item)\n", + " 3 0.000 0.000 0.000 0.000 frame.py:320(_constructor)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3324(_sanitize_column)\n", + " 3 0.000 0.000 0.005 0.002 frame.py:334(__init__)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3344(reindexer)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:3541(align)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:461(_init_ndarray)\n", + " 1 0.000 0.000 0.002 0.002 frame.py:4759(_combine_match_index)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:478(_get_axes)\n", + " 1 0.000 0.000 0.001 0.001 frame.py:6845(_reduce)\n", + " 1 0.000 0.000 0.001 0.001 frame.py:6856(f)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:7047(_get_agg_axis)\n", + " 1 0.000 0.000 0.003 0.003 frame.py:7255(isin)\n", + " 1 0.000 0.000 0.001 0.001 frame.py:7349(_arrays_to_mgr)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:7419(_prep_ndarray)\n", + " 1 0.000 0.000 0.004 0.004 frame.py:7453(_to_arrays)\n", + " 1 0.000 0.000 0.004 0.004 frame.py:7547(_list_to_arrays)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:7604(_convert_object_array)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:7615(convert)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:7621()\n", + " 1 0.000 0.000 0.000 0.000 frame.py:7644(_homogenize)\n", + " 1 0.000 0.000 0.000 0.000 frame.py:844(__len__)\n", + " 1 0.000 0.000 0.000 0.000 function.py:38(__call__)\n", + " 7 0.000 0.000 0.000 0.000 generic.py:124(__init__)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:178(_init_mgr)\n", + " 2 0.000 0.000 0.000 0.000 generic.py:2484(_get_item_cache)\n", + " 2 0.000 0.000 0.000 0.000 
generic.py:2498(_set_as_cached)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2577(_clear_item_cache)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2599(_set_item)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:2633(_check_setitem_copy)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:297(_construct_axes_dict)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:299()\n", + " 1 0.000 0.000 0.001 0.001 generic.py:3058(reindex_like)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:317(_construct_axes_from_arguments)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:349()\n", + " 3 0.000 0.000 0.000 0.000 generic.py:364(_get_axis_number)\n", + " 1 0.000 0.000 0.001 0.001 generic.py:3647(reindex)\n", + " 2 0.000 0.000 0.000 0.000 generic.py:3674()\n", + " 2 0.000 0.000 0.000 0.000 generic.py:377(_get_axis_name)\n", + " 2 0.000 0.000 0.000 0.000 generic.py:390(_get_axis)\n", + " 3 0.000 0.000 0.000 0.000 generic.py:4345(__finalize__)\n", + " 4 0.000 0.000 0.000 0.000 generic.py:4362(__getattr__)\n", + " 11 0.000 0.000 0.000 0.000 generic.py:4378(__setattr__)\n", + " 4 0.000 0.000 0.000 0.000 generic.py:4423(_protect_consolidate)\n", + " 3 0.000 0.000 0.000 0.000 generic.py:4433(_consolidate_inplace)\n", + " 3 0.000 0.000 0.000 0.000 generic.py:4436(f)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4475(_is_mixed_type)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:4477()\n", + " 2 0.000 0.000 0.000 0.000 generic.py:4563(values)\n", + " 1 0.000 0.000 0.001 0.001 generic.py:5009(copy)\n", + " 67 0.000 0.000 0.000 0.000 generic.py:7(_check)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:7332(align)\n", + " 1 0.000 0.000 0.000 0.000 generic.py:7423(_align_series)\n", + " 1 0.000 0.000 0.001 0.001 generic.py:9675(logical_func)\n", + " 1 0.000 0.000 0.000 0.000 indexing.py:2321(convert_to_index_sliceable)\n", + " 98861 0.035 0.000 0.133 0.000 inference.py:251(is_list_like)\n", + " 1 0.000 0.000 0.000 0.000 inference.py:388(is_named_tuple)\n", + " 4 0.000 0.000 0.000 0.000 inference.py:415(is_hashable)\n", 
+ " 6 0.000 0.000 0.000 0.000 internals.py:116(__init__)\n", + " 6 0.000 0.000 0.000 0.000 internals.py:127(_check_ndim)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:199(external_values)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:203(internal_values)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:213(get_values)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:2278(should_store)\n", + " 4 0.000 0.000 0.000 0.000 internals.py:2298(__init__)\n", + " 15 0.000 0.000 0.000 0.000 internals.py:233(mgr_locs)\n", + " 6 0.000 0.000 0.000 0.000 internals.py:237(mgr_locs)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:269(make_block_same_class)\n", + " 4 0.000 0.000 0.000 0.000 internals.py:3148(get_block_type)\n", + " 6 0.000 0.000 0.000 0.000 internals.py:3191(make_block)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3265(__init__)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3266()\n", + " 7 0.000 0.000 0.000 0.000 internals.py:3307(shape)\n", + " 21 0.000 0.000 0.000 0.000 internals.py:3309()\n", + " 5 0.000 0.000 0.000 0.000 internals.py:3311(ndim)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3351(_is_single_block)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3363(_rebuild_blknos_and_blklocs)\n", + " 11 0.000 0.000 0.000 0.000 internals.py:3384(_get_items)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:3473(__len__)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:348(shape)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3488(_verify_integrity)\n", + " 4 0.000 0.000 0.000 0.000 internals.py:3490()\n", + " 1 0.000 0.000 0.001 0.001 internals.py:3500(apply)\n", + " 5 0.000 0.000 0.000 0.000 internals.py:352(dtype)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:356(ftype)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3561()\n", + " 2 0.000 0.000 0.000 0.000 internals.py:372(iget)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:375(set)\n", + " 5 0.000 0.000 0.000 0.000 internals.py:3776(is_consolidated)\n", + " 2 0.000 0.000 0.000 0.000 
internals.py:3784(_consolidate_check)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3785()\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3789(is_mixed_type)\n", + " 1 0.000 0.000 0.001 0.001 internals.py:3895(copy)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3915()\n", + " 1 0.000 0.000 0.000 0.000 internals.py:3916()\n", + " 2 0.000 0.000 0.000 0.000 internals.py:3922(as_array)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:4085(consolidate)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:4101(_consolidate_inplace)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:4108(get)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:4137(iget)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4208(set)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4235(value_getitem)\n", + " 4 0.000 0.000 0.000 0.000 internals.py:4639(__init__)\n", + " 6 0.000 0.000 0.000 0.000 internals.py:4684(_block)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4709(index)\n", + " 3 0.000 0.000 0.000 0.000 internals.py:4718(dtype)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:4742(external_values)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4745(internal_values)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4768(is_consolidated)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:4774(_consolidate_inplace)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:4846(create_block_manager_from_blocks)\n", + " 1 0.000 0.000 0.001 0.001 internals.py:4869(create_block_manager_from_arrays)\n", + " 1 0.000 0.000 0.001 0.001 internals.py:4880(form_blocks)\n", + " 1 0.000 0.000 0.001 0.001 internals.py:4972(_simple_blockify)\n", + " 1 0.000 0.000 0.001 0.001 internals.py:5017(_stack_arrays)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:5020(_asarray_compat)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:5026(_shape_compat)\n", + " 1 0.000 0.000 0.000 0.000 internals.py:5101(_extend_blocks)\n", + " 2 0.000 0.000 0.000 0.000 internals.py:5208(_get_blkno_placements)\n", + " 1 0.000 0.000 0.001 0.001 internals.py:774(copy)\n", 
+ " 5 0.000 0.000 0.009 0.002 missing.py:112(_isna_new)\n", + " 2 0.002 0.001 0.009 0.004 missing.py:189(_isna_ndarraylike)\n", + " 1 0.000 0.000 0.000 0.000 missing.py:259(notna)\n", + " 5 0.000 0.000 0.009 0.002 missing.py:32(isna)\n", + " 1 0.000 0.000 0.000 0.000 missing.py:596(clean_reindex_fill_method)\n", + " 2 0.000 0.000 0.000 0.000 missing.py:74(clean_fill_method)\n", + " 1 0.000 0.000 0.000 0.000 nanops.py:179(_get_fill_value)\n", + " 1 0.000 0.000 0.001 0.001 nanops.py:202(_get_values)\n", + " 1 0.000 0.000 0.000 0.000 nanops.py:256(_na_ok_dtype)\n", + " 1 0.000 0.000 0.000 0.000 nanops.py:260(_view_if_needed)\n", + " 1 0.000 0.000 0.001 0.001 nanops.py:318(nanany)\n", + " 4 0.000 0.000 0.000 0.000 numeric.py:110(is_all_dates)\n", + " 2 0.000 0.000 0.000 0.000 numeric.py:2491(seterr)\n", + " 2 0.000 0.000 0.000 0.000 numeric.py:2592(geterr)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:2887(__init__)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:2891(__enter__)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:2896(__exit__)\n", + " 3 0.000 0.000 0.000 0.000 numeric.py:433(asarray)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:504(asanyarray)\n", + " 1 0.000 0.000 0.000 0.000 numeric.py:630(require)\n", + " 2 0.000 0.000 0.000 0.000 numeric.py:701()\n", + " 1 0.000 0.000 0.002 0.002 ops.py:1397(_combine_series_frame)\n", + " 1 0.000 0.000 0.000 0.000 ops.py:1442(_align_method_FRAME)\n", + " 1 0.000 0.000 0.002 0.002 ops.py:1571(na_op)\n", + " 1 0.000 0.000 0.002 0.002 ops.py:1579(f)\n", + " 2 0.000 0.000 0.000 0.000 range.py:131(_simple_new)\n", + " 1 0.000 0.000 0.000 0.000 range.py:158(_validate_dtype)\n", + " 1 0.000 0.000 0.000 0.000 range.py:177(_get_data_as_items)\n", + " 1 0.000 0.000 0.000 0.000 range.py:236(dtype)\n", + " 1 0.000 0.000 0.000 0.000 range.py:240(is_unique)\n", + " 1 0.000 0.000 0.000 0.000 range.py:260(_shallow_copy)\n", + " 4 0.000 0.000 0.000 0.000 range.py:315(equals)\n", + " 35 0.000 0.000 0.000 0.000 range.py:481(__len__)\n", + " 1 
0.000 0.000 0.000 0.000 range.py:491(__getitem__)\n", + " 2 0.000 0.000 0.000 0.000 range.py:68(__new__)\n", + " 2 0.000 0.000 0.000 0.000 range.py:84(_ensure_int)\n", + " 4 0.000 0.000 0.000 0.000 series.py:166(__init__)\n", + " 1 0.000 0.000 0.001 0.001 series.py:3323(reindex)\n", + " 1 0.000 0.000 0.000 0.000 series.py:349(_constructor)\n", + " 1 0.000 0.000 0.000 0.000 series.py:353(_constructor_expanddim)\n", + " 4 0.000 0.000 0.000 0.000 series.py:365(_set_axis)\n", + " 4 0.000 0.000 0.000 0.000 series.py:391(_set_subtyp)\n", + " 6 0.000 0.000 0.000 0.000 series.py:401(name)\n", + " 2 0.000 0.000 0.000 0.000 series.py:4019(_sanitize_array)\n", + " 2 0.000 0.000 0.000 0.000 series.py:4036(_try_cast)\n", + " 6 0.000 0.000 0.000 0.000 series.py:405(name)\n", + " 3 0.000 0.000 0.000 0.000 series.py:412(dtype)\n", + " 2 0.000 0.000 0.000 0.000 series.py:432(values)\n", + " 1 0.000 0.000 0.000 0.000 series.py:465(_values)\n", + " 1 0.000 0.000 0.000 0.000 series.py:562(__len__)\n", + " 1 0.000 0.000 0.000 0.000 shape_base.py:63(atleast_2d)\n", + " 1 0.000 0.000 0.119 0.119 strings.py:1345(str_split)\n", + " 98855 0.014 0.000 0.096 0.000 strings.py:1456()\n", + " 1 0.000 0.000 0.119 0.119 strings.py:148(_na_map)\n", + " 1 0.000 0.000 0.119 0.119 strings.py:153(_map)\n", + " 1 0.000 0.000 0.000 0.000 strings.py:1894(__init__)\n", + " 1 0.000 0.000 0.000 0.000 strings.py:1904(_validate)\n", + " 1 0.001 0.001 0.222 0.222 strings.py:1953(_wrap_result)\n", + " 98855 0.022 0.000 0.155 0.000 strings.py:1978(cons_row)\n", + " 1 0.016 0.016 0.172 0.172 strings.py:1984()\n", + " 98856 0.014 0.000 0.018 0.000 strings.py:1987()\n", + " 1 0.017 0.017 0.021 0.021 strings.py:1988()\n", + " 1 0.001 0.001 0.342 0.342 strings.py:2328(split)\n", + " 2 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x9cff80}\n", + " 1 0.002 0.002 0.002 0.002 {built-in method _operator.eq}\n", + " 3/2 0.000 0.000 0.000 0.000 {built-in method builtins.all}\n", + " 3 0.000 0.000 0.000 
0.000 {built-in method builtins.callable}\n", + " 1 0.000 0.000 0.353 0.353 {built-in method builtins.exec}\n", + " 107 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}\n", + " 31 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}\n", + " 8 0.000 0.000 0.000 0.000 {built-in method builtins.hash}\n", + " 197972 0.044 0.000 0.098 0.000 {built-in method builtins.isinstance}\n", + " 41 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method builtins.iter}\n", + "197830/197823 0.008 0.000 0.008 0.000 {built-in method builtins.len}\n", + " 36 0.007 0.000 0.025 0.001 {built-in method builtins.max}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method builtins.sum}\n", + " 3 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.arange}\n", + " 8 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.array}\n", + " 6 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.empty}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.putmask}\n", + " 4 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.geterrobj}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.seterrobj}\n", + " 1 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_int64}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method pandas._libs.algos.ensure_object}\n", + " 2 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.infer_datetimelike_array}\n", + " 5 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_integer}\n", + " 11 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_scalar}\n", + " 3 0.000 0.000 0.000 0.000 {built-in method pandas._libs.missing.checknull}\n", + " 3 0.000 0.000 0.000 0.000 {method 'any' of 'numpy.ndarray' objects}\n", + " 4 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'clear' of 'dict' objects}\n", + " 4 0.001 0.000 0.001 0.000 {method 'copy' of 
'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}\n", + " 4 0.000 0.000 0.000 0.000 {method 'fill' of 'numpy.ndarray' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}\n", + " 8 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}\n", + " 5 0.000 0.000 0.000 0.000 {method 'get_loc' of 'pandas._libs.index.IndexEngine' objects}\n", + " 4 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}\n", + " 9 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}\n", + " 3 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}\n", + " 98855 0.081 0.000 0.081 0.000 {method 'split' of 'str' objects}\n", + " 2 0.000 0.000 0.000 0.000 {method 'transpose' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}\n", + " 1 0.000 0.000 0.000 0.000 {method 'upper' of 'str' objects}\n", + " 3 0.000 0.000 0.000 0.000 {method 'view' of 'numpy.ndarray' objects}\n", + " 1 0.000 0.000 0.000 0.000 {pandas._libs.internals.get_blkno_indexers}\n", + " 1 0.016 0.016 0.112 0.112 {pandas._libs.lib.map_infer_mask}\n", + " 1 0.000 0.000 0.000 0.000 {pandas._libs.lib.maybe_convert_objects}\n", + " 1 0.003 0.003 0.003 0.003 {pandas._libs.lib.to_object_array}\n", + " 1 0.000 0.000 0.000 0.000 {pandas._libs.lib.values_from_object}\n", + " 1 0.007 0.007 0.007 0.007 {pandas._libs.missing.isnaobj}\n", + "\n", + "\n" + ] + } + ], + "source": [ + "import cProfile\n", + "\n", + "def before(df):\n", + " df[\"exists\"] = ~df.apply(lambda x: x.LanguageDesireNextYear in x.LanguageWorkedWith, axis=\"columns\")\n", + "\n", + "\n", + "def after(df):\n", + " df_split = df[\"LanguageDesireNextYear\"].str.split(\",\", expand=True)\n", + " 
df[\"exists\"] = df_split.isin(df[\"LanguageWorkedWith\"]).any(1)\n", + "\n", + "df.LanguageDesireNextYear = df.LanguageDesireNextYear.fillna('missing')\n", + "df.LanguageWorkedWith = df.LanguageWorkedWith.fillna('missing')\n", + "cProfile.run(\"before(df)\")\n", + "cProfile.run(\"after(df)\")" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7.59 s ± 48.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + } + ], + "source": [ + "%timeit before(df)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "147 ms ± 2.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "%timeit after(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Do tests\n", + "\n", + "[For loops with pandas - When should I care?](https://stackoverflow.com/questions/54028199/for-loops-with-pandas-when-should-i-care)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 0%| | 0/15 [00:00" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "import perfplot \n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "perfplot.show(\n", + " setup=lambda n: pd.DataFrame(np.random.choice(1000, (n, 2)), columns=['A','B']),\n", + " kernels=[\n", + " lambda df: df[df.A != df.B],\n", + " lambda df: df.query('A != B'),\n", + " lambda df: df[[x != y for x, y in zip(df.A, df.B)]]\n", + " ],\n", + " labels=['vectorized !=', 'query (numexpr)', 'list comp'],\n", + " n_range=[2**k for k in range(0, 15)],\n", + " xlabel='N'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": 
[] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From fce9bb22c5ecde9c814a997cef5110eec360b033 Mon Sep 17 00:00:00 2001 From: softhints Date: Sat, 6 Apr 2019 13:01:09 +0300 Subject: [PATCH 27/76] count and percentage --- ...and percentage by value for a column.ipynb | 328 ++++++++++++++++++ 1 file changed, 328 insertions(+) create mode 100644 notebooks/Pandas count and percentage by value for a column.ipynb diff --git a/notebooks/Pandas count and percentage by value for a column.ipynb b/notebooks/Pandas count and percentage by value for a column.ipynb new file mode 100644 index 0000000..b31455a --- /dev/null +++ b/notebooks/Pandas count and percentage by value for a column.ipynb @@ -0,0 +1,328 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pandas count and percentage by value for a column\n", + "\n", + "* read remote data from pdf\n", + "* calculate count and percent\n", + "* format percent in better output\n", + "\n", + "Bonus\n", + "\n", + "* pandas column renaming" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| | food | Portion size | per 100 grams | energy |
| 0 | Fish cake | 90 cals per cake | 200 cals | Medium |
| 1 | Fish fingers | 50 cals per piece | 220 cals | Medium |
| 2 | Gammon | 320 cals | 280 cals | Med-High |
| 3 | Haddock fresh | 200 cals | 110 cals | Low calorie |
| 4 | Halibut fresh | 220 cals | 125 cals | Low calorie |
\n", + "
" + ], + "text/plain": [ + " food Portion size per 100 grams energy\n", + "0 Fish cake 90 cals per cake 200 cals Medium\n", + "1 Fish fingers 50 cals per piece 220 cals Medium\n", + "2 Gammon 320 cals 280 cals Med-High\n", + "3 Haddock fresh 200 cals 110 cals Low calorie\n", + "4 Halibut fresh 220 cals 125 cals Low calorie" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from tabula import read_pdf\n", + "import pandas as pd\n", + "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=3, pandas_options={'header': None})\n", + "df.columns = ['food', 'Portion size ', 'per 100 grams', 'energy']\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "s = df.energy" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Medium 14\n", + "High 6\n", + "Low calorie 4\n", + "Med-High 4\n", + "Low-Med 1\n", + "Low- Med 1\n", + "Name: energy, dtype: int64" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "counts = s.value_counts()\n", + "counts" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Medium 0.466667\n", + "High 0.200000\n", + "Low calorie 0.133333\n", + "Med-High 0.133333\n", + "Low-Med 0.033333\n", + "Low- Med 0.033333\n", + "Name: energy, dtype: float64" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "percent = s.value_counts(normalize=True)\n", + "percent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Medium 46.7%\n", + "High 20.0%\n", + "Low calorie 13.3%\n", + "Med-High 13.3%\n", + "Low-Med 3.3%\n", + "Low- Med 3.3%\n", + 
"Name: energy, dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "percent100 = s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'\n", + "percent100" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| | counts | per | per100 |
| Medium | 14 | 0.466667 | 46.7% |
| High | 6 | 0.200000 | 20.0% |
| Low calorie | 4 | 0.133333 | 13.3% |
| Med-High | 4 | 0.133333 | 13.3% |
| Low-Med | 1 | 0.033333 | 3.3% |
| Low- Med | 1 | 0.033333 | 3.3% |
\n", + "
" + ], + "text/plain": [ + " counts per per100\n", + "Medium 14 0.466667 46.7%\n", + "High 6 0.200000 20.0%\n", + "Low calorie 4 0.133333 13.3%\n", + "Med-High 4 0.133333 13.3%\n", + "Low-Med 1 0.033333 3.3%\n", + "Low- Med 1 0.033333 3.3%" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.DataFrame({'counts': counts, 'per': percent, 'per100': percent100})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = df.energy\n", + "counts = s.value_counts()\n", + "percent = s.value_counts(normalize=True)\n", + "percent100 = s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'\n", + "pd.DataFrame({'counts': counts, 'per': percent, 'per100': percent100})" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 82ffc36d61a3e5a2654bf4125c93e559071f813c Mon Sep 17 00:00:00 2001 From: softhints Date: Sun, 7 Apr 2019 10:58:46 +0300 Subject: [PATCH 28/76] pandas-use-list-values-select-rows-column --- ...s-use-list-values-select-rows-column.ipynb | 1453 +++++++++++++++++ 1 file changed, 1453 insertions(+) create mode 100644 notebooks/pandas/pandas-use-list-values-select-rows-column.ipynb diff --git a/notebooks/pandas/pandas-use-list-values-select-rows-column.ipynb b/notebooks/pandas/pandas-use-list-values-select-rows-column.ipynb new file mode 100644 index 0000000..41e2e0e --- /dev/null +++ b/notebooks/pandas/pandas-use-list-values-select-rows-column.ipynb @@ -0,0 +1,1453 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas use a 
list of values to select rows from a column\n", + "\n", + "* filter pandas rows by exact match from a list\n", + "* filter pandas rows by partial match from a list\n", + "\n", + "Bonus\n", + "\n", + "* execute value counts on multiple columns\n", + "* vectorized operations\n", + "\n", + "> Vectorization is the process of executing operations on entire arrays. " + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "pd.set_option('display.max_colwidth', -1)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(98855, 129)\n" + ] + } + ], + "source": [ + "# read the data frame and see the data insight\n", + "df = pd.read_csv(\"../csv/stackoverflow/developer_survey_2018/survey_results_public.csv\", low_memory=False)\n", + "print(df.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Respondent Hobby OpenSource Country Student Employment \\\n", + "0 1 Yes No Kenya No Employed part-time \n", + "1 3 Yes Yes United Kingdom No Employed full-time \n", + "\n", + " FormalEducation \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "\n", + " UndergradMajor \\\n", + "0 Mathematics or statistics \n", + "1 A natural science (ex. biology, chemistry, physics) \n", + "\n", + " CompanySize \\\n", + "0 20 to 99 employees \n", + "1 10,000 or more employees \n", + "\n", + " DevType \\\n", + "0 Full-stack developer \n", + "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", + "\n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", + "\n", + " EducationParents RaceEthnicity \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) White or of European descent \n", + "\n", + " Age Dependents MilitaryUS \\\n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", + "\n", + " SurveyTooLong SurveyEasy \n", + "0 The survey was an appropriate length Very easy \n", + "1 The survey was an appropriate length Somewhat easy \n", + "\n", + "[2 rows x 129 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Computer science, computer engineering, or software engineering 50336\n", + "Another engineering discipline (ex. civil, electrical, mechanical) 6945 \n", + "Information systems, information technology, or system administration 6507 \n", + "A natural science (ex. 
biology, chemistry, physics) 3050 \n", + "Mathematics or statistics 2818 \n", + "Web development or web design 2418 \n", + "A business discipline (ex. accounting, finance, marketing) 1921 \n", + "A humanities discipline (ex. literature, history, philosophy) 1590 \n", + "A social science (ex. anthropology, psychology, political science) 1377 \n", + "Fine arts or performing arts (ex. graphic design, music, studio art) 1135 \n", + "I never declared a major 693 \n", + "A health science (ex. nursing, pharmacy, radiology) 246 \n", + "Name: UndergradMajor, dtype: int64" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.UndergradMajor.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Respondent Hobby OpenSource Country Student \\\n", + "0 1 Yes No Kenya No \n", + "32 51 Yes No United States No \n", + "82 124 Yes Yes United Kingdom No \n", + "84 126 Yes Yes Argentina Yes, part-time \n", + "148 230 Yes Yes United States No \n", + "\n", + " Employment \\\n", + "0 Employed part-time \n", + "32 Employed full-time \n", + "82 Employed full-time \n", + "84 Employed full-time \n", + "148 Employed full-time \n", + "\n", + " FormalEducation \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "32 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "82 Master’s degree (MA, MS, M.Eng., MBA, etc.) \n", + "84 Some college/university study without earning a degree \n", + "148 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "\n", + " UndergradMajor CompanySize \\\n", + "0 Mathematics or statistics 20 to 99 employees \n", + "32 Web development or web design 500 to 999 employees \n", + "82 Mathematics or statistics 10,000 or more employees \n", + "84 Web development or web design Fewer than 10 employees \n", + "148 Mathematics or statistics 1,000 to 4,999 employees \n", + "\n", + " DevType \\\n", + "0 Full-stack developer \n", + "32 Back-end developer;Designer;Front-end developer;Full-stack developer;Marketing or sales professional;Mobile developer \n", + "82 Back-end developer;DevOps specialist;Front-end developer;Full-stack developer;Mobile developer \n", + "84 Mobile developer \n", + "148 Data scientist or machine learning specialist \n", + "\n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "32 ... Daily or almost every day Female Straight or heterosexual \n", + "82 ... 1 - 2 times per week Male Straight or heterosexual \n", + "84 ... 1 - 2 times per week Male Straight or heterosexual \n", + "148 ... NaN NaN NaN \n", + "\n", + " EducationParents \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) 
\n", + "32 Associate degree \n", + "82 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "84 Some college/university study without earning a degree \n", + "148 NaN \n", + "\n", + " RaceEthnicity Age Dependents MilitaryUS \\\n", + "0 Black or of African descent 25 - 34 years old Yes NaN \n", + "32 White or of European descent 18 - 24 years old No No \n", + "82 White or of European descent 25 - 34 years old Yes NaN \n", + "84 NaN 25 - 34 years old No NaN \n", + "148 NaN NaN NaN NaN \n", + "\n", + " SurveyTooLong SurveyEasy \n", + "0 The survey was an appropriate length Very easy \n", + "32 The survey was an appropriate length Very easy \n", + "82 The survey was an appropriate length Very easy \n", + "84 The survey was an appropriate length Very easy \n", + "148 NaN NaN \n", + "\n", + "[5 rows x 129 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['UndergradMajor'].isin(['Mathematics or statistics', \n", + " 'Web development or web design'])].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "area_list = ['biology', 'physics', 'Computer', 'enginnering', 'pharmacy', 'psychology', 'graphic design',\n", + " 'music', 'art', 'studio art', 'accounting', 'finance', 'chemistry',]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " biology physics Computer enginnering pharmacy psychology graphic design \\\n", + "0 False False False False False False False \n", + "1 True True False False False False False \n", + "2 False False True False False False False \n", + "3 False False True False False False False \n", + "4 False False True False False False False \n", + "5 False False True False False False False \n", + "6 False False True False False False False \n", + "7 False False True False False False False \n", + "8 False False False False False False True \n", + "9 False False True False False False False \n", + "10 False False False False False False False \n", + "11 False False False False False False False \n", + "12 False False True False False False False \n", + "13 False False False False False False False \n", + "14 NaN NaN NaN NaN NaN NaN NaN \n", + "15 False False False False False False True \n", + "16 False False True False False False False \n", + "17 False False False False False False False \n", + "18 NaN NaN NaN NaN NaN NaN NaN \n", + "19 False False False False False False False \n", + "20 False False False False False False False \n", + "21 NaN NaN NaN NaN NaN NaN NaN \n", + "22 False False True False False False False \n", + "23 NaN NaN NaN NaN NaN NaN NaN \n", + "24 False False True False False False False \n", + "25 True True False False False False False \n", + "26 False False True False False False False \n", + "27 False False False False False True False \n", + "28 False False True False False False False \n", + "29 False False False False False False False \n", + "\n", + " music art studio art accounting finance chemistry \n", + "0 False False False False False False \n", + "1 False False False False False True \n", + "2 False False False False False False \n", + "3 False False False False False False \n", + "4 False False False False False False \n", + "5 False False False False False False \n", + "6 False False False False False False \n", 
+ "7 False False False False False False \n", + "8 True True True False False False \n", + "9 False False False False False False \n", + "10 False False False False False False \n", + "11 False False False False False False \n", + "12 False False False False False False \n", + "13 False False False False False False \n", + "14 NaN NaN NaN NaN NaN NaN \n", + "15 True True True False False False \n", + "16 False False False False False False \n", + "17 False False False True True False \n", + "18 NaN NaN NaN NaN NaN NaN \n", + "19 False False False True True False \n", + "20 False False False False False False \n", + "21 NaN NaN NaN NaN NaN NaN \n", + "22 False False False False False False \n", + "23 NaN NaN NaN NaN NaN NaN \n", + "24 False False False False False False \n", + "25 False False False False False True \n", + "26 False False False False False False \n", + "27 False False False False False False \n", + "28 False False False False False False \n", + "29 False False False False False False " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import re\n", + "area_df = pd.DataFrame(dict((area, df.UndergradMajor.str.contains(area))\n", + " for area in area_list))\n", + "area_df.head(30)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Back-end developer 6417\n", + "Full-stack developer 6104\n", + "Back-end developer;Front-end developer;Full-stack developer 4460\n", + "Mobile developer 3518\n", + "Student 3222\n", + "Name: DevType, dtype: int64" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.DevType.value_counts().head()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "dev_list = ['Mobile', 'Data', 'QA']" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + 
"outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 7 8 9 \\\n", + "Mobile False False False False False False False False False False \n", + "Data False True False False True True False False True False \n", + "QA False False False False True False False True False False \n", + "\n", + " 10 11 12 13 14 15 16 17 18 19 \n", + "Mobile True False False False False False False False False False \n", + "Data True False False False False False False False True False \n", + "QA False False False False False False False False False True " + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import re\n", + "dev_df = pd.DataFrame(dict((dev, df.DevType.str.contains(dev, re.IGNORECASE))\n", + " for dev in dev_list))\n", + "dev_df.head(20).T" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False 85904\n", + "True 6194 \n", + "Name: QA, dtype: int64" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dev_df.QA.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Mobile Data QA\n", + "False 73294 70209 85904\n", + "True 18804 21889 6194 " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dev_df.apply(pd.Series.value_counts)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Mobile QA\n", + "False 73294 85904\n", + "True 18804 6194 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dev_df[['Mobile','QA']].apply(pd.Series.value_counts)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 392a223edada5d81b2ce6da3ef593ecb213d0c5d Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 13 Apr 2019 11:02:14 +0300 Subject: [PATCH 29/76] Create README.md --- README.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 README.md diff --git a/README.md b/README.md new file mode 100644 index 0000000..5917e28 --- /dev/null +++ b/README.md @@ -0,0 +1,17 @@ +# python +Jupyter notebooks and datasets for the interesting pandas/python/data science video series. + +# Who is this repo? + +For people who are interested in data science, data analysis and finding interesting relation for data. This repository is related to site: https://blog.softhints.com/tag/pandas/ where you can find more interesting videos. The youtube channel is: + +https://www.youtube.com/channel/UCg5rvP_D735oSBatdcH5ZFA + +# Popular Videos + +https://softhints.com/youtube-videos.html + +# Latest Videos + +1. 
[Pandas Tutorial : How to split columns of dataframe](https://www.youtube.com/watch?v=cCoGsFVPVh0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) + From a8de2db6db791f5336e96af1baacf5673e0f71f2 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 13 Apr 2019 11:06:02 +0300 Subject: [PATCH 30/76] Update README.md --- README.md | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 5917e28..e59275c 100644 --- a/README.md +++ b/README.md @@ -13,5 +13,20 @@ https://softhints.com/youtube-videos.html # Latest Videos -1. [Pandas Tutorial : How to split columns of dataframe](https://www.youtube.com/watch?v=cCoGsFVPVh0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) - +0. (Pandas Tutorial : How to split columns of dataframe)[https://www.youtube.com/watch?v=cCoGsFVPVh0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +1. (Pandas Tutorial : How to split dataframe by string or date)[https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +2. (Easily extract tables from websites with pandas and python)[https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +3. (Easily extract information from excel with Python and Pandas)[https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +4. (Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2)[https://www.youtube.com/watch?v=702lkQbZx50&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +5. (Pandas is column part of another column in the same row of dataframe)[https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +6. (Load multiple CSV files into a single Dataframe)[https://www.youtube.com/watch?v=30ndwJm1I5c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +7. (Analyze top youtube channels 2019 with pandas - PewDiePie I)[https://www.youtube.com/watch?v=mG9OnH9R5yM&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +8. 
(dataframe column transformations ( str, int, category, concat))[https://www.youtube.com/watch?v=5pbRivDYzko&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +9. (Pandas DataFrame generate n-level hierarchical JSON)[https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +10. (Pandas How add new column existing DataFrame)[https://www.youtube.com/watch?v=UvCO5gKQqtE&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +11. (Python Pandas find and drop duplicate data)[https://www.youtube.com/watch?v=4ixLp8aFomw&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +12. (Map the headers to a column with pandas?)[https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +13. (Pandas count values in a column of type list)[https://www.youtube.com/watch?v=lx7KFd6BPcg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +14. (How to Optimize and Speed Up Pandas)[https://www.youtube.com/watch?v=nW5ltiwV-6Y&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +15. (Pandas count and percentage by value for a column)[https://www.youtube.com/watch?v=P5pxJkv71BU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +16. (Pandas use a list of values to select rows from a column)[https://www.youtube.com/watch?v=jlSbo5wmTPQ&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] From 1310aaa4ce252664f6218761fa9c821d5f6735a0 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 13 Apr 2019 11:06:41 +0300 Subject: [PATCH 31/76] Update README.md --- README.md | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index e59275c..9901a47 100644 --- a/README.md +++ b/README.md @@ -13,20 +13,20 @@ https://softhints.com/youtube-videos.html # Latest Videos -0. (Pandas Tutorial : How to split columns of dataframe)[https://www.youtube.com/watch?v=cCoGsFVPVh0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -1. 
(Pandas Tutorial : How to split dataframe by string or date)[https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -2. (Easily extract tables from websites with pandas and python)[https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -3. (Easily extract information from excel with Python and Pandas)[https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -4. (Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2)[https://www.youtube.com/watch?v=702lkQbZx50&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -5. (Pandas is column part of another column in the same row of dataframe)[https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -6. (Load multiple CSV files into a single Dataframe)[https://www.youtube.com/watch?v=30ndwJm1I5c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -7. (Analyze top youtube channels 2019 with pandas - PewDiePie I)[https://www.youtube.com/watch?v=mG9OnH9R5yM&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -8. (dataframe column transformations ( str, int, category, concat))[https://www.youtube.com/watch?v=5pbRivDYzko&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -9. (Pandas DataFrame generate n-level hierarchical JSON)[https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -10. (Pandas How add new column existing DataFrame)[https://www.youtube.com/watch?v=UvCO5gKQqtE&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -11. (Python Pandas find and drop duplicate data)[https://www.youtube.com/watch?v=4ixLp8aFomw&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -12. (Map the headers to a column with pandas?)[https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -13. (Pandas count values in a column of type list)[https://www.youtube.com/watch?v=lx7KFd6BPcg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -14. 
(How to Optimize and Speed Up Pandas)[https://www.youtube.com/watch?v=nW5ltiwV-6Y&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -15. (Pandas count and percentage by value for a column)[https://www.youtube.com/watch?v=P5pxJkv71BU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] -16. (Pandas use a list of values to select rows from a column)[https://www.youtube.com/watch?v=jlSbo5wmTPQ&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv] +0. [Pandas Tutorial : How to split columns of dataframe](https://www.youtube.com/watch?v=cCoGsFVPVh0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +1. [Pandas Tutorial : How to split dataframe by string or date](https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +2. [Easily extract tables from websites with pandas and python](https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +3. [Easily extract information from excel with Python and Pandas](https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +4. [Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2](https://www.youtube.com/watch?v=702lkQbZx50&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +5. [Pandas is column part of another column in the same row of dataframe](https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +6. [Load multiple CSV files into a single Dataframe](https://www.youtube.com/watch?v=30ndwJm1I5c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +7. [Analyze top youtube channels 2019 with pandas - PewDiePie I](https://www.youtube.com/watch?v=mG9OnH9R5yM&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +8. [dataframe column transformations ( str, int, category, concat)](https://www.youtube.com/watch?v=5pbRivDYzko&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +9. [Pandas DataFrame generate n-level hierarchical JSON](https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +10. 
[Pandas How add new column existing DataFrame](https://www.youtube.com/watch?v=UvCO5gKQqtE&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +11. [Python Pandas find and drop duplicate data](https://www.youtube.com/watch?v=4ixLp8aFomw&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +12. [Map the headers to a column with pandas?](https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +13. [Pandas count values in a column of type list](https://www.youtube.com/watch?v=lx7KFd6BPcg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +14. [How to Optimize and Speed Up Pandas](https://www.youtube.com/watch?v=nW5ltiwV-6Y&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +15. [Pandas count and percentage by value for a column](https://www.youtube.com/watch?v=P5pxJkv71BU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) +16. [Pandas use a list of values to select rows from a column](https://www.youtube.com/watch?v=jlSbo5wmTPQ&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) From 650571be86ebe8af6c1a86fed35ad9f7bbc0fe1e Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 13 Apr 2019 11:08:05 +0300 Subject: [PATCH 32/76] Update README.md --- README.md | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/README.md b/README.md index 9901a47..9cd91ed 100644 --- a/README.md +++ b/README.md @@ -13,6 +13,8 @@ https://softhints.com/youtube-videos.html # Latest Videos +## Pandas + 0. [Pandas Tutorial : How to split columns of dataframe](https://www.youtube.com/watch?v=cCoGsFVPVh0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 1. [Pandas Tutorial : How to split dataframe by string or date](https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 2. [Easily extract tables from websites with pandas and python](https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) @@ -30,3 +32,56 @@ https://softhints.com/youtube-videos.html 14. 
[How to Optimize and Speed Up Pandas](https://www.youtube.com/watch?v=nW5ltiwV-6Y&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 15. [Pandas count and percentage by value for a column](https://www.youtube.com/watch?v=P5pxJkv71BU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 16. [Pandas use a list of values to select rows from a column](https://www.youtube.com/watch?v=jlSbo5wmTPQ&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) + + +## python + +0. [python string split by separator](https://www.youtube.com/watch?v=iBsg75W2Vig&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +1. [python random number generation examples](https://www.youtube.com/watch?v=WDTnZgSreL4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +2. [bilingual programming education in java and python](https://www.youtube.com/watch?v=eEHBjP06WSI&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +3. [biggest programmer salaries 2018](https://www.youtube.com/watch?v=X2bUUkWC7dE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +4. [python extract text from image or pdf](https://www.youtube.com/watch?v=PK-GvWWQ03g&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +5. [Python read validate and import CSV JSON file to MySQL](https://www.youtube.com/watch?v=WbW0rHCX2UU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +6. [python regex match date](https://www.youtube.com/watch?v=o8Je7hPgsdU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +7. [python regex cheat sheet with examples](https://www.youtube.com/watch?v=o_CSmob64uU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +8. [python string methods tutorial](https://www.youtube.com/watch?v=7yuPVq9DtV0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +9. [python shuffle list](https://www.youtube.com/watch?v=WFRBxz6AeZI&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +10. [Easy install of Python and PyCharm on Windows](https://www.youtube.com/watch?v=cDOlBRzHRI0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +11. [learn python for beginners complete tutorial 2018](https://www.youtube.com/watch?v=hnc3bGtYQsQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +12. 
[think python chaper 2](https://www.youtube.com/watch?v=A6EIl677ntQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +13. [Python/Java bad and good code comments examples](https://www.youtube.com/watch?v=SRCToEkq7to&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +14. [intellij pycharm surround string quote](https://www.youtube.com/watch?v=AgRHEGB8Urs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +15. [Top Five Most Annoying Programming Mistakes For Beginners with Python](https://www.youtube.com/watch?v=JToPoYip-C4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +16. [No Python Interpreter Configured For The Module - PyCharm/IntelliJ](https://www.youtube.com/watch?v=mkKDI6y2kyE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +17. [python split string into list examples](https://www.youtube.com/watch?v=T8EfomTlcfA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +18. [How to migrate/update virtualenv from Python 3.5 to 3.6](https://www.youtube.com/watch?v=cFTB5EJUxzw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +19. [Python String Remove Last n Characters](https://www.youtube.com/watch?v=hZHfdOKFlAw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +20. [Python Pandas 7 examples of filters and lambda apply](https://www.youtube.com/watch?v=7nYkJctgSSA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +21. [The simplest way to run python headless test with Chrome on Ubuntu](https://www.youtube.com/watch?v=BdppFIT_lIs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +22. [Python 3 Simple Examples get current folder and go to parent](https://www.youtube.com/watch?v=tQ_9a6UhUQs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +23. [python join/merge list two and more lists](https://www.youtube.com/watch?v=-zcJ4uB7XUo&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +24. [Easy way to convert dictionary to SQL insert with Python](https://www.youtube.com/watch?v=hUXGQwTSfMs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +25. [Python 3 detect and prevent TypeError-s](https://www.youtube.com/watch?v=DJd0JYaVkqA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +26. 
[The right way to declare multiple variables in Python](https://www.youtube.com/watch?v=8OoLg39nNlo&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +27. [Python uninstall a module installed with pip install and virtual envirornment](https://www.youtube.com/watch?v=03ahRfkfwME&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +28. [python performance profiling in pycharm](https://www.youtube.com/watch?v=EZ-im7m8630&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +29. [Python Cumulative Sum per Group with Pandas](https://www.youtube.com/watch?v=1tCbvYv_ibw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +30. [PyCharm - Breakpoints, Favorites, TODOs simple examples](https://www.youtube.com/watch?v=_fNZLrz97kg&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +31. [Python 3 simple ways to list files and folders](https://www.youtube.com/watch?v=oJdubyyJNIQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +32. [Python 3 elegant way to find most/less common element in a list](https://www.youtube.com/watch?v=P4LonC3puS4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +33. [clock angle problem final](https://www.youtube.com/watch?v=eIRhXharV7k&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +34. [Python 3 List Comprehension Tutorial for beginners](https://www.youtube.com/watch?v=DmSephyJNtQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +35. [python 3 how to remove white spaces](https://www.youtube.com/watch?v=0k0fvqikaoE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +36. [Pandas Tutorial : How to split dataframe by string or date](https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +37. [improve your programming skills with fun](https://www.youtube.com/watch?v=uoAV7651Op0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +38. [pandas dataframe search for string in all columns filter regex](https://www.youtube.com/watch?v=vbHFIALhSWE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +39. 
[Pandas is column part of another column in the same row of dataframe](https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +40. [Easily extract tables from websites with pandas and python](https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +41. [Easily extract information from excel with Python and Pandas](https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +42. [Python asterisk argument or What is the usage of * asterisk in Python](https://www.youtube.com/watch?v=JBm8iptLnuA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +43. [Easy Image validation with Python - valid image, blank or pattern](https://www.youtube.com/watch?v=HMB4zrP_-HY&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +44. [Pandas DataFrame generate n-level hierarchical JSON](https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +45. [Python group or sort list of lists by common element](https://www.youtube.com/watch?v=zVQJQxpedm8&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +46. [Think Python: Chapter 3 Functions 3.2](https://www.youtube.com/watch?v=Ol3Dwucax9U&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +47. [Questions and Answers 1 Improve OCR and tabula range](https://www.youtube.com/watch?v=nrF_Rgh88no&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) +48. [Map the headers to a column with pandas?](https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) From 6eb5dcf576d0f95e4f50955cac45a6df94e87f08 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 27 Jul 2019 19:47:15 +0300 Subject: [PATCH 33/76] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 9cd91ed..607f136 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # python Jupyter notebooks and datasets for the interesting pandas/python/data science video series. -# Who is this repo? 
+# Who is this repo for? For people who are interested in data science, data analysis and finding interesting relation for data. This repository is related to site: https://blog.softhints.com/tag/pandas/ where you can find more interesting videos. The youtube channel is: From db00ee9027eef930797829c15427ec736220afd3 Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 31 Oct 2019 09:19:30 +0200 Subject: [PATCH 34/76] Python Pandas extract URL or date by regex --- ...das extract url or dates from column.ipynb | 786 ++++++++++++++++++ 1 file changed, 786 insertions(+) create mode 100644 notebooks/pandas/Pandas extract url or dates from column.ipynb diff --git a/notebooks/pandas/Pandas extract url or dates from column.ipynb b/notebooks/pandas/Pandas extract url or dates from column.ipynb new file mode 100644 index 0000000..fe659f2 --- /dev/null +++ b/notebooks/pandas/Pandas extract url or dates from column.ipynb @@ -0,0 +1,786 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Python Pandas extract URL or date by regex" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "# Reading the CSV file as it is\n", + "result = pd.read_csv('../csv/url_dates.csv') " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_columns', None) # or 1000\n", + "pd.set_option('display.max_rows', None) # or 1000\n", + "pd.set_option('display.max_colwidth', -1) # or 199" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
log
02019-10-28 19:56:03 DEMO <GET https://www.wikipedia.org/> (The Free Encyclopedia) 2019-10-29 9:06:03
12019-10-29 19:56:03 DEMO <GET https://en.wikipedia.org/wiki/Main_Page> (5,962,233 articles in English) 2019-10-31 11:16:43
22019-10-29 19:56:03 DEMO <GET https://it.wikipedia.org/wiki/Pagina_principale> (1 561 730 voci in italiano) 2019-10-30 21:15:23
32019-10-30 19:56:03 DEMO <GET https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal> (1 014 783 artigos em português) 2019-10-30 20:26:35
\n", + "
" + ], + "text/plain": [ + " log\n", + "0 2019-10-28 19:56:03 DEMO <GET https://www.wikipedia.org/> (The Free Encyclopedia) 2019-10-29 9:06:03 \n", + "1 2019-10-29 19:56:03 DEMO <GET https://en.wikipedia.org/wiki/Main_Page> (5,962,233 articles in English) 2019-10-31 11:16:43 \n", + "2 2019-10-29 19:56:03 DEMO <GET https://it.wikipedia.org/wiki/Pagina_principale> (1 561 730 voci in italiano) 2019-10-30 21:15:23 \n", + "3 2019-10-30 19:56:03 DEMO <GET https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal> (1 014 783 artigos em português) 2019-10-30 20:26:35" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Checking sample data\n", + "result.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# URL extraction from Dataframe" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# extract URLs: match from the protocol (https) up to the closing >\n", + "# the first part is a capturing group; the closing > is a non-capturing group\n", + "result['url'] = result.log.str.extract(r'(https.*)(?:>)').head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
logurl
22019-10-29 19:56:03 DEMO <GET https://it.wikipedia.org/wiki/Pagina_principale> (1 561 730 voci in italiano) 2019-10-30 21:15:23https://it.wikipedia.org/wiki/Pagina_principale
\n", + "
" + ], + "text/plain": [ + " log \\\n", + "2 2019-10-29 19:56:03 DEMO <GET https://it.wikipedia.org/wiki/Pagina_principale> (1 561 730 voci in italiano) 2019-10-30 21:15:23 \n", + "\n", + " url \n", + "2 https://it.wikipedia.org/wiki/Pagina_principale " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# filtering results if needed\n", + "result[result['url'].str.contains('it.wikipedia.org')]" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# extract URLs: match from the protocol (https) up to the closing >\n", + "# the first part is a capturing group; the closing > is a non-capturing group\n", + "result['url'] = result.log.str.extract(r'(https.*)(?:>)').head()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0
0https://www.wikipedia.org/>
1https://en.wikipedia.org/wiki/Main_Page>
2https://it.wikipedia.org/wiki/Pagina_principale>
3https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal>
\n", + "
" + ], + "text/plain": [ + " 0\n", + "0 https://www.wikipedia.org/> \n", + "1 https://en.wikipedia.org/wiki/Main_Page> \n", + "2 https://it.wikipedia.org/wiki/Pagina_principale> \n", + "3 https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal>" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# examples\n", + "\n", + "result.log.str.extract(r'(https?:\\/\\/(?:www\\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\\.[^\\s]{2,}|www\\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\\.[^\\s]{2,}|https?:\\/\\/(?:www\\.|(?!www))[a-zA-Z0-9]+\\.[^\\s]{2,}|www\\.[a-zA-Z0-9]+\\.[^\\s]{2,})').head()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
012345
0httpsNaNwww.wikipedia.org/>NaNNaNNaN
1httpsNaNen.wikipedia.org/wiki/Main_Page>NaNNaNNaN
2httpsNaNit.wikipedia.org/wiki/Pagina_principale>NaNNaNNaN
3httpsNaNpt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal>NaNNaNNaN
\n", + "
" + ], + "text/plain": [ + " 0 1 2 \\\n", + "0 https NaN www.wikipedia.org/> \n", + "1 https NaN en.wikipedia.org/wiki/Main_Page> \n", + "2 https NaN it.wikipedia.org/wiki/Pagina_principale> \n", + "3 https NaN pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal> \n", + "\n", + " 3 4 5 \n", + "0 NaN NaN NaN \n", + "1 NaN NaN NaN \n", + "2 NaN NaN NaN \n", + "3 NaN NaN NaN " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# examples\n", + "result.log.str.extract(r'(ftp|http|https):\\/\\/(\\w+:{0,1}\\w*@)?(\\S+)(:[0-9]+)?(\\/|\\/([\\w#!:.?+=&%@!\\-\\/]))?').head()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Date extraction from Dataframe" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# extract single date\n", + "result['date'] = result.log.str.extract(r'(\\d{4}-\\d{2}-\\d{2})')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2019-10-28\n", + "1 2019-10-29\n", + "2 2019-10-29\n", + "3 2019-10-30\n", + "Name: date, dtype: object" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result['date']" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0
match
002019-10-28
12019-10-29
102019-10-29
12019-10-31
202019-10-29
12019-10-30
302019-10-30
12019-10-30
\n", + "
" + ], + "text/plain": [ + " 0\n", + " match \n", + "0 0 2019-10-28\n", + " 1 2019-10-29\n", + "1 0 2019-10-29\n", + " 1 2019-10-31\n", + "2 0 2019-10-29\n", + " 1 2019-10-30\n", + "3 0 2019-10-30\n", + " 1 2019-10-30" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# extract multiple dates\n", + "result.log.str.extractall(r'(\\d{4}-\\d{2}-\\d{2})')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0
match01
02019-10-282019-10-29
12019-10-292019-10-31
22019-10-292019-10-30
32019-10-302019-10-30
\n", + "
" + ], + "text/plain": [ + " 0 \n", + "match 0 1\n", + "0 2019-10-28 2019-10-29\n", + "1 2019-10-29 2019-10-31\n", + "2 2019-10-29 2019-10-30\n", + "3 2019-10-30 2019-10-30" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# unstack the multiindex\n", + "result.log.str.extractall(r'(\\d{4}-\\d{2}-\\d{2})').unstack()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "# extract datetime\n", + "result['datetime'] = result.log.str.extract(r'(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2019-10-28 19:56:03\n", + "1 2019-10-29 19:56:03\n", + "2 2019-10-29 19:56:03\n", + "3 2019-10-30 19:56:03\n", + "Name: datetime, dtype: object" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result['datetime']" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# match the full datetime but capture only the date (the time uses ':' separators)\n", + "result['date'] = result.log.str.extract(r'(\\d{4}-\\d{2}-\\d{2}) (?:\\d{2}:\\d{2}:\\d{2})')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2019-10-28\n", + "1 2019-10-29\n", + "2 2019-10-29\n", + "3 2019-10-30\n", + "Name: date, dtype: object" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result['date']" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "# match datetime and extract date and time into separate columns\n", + "result[['date', 'time']] = result.log.str.extract(r'(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2})')" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
datetime
02019-10-2819:56:03
12019-10-2919:56:03
22019-10-2919:56:03
32019-10-3019:56:03
\n", + "
" + ], + "text/plain": [ + " date time\n", + "0 2019-10-28 19:56:03\n", + "1 2019-10-29 19:56:03\n", + "2 2019-10-29 19:56:03\n", + "3 2019-10-30 19:56:03" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result[['date', 'time']]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Split URLs" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "result['url_split'] = 'https' + result.log.str.split('https', expand=True)[1].str.split('>', expand=True)[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 https://www.wikipedia.org/ \n", + "1 https://en.wikipedia.org/wiki/Main_Page \n", + "2 https://it.wikipedia.org/wiki/Pagina_principale \n", + "3 https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal\n", + "Name: url_split, dtype: object" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result['url_split']" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 1b67aa7f24c2da919231b4fab7e10ff15bc274da Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 31 Oct 2019 09:22:14 +0200 Subject: [PATCH 35/76] Pandas_extract_url_or_dates_from_column --- ...column.ipynb => Pandas_extract_url_or_dates_from_column.ipynb} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename notebooks/pandas/{Pandas extract url or dates from column.ipynb => Pandas_extract_url_or_dates_from_column.ipynb} (100%) diff --git 
a/notebooks/pandas/Pandas extract url or dates from column.ipynb b/notebooks/pandas/Pandas_extract_url_or_dates_from_column.ipynb similarity index 100% rename from notebooks/pandas/Pandas extract url or dates from column.ipynb rename to notebooks/pandas/Pandas_extract_url_or_dates_from_column.ipynb From 40826abf32284a5f67a7f719ceadfb31f0a3f18f Mon Sep 17 00:00:00 2001 From: softhints Date: Fri, 1 Nov 2019 09:16:24 +0200 Subject: [PATCH 36/76] Pandas_compare_columns_in_two_Dataframes --- ...as_compare_columns_in_two_Dataframes.ipynb | 894 ++++++++++++++++++ 1 file changed, 894 insertions(+) create mode 100644 notebooks/pandas/Pandas_compare_columns_in_two_Dataframes.ipynb diff --git a/notebooks/pandas/Pandas_compare_columns_in_two_Dataframes.ipynb b/notebooks/pandas/Pandas_compare_columns_in_two_Dataframes.ipynb new file mode 100644 index 0000000..b060b47 --- /dev/null +++ b/notebooks/pandas/Pandas_compare_columns_in_two_Dataframes.ipynb @@ -0,0 +1,894 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df1 = pd.read_csv('../csv/file1.csv',sep=\"\\s+\")\n", + "df2 = pd.read_csv('../csv/file2.csv',sep=\"\\s+\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nametypevalue
0Mikea+98
1Jerya-144
2Tomyb108
\n", + "
" + ], + "text/plain": [ + " name type value\n", + "0 Mike a+ 98\n", + "1 Jery a- 144\n", + "2 Tomy b 108" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
typelowhigh
0a+7897
1a-108143
2b108150
\n", + "
" + ], + "text/plain": [ + " type low high\n", + "0 a+ 78 97\n", + "1 a- 108 143\n", + "2 b 108 150" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Similar sized dataframes" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "df1['low_value'] = np.where(df1.type == df2.type, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 True\n", + "1 True\n", + "2 True\n", + "Name: low_value, dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1['low_value']" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# compare using np.where whether values from first dataframe has match in the column of the second\n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 True\n", + "Name: low_high, dtype: object" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1['low_high']" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "# Compare one column from first against two from second dataframe\n", + "df1['low_high_value'] = np.where((df1.value >= df2.low) & (df1.value <= df2.high), 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 True\n", + "Name: low_high_value, dtype: 
object" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1['low_high_value']" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['False', 'False', 'True'], dtype='\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nametypevalue0
0Mikea+98False
1Jerya-144False
2Tomyb108True
\n", + "" + ], + "text/plain": [ + " name type value 0\n", + "0 Mike a+ 98 False\n", + "1 Jery a- 144 False\n", + "2 Tomy b 108 True" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# compare data as Boolean Series and join it the result to first dataframe\n", + "df3 = [(df2.type.isin(df1.type)) & (df1.value.between(df2.low,df2.high,inclusive=True))]\n", + "df1.join(df3)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "# compare data and assign it as new column to first data frame\n", + "df1['enh1'] = pd.Series((df2.type.isin(df1.type)) & (df1.value >= df2.low) & (df1.value <= df2.high))" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   name type  value   enh1
0  Mike   a+     98  False
1  Jery   a-    144  False
2  Tomy    b    108   True
\n", + "
" + ], + "text/plain": [ + " name type value enh1\n", + "0 Mike a+ 98 False\n", + "1 Jery a- 144 False\n", + "2 Tomy b 108 True" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# compare with 3 conditions and or clause. You can use any valid python code\n", + "df1['enh2'] = pd.Series((df2.type.isin(df1.type)) & (df1.value != df2.low) | (df1.value + 1 == df2.high))" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   name type  value   enh1   enh2
0  Mike   a+     98  False   True
1  Jery   a-    144  False   True
2  Tomy    b    108   True  False
\n", + "
" + ], + "text/plain": [ + " name type value enh1 enh2\n", + "0 Mike a+ 98 False True\n", + "1 Jery a- 144 False True\n", + "2 Tomy b 108 True False" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Different sized dataframes" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "# add new row for dataframe 2\n", + "df2 = df2.append({'type':'0', 'low':143, 'high':108}, ignore_index=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "merged = df1.merge(df2,how='outer',left_on=['type'],right_on=[\"type\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   name type  value   enh1   enh2  low  high
0  Mike   a+   98.0  False   True   78    97
1  Jery   a-  144.0  False   True  108   143
2  Tomy    b  108.0   True  False  108   150
3   NaN    0    NaN    NaN    NaN  143   108
\n", + "
" + ], + "text/plain": [ + " name type value enh1 enh2 low high\n", + "0 Mike a+ 98.0 False True 78 97\n", + "1 Jery a- 144.0 False True 108 143\n", + "2 Tomy b 108.0 True False 108 150\n", + "3 NaN 0 NaN NaN NaN 143 108" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "merged" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   name type  value  enh1   enh2  low  high
2  Tomy    b  108.0  True  False  108   150
\n", + "
" + ], + "text/plain": [ + " name type value enh1 enh2 low high\n", + "2 Tomy b 108.0 True False 108 150" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "merged[(merged.value >= merged.low) & (merged.value <= merged.high)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Error ValueError: Can only compare identically-labeled Series objects" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "Can only compare identically-labeled Series objects", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# demo of error - ValueError: Can only compare identically-labeled Series objects\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mdf1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'low_high'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwhere\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mdf2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhigh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'True'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'False'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/ops/__init__.py\u001b[0m in 
\u001b[0;36mwrapper\u001b[0;34m(self, other, axis)\u001b[0m\n\u001b[1;32m 1140\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1141\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mABCSeries\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_indexed_same\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1142\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Can only compare identically-labeled \"\u001b[0m \u001b[0;34m\"Series objects\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1143\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1144\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mis_categorical_dtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mValueError\u001b[0m: Can only compare identically-labeled Series objects" + ] + } + ], + "source": [ + "# demo of error - ValueError: Can only compare identically-labeled Series objects \n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "df2.drop(3, inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "# demo of error - Now is working because of equal rows\n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "# 
how to cause it on first dataframes\n", + "df1.set_index([pd.Index([1, 2, 3])], inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "Can only compare identically-labeled Series objects", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# demo of error - ValueError: Can only compare identically-labeled Series objects because of mismatching indexes\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mdf1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'low_high'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwhere\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mdf2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhigh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'True'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'False'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/ops/__init__.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(self, other, axis)\u001b[0m\n\u001b[1;32m 1140\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1141\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mABCSeries\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_indexed_same\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1142\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Can only compare identically-labeled \"\u001b[0m \u001b[0;34m\"Series objects\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1143\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1144\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mis_categorical_dtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mValueError\u001b[0m: Can only compare identically-labeled Series objects" + ] + } + ], + "source": [ + "# demo of error - ValueError: Can only compare identically-labeled Series objects because of mismatching indexes\n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "# possible solution for - ValueError: Can only compare identically-labeled Series objects\n", + "df1.sort_index(inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "# possible solution for - ValueError: Can only compare identically-labeled Series objects\n", + "df1.reset_index(inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "# demo of error - ValueError: Can only compare identically-labeled Series objects\n", + "import numpy as np\n", + "df1['low_high'] = np.where(df1.value < df2.high, 
'True', 'False')" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 True\n", + "Name: low_high, dtype: object" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1['low_high']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From a2c8a0e16aea38282fd95b1ee7ca3ce3b588da09 Mon Sep 17 00:00:00 2001 From: softhints Date: Sun, 3 Nov 2019 11:27:58 +0200 Subject: [PATCH 37/76] Python How to Wrap Long Lines in Text file --- scripts/1.python_wrap_lines.py | 42 ++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) create mode 100644 scripts/1.python_wrap_lines.py diff --git a/scripts/1.python_wrap_lines.py b/scripts/1.python_wrap_lines.py new file mode 100644 index 0000000..2db7976 --- /dev/null +++ b/scripts/1.python_wrap_lines.py @@ -0,0 +1,42 @@ +import os + +size = 80 +file = 'budo' +folder = os.path.expanduser('~/Documents/Fortunes/') + +# Read and store the entire file line by line +with open(f'{folder}{file}.txt') as reader: + provers = reader.readlines() + +# wrap/collate lines by separators [",", " ", "."] +def collate(text, size): + new_text = [] + split_char = 1 + while split_char > 0: + comma = str.find(text, ',', size) + space = str.find(text, ' ', size) + dot = str.find(text, '.', size) + + split_char = min(max(comma, dot), max(comma, space), max(dot, space)) + + if text[:split_char]: + 
new_text.append(text[:split_char]) + text = text[split_char+1:].replace('\n', "") + + return new_text + +# write collated information to new(same) file +with open(f'{folder}{file}.txt', 'w') as writer: + for wisdom in provers: + if len(wisdom) > size: + collated = collate(wisdom, size) + for short in collated: + writer.write(short) + writer.write('\n') + else: + writer.write(wisdom) + +# Executing Shell Commands with Python +import os +myCmd = f'strfile -c % {folder}{file}.txt {folder}{file}.txt.dat' +os.system(myCmd) \ No newline at end of file From 1e5744d60afef23b5a04fabc95c52b76bc2aac47 Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 2 Jan 2020 11:10:56 +0200 Subject: [PATCH 38/76] How to merge multiple CSV files with Python --- notebooks/csv/data_201901.csv | 3 + notebooks/csv/data_201902.csv | 3 + notebooks/csv/data_202001.csv | 3 + notebooks/csv/data_202002.csv | 3 + ...merge_multiple_CSV_files_with_Python.ipynb | 342 ++++++++++++++++++ 5 files changed, 354 insertions(+) create mode 100644 notebooks/csv/data_201901.csv create mode 100644 notebooks/csv/data_201902.csv create mode 100644 notebooks/csv/data_202001.csv create mode 100644 notebooks/csv/data_202002.csv create mode 100644 notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb diff --git a/notebooks/csv/data_201901.csv b/notebooks/csv/data_201901.csv new file mode 100644 index 0000000..ec916f4 --- /dev/null +++ b/notebooks/csv/data_201901.csv @@ -0,0 +1,3 @@ +col1,col2,col3 +A,B,1 +AA,BB,2 \ No newline at end of file diff --git a/notebooks/csv/data_201902.csv b/notebooks/csv/data_201902.csv new file mode 100644 index 0000000..223cfe2 --- /dev/null +++ b/notebooks/csv/data_201902.csv @@ -0,0 +1,3 @@ +col1,col2,col3 +C,D,3 +CC,DD,4 \ No newline at end of file diff --git a/notebooks/csv/data_202001.csv b/notebooks/csv/data_202001.csv new file mode 100644 index 0000000..52bdb1d --- /dev/null +++ b/notebooks/csv/data_202001.csv @@ -0,0 +1,3 @@ +col1,col2,col3,col4 +E,F,5,e5 
+EE,FF,6,ee6 \ No newline at end of file diff --git a/notebooks/csv/data_202002.csv b/notebooks/csv/data_202002.csv new file mode 100644 index 0000000..56194e0 --- /dev/null +++ b/notebooks/csv/data_202002.csv @@ -0,0 +1,3 @@ +col1,col2,col3,col5 +H,J,7,77 +HH,JJ,8,88 \ No newline at end of file diff --git a/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb b/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb new file mode 100644 index 0000000..663a374 --- /dev/null +++ b/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb @@ -0,0 +1,342 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# How to merge multiple CSV files with Python\n", + "Python convert normal JSON to JSON separated lines 3 examples\n", + "\n", + "* Steps to merge multiple CSV(identical) files with Python\n", + "* Steps to merge multiple CSV(identical) files with Python with trace\n", + "* Combine multiple CSV files when the columns are different\n", + "* Bonus: Merge multiple files with Windows/Linux" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Steps to merge multiple CSV(identical) files with Python" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os, glob\n", + "import pandas as pd\n", + "\n", + "path = \"../../csv/\"\n", + "\n", + "all_files = glob.glob(os.path.join(path, \"data_2019*.csv\"))\n", + "\n", + "all_csv = (pd.read_csv(f, sep=',') for f in all_files)\n", + "df_merged = pd.concat(all_csv, ignore_index=True)\n", + "df_merged.to_csv( \"merged.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Steps to merge multiple CSV(identical) files with Python with trace" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  col1 col2  col3             file
0    C    D     3  data_201902.csv
1   CC   DD     4  data_201902.csv
2    A    B     1  data_201901.csv
3   AA   BB     2  data_201901.csv
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3 file\n", + "0 C D 3 data_201902.csv\n", + "1 CC DD 4 data_201902.csv\n", + "2 A B 1 data_201901.csv\n", + "3 AA BB 2 data_201901.csv" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import os, glob\n", + "import pandas as pd\n", + "\n", + "path = \"../../csv/\"\n", + "\n", + "all_files = glob.glob(os.path.join(path, \"data_2019*.csv\"))\n", + "\n", + "all_df = []\n", + "for f in all_files:\n", + " df = pd.read_csv(f, sep=',')\n", + " df['file'] = f.split('/')[-1]\n", + " all_df.append(df)\n", + " \n", + "merged_df = pd.concat(all_df, ignore_index=True)\n", + "merged_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Combine multiple CSV files when the columns are different" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  col1 col2  col3 col4  col5             file
0    E    F     5   e5   NaN  data_202001.csv
1   EE   FF     6  ee6   NaN  data_202001.csv
2    H    J     7  NaN  77.0  data_202002.csv
3   HH   JJ     8  NaN  88.0  data_202002.csv
4    C    D     3  NaN   NaN  data_201902.csv
5   CC   DD     4  NaN   NaN  data_201902.csv
6    A    B     1  NaN   NaN  data_201901.csv
7   AA   BB     2  NaN   NaN  data_201901.csv
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3 col4 col5 file\n", + "0 E F 5 e5 NaN data_202001.csv\n", + "1 EE FF 6 ee6 NaN data_202001.csv\n", + "2 H J 7 NaN 77.0 data_202002.csv\n", + "3 HH JJ 8 NaN 88.0 data_202002.csv\n", + "4 C D 3 NaN NaN data_201902.csv\n", + "5 CC DD 4 NaN NaN data_201902.csv\n", + "6 A B 1 NaN NaN data_201901.csv\n", + "7 AA BB 2 NaN NaN data_201901.csv" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import os, glob\n", + "import pandas as pd\n", + "\n", + "path = \"../../csv/\"\n", + "\n", + "all_files = glob.glob(os.path.join(path, \"data_*.csv\"))\n", + "\n", + "\n", + "all_df = []\n", + "for f in all_files:\n", + " df = pd.read_csv(f, sep=',')\n", + " df['file'] = f.split('/')[-1]\n", + " all_df.append(df)\n", + " \n", + "merged_df = pd.concat(all_df, ignore_index=True, sort=True)\n", + "merged_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Bonus: Merge multiple files with Windows/Linux\n", + "\n", + "Linux\n", + "\n", + "`sed 1d data_*.csv > merged.csv`\n", + "\n", + "Windows\n", + "\n", + "`C:\\> copy data_*.csv merged.csv `" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 6565afa326996e50ad3250d4f5a229f0107eb16c Mon Sep 17 00:00:00 2001 From: softhints Date: Fri, 3 Jan 2020 10:32:08 +0200 Subject: [PATCH 39/76] Pandas_compare_columns_in_two_Dataframes --- ...merge_multiple_CSV_files_with_Python.ipynb | 332 +++++++++++++++++- 1 file changed, 327 
insertions(+), 5 deletions(-) diff --git a/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb b/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb index 663a374..7651657 100644 --- a/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb +++ b/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb @@ -13,6 +13,245 @@ "* Bonus: Merge multiple files with Windows/Linux" ] }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['../../csv/data_202001.csv',\n", + " '../../csv/data_202002.csv',\n", + " '../../csv/data_201902.csv',\n", + " '../../csv/data_201901.csv']" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  col1 col2  col3 col4
0    E    F     5   e5
1   EE   FF     6  ee6
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3 col4\n", + "0 E F 5 e5\n", + "1 EE FF 6 ee6" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  col1 col2  col3  col5
0    H    J     7    77
1   HH   JJ     8    88
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3 col5\n", + "0 H J 7 77\n", + "1 HH JJ 8 88" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  col1 col2  col3
0    C    D     3
1   CC   DD     4
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 C D 3\n", + "1 CC DD 4" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  col1 col2  col3
0    A    B     1
1   AA   BB     2
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 A B 1\n", + "1 AA BB 2" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "all_files = glob.glob(os.path.join(path, \"data_*.csv\"))\n", + "display(all_files)\n", + "for f in all_files:\n", + " display(pd.read_csv(f, sep=','))" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -22,7 +261,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ @@ -30,6 +269,7 @@ "import pandas as pd\n", "\n", "path = \"../../csv/\"\n", + "#path = \"/home/user/data\"\n", "\n", "all_files = glob.glob(os.path.join(path, \"data_2019*.csv\"))\n", "\n", @@ -38,6 +278,88 @@ "df_merged.to_csv( \"merged.csv\")" ] }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   Unnamed: 0 col1 col2  col3
0           0    C    D     3
1           1   CC   DD     4
2           2    A    B     1
3           3   AA   BB     2
\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 col1 col2 col3\n", + "0 0 C D 3\n", + "1 1 CC DD 4\n", + "2 2 A B 1\n", + "3 3 AA BB 2" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.read_csv('merged.csv')" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -47,7 +369,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -118,7 +440,7 @@ "3 AA BB 2 data_201901.csv" ] }, - "execution_count": 2, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -150,7 +472,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -271,7 +593,7 @@ "7 AA BB 2 NaN NaN data_201901.csv" ] }, - "execution_count": 7, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } From ef0af0e93d35968ec52c16130049a740fb23add2 Mon Sep 17 00:00:00 2001 From: softhints Date: Fri, 3 Jan 2020 10:55:06 +0200 Subject: [PATCH 40/76] Pandas : Select rows between two dates - DataFrame or CSV file --- ...en_two_dates_-_DataFrame_or_CSV_file.ipynb | 696 ++++++++++++++++++ 1 file changed, 696 insertions(+) create mode 100644 notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb diff --git a/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb b/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb new file mode 100644 index 0000000..c2b755d --- /dev/null +++ b/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb @@ -0,0 +1,696 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas : Select rows between two dates - DataFrame or CSV file" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Resources\n", + "\n", + "* 
[pandas.to_datetime](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)\n", + "* [pandas.DataFrame.between_time](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.between_time.html)\n", + "* [pandas.DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use cases\n", + "\n", + "* Pandas: Verify columns containing dates\n", + "* Convert string to datetime in DataFrame\n", + "* Select rows between two dates\n", + " * 1. Select rows based on dates with loc\n", + " * 2. Select rows based on dates without loc\n", + " * 3. Select rows between two times\n", + " * 4. Select rows based on dates without loc\n", + " * 5. Use mask to mark the records\n", + " * 6. Select records from last month/30 days " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Import Pandas and read data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      loading_datetime      pages                                                   title         datetime_col
0  2019-10-28 19:56:03       main  &lt;GET https://www.wikipedia.org/&gt; (The Free En...   2019-10-29 9:06:03
1  2019-10-29 19:56:03    english  &lt;GET https://en.wikipedia.org/wiki/Main_Page&gt;...  2019-10-31 11:16:43
2  2019-10-29 19:56:03   italiano     &lt;GET https://it.wikipedia.org/wiki/Pagina_pri...  2019-10-30 21:15:23
3  2019-10-30 19:56:03  português     &lt;GET https://pt.wikipedia.org/wiki/Wikip%C3%A...  2019-10-30 20:26:35
\n", + "
" + ], + "text/plain": [ + " loading_datetime pages \\\n", + "0 2019-10-28 19:56:03 main \n", + "1 2019-10-29 19:56:03 english \n", + "2 2019-10-29 19:56:03 italiano \n", + "3 2019-10-30 19:56:03 português \n", + "\n", + " title datetime_col \n", + "0 (The Free En... 2019-10-29 9:06:03 \n", + "1 ... 2019-10-31 11:16:43 \n", + "2 \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      loading_datetime     pages                                                   title               datetime_col
1  2019-10-29 19:56:03   english  &lt;GET https://en.wikipedia.org/wiki/Main_Page&gt;...  2019-10-31 11:16:43+00:00
2  2019-10-29 19:56:03  italiano     &lt;GET https://it.wikipedia.org/wiki/Pagina_pri...  2019-10-30 21:15:23+00:00
\n", + "" + ], + "text/plain": [ + " loading_datetime pages \\\n", + "1 2019-10-29 19:56:03 english \n", + "2 2019-10-29 19:56:03 italiano \n", + "\n", + " title datetime_col \n", + "1 ... 2019-10-31 11:16:43+00:00 \n", + "2 start_date) & (df['datetime_col'] < end_date)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 2. Select rows based on dates without loc" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
      loading_datetime     pages                                                   title               datetime_col
1  2019-10-29 19:56:03   english  &lt;GET https://en.wikipedia.org/wiki/Main_Page&gt;...  2019-10-31 11:16:43+00:00
2  2019-10-29 19:56:03  italiano     &lt;GET https://it.wikipedia.org/wiki/Pagina_pri...  2019-10-30 21:15:23+00:00
\n", + "
" + ], + "text/plain": [ + " loading_datetime pages \\\n", + "1 2019-10-29 19:56:03 english \n", + "2 2019-10-29 19:56:03 italiano \n", + "\n", + " title datetime_col \n", + "1 ... 2019-10-31 11:16:43+00:00 \n", + "2 \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
                              loading_datetime     pages                                                title
datetime_col
2019-10-30 21:15:23+00:00  2019-10-29 19:56:03  italiano  &lt;GET https://it.wikipedia.org/wiki/Pagina_pri...
\n", + "" + ], + "text/plain": [ + " loading_datetime pages \\\n", + "datetime_col \n", + "2019-10-30 21:15:23+00:00 2019-10-29 19:56:03 italiano \n", + "\n", + " title \n", + "datetime_col \n", + "2019-10-30 21:15:23+00:00 \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
loading_datetimepagestitledatetime_col
\n", + "" + ], + "text/plain": [ + "Empty DataFrame\n", + "Columns: [loading_datetime, pages, title, datetime_col]\n", + "Index: []" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[(df['datetime_col'] > '2018-12-02') & (df['datetime_col'] <= '2018-12-03 23:26:10+00:00')]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6. Select records from last month/30 days " + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
loading_datetimepagestitledatetime_col
12019-10-29 19:56:03english<GET https://en.wikipedia.org/wiki/Main_Page>...2019-10-31 11:16:43+00:00
\n", + "
" + ], + "text/plain": [ + " loading_datetime pages \\\n", + "1 2019-10-29 19:56:03 english \n", + "\n", + " title datetime_col \n", + "1 ... 2019-10-31 11:16:43+00:00 " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df[\"datetime_col\"] >= (pd.to_datetime('11/30/2019', utc=True) - pd.Timedelta(days=30))]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 5e301b8427b7cdaf4eb8e027be660f0a4e1c03e5 Mon Sep 17 00:00:00 2001 From: softhints Date: Sun, 5 Jan 2020 12:42:56 +0200 Subject: [PATCH 41/76] Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples --- ...s_tabulation_of_two_factors_examples.ipynb | 2380 +++++++++++++++++ 1 file changed, 2380 insertions(+) create mode 100644 notebooks/pandas/Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples.ipynb diff --git a/notebooks/pandas/Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples.ipynb b/notebooks/pandas/Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples.ipynb new file mode 100644 index 0000000..dfe2ce4 --- /dev/null +++ b/notebooks/pandas/Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples.ipynb @@ -0,0 +1,2380 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas : Crosstab - cross tabulation of two (or more) factors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Resources\n", + "\n", + "* 
[pandas.crosstab](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.crosstab.html)\n", + "* [Pivot table](https://en.wikipedia.org/wiki/Pivot_table)\n", + "* [imdb 5000 movie dataset](https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset)\n", + "\n", + "## Official Pandas doc\n", + "\n", + "> Compute a simple cross tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed.\n", + "\n", + "## Pivot Table\n", + "\n", + "> A pivot table is a table of statistics that summarizes the data of a more extensive table (such as from a database, spreadsheet, or business intelligence program). This summary might include sums, averages, or other statistics, which the pivot table groups together in a meaningful way.\n", + "\n", + "> Pivot tables are a technique in data processing." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use cases\n", + "\n", + "* Data summary\n", + "* Data aggregation\n", + "* Grouping\n", + "* Quick Reports\n", + "* Data patterns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Import Pandas and read data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Select data for the crosstab" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
0ColorJames Cameron723.0178.00.0855.0Joel David Moore1000.0760505847.0Action|Adventure|Fantasy|Sci-Fi...3054.0EnglishUSAPG-13237000000.02009.0936.07.91.7833000
1ColorGore Verbinski302.0169.0563.01000.0Orlando Bloom40000.0309404152.0Action|Adventure|Fantasy...1238.0EnglishUSAPG-13300000000.02007.05000.07.12.350
2ColorSam Mendes602.0148.00.0161.0Rory Kinnear11000.0200074175.0Action|Adventure|Thriller...994.0EnglishUKPG-13245000000.02015.0393.06.82.3585000
3ColorChristopher Nolan813.0164.022000.023000.0Christian Bale27000.0448130642.0Action|Thriller...2701.0EnglishUSAPG-13250000000.02012.023000.08.52.35164000
4NaNDoug WalkerNaNNaN131.0NaNRob Walker131.0NaNDocumentary...NaNNaNNaNNaNNaNNaN12.07.1NaN0
\n", + "

5 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "2 Color Sam Mendes 602.0 148.0 \n", + "3 Color Christopher Nolan 813.0 164.0 \n", + "4 NaN Doug Walker NaN NaN \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "2 0.0 161.0 Rory Kinnear \n", + "3 22000.0 23000.0 Christian Bale \n", + "4 131.0 NaN Rob Walker \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "2 11000.0 200074175.0 Action|Adventure|Thriller ... \n", + "3 27000.0 448130642.0 Action|Thriller ... \n", + "4 131.0 NaN Documentary ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "0 3054.0 English USA PG-13 237000000.0 \n", + "1 1238.0 English USA PG-13 300000000.0 \n", + "2 994.0 English UK PG-13 245000000.0 \n", + "3 2701.0 English USA PG-13 250000000.0 \n", + "4 NaN NaN NaN NaN NaN \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "2 2015.0 393.0 6.8 2.35 \n", + "3 2012.0 23000.0 8.5 2.35 \n", + "4 NaN 12.0 7.1 NaN \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "2 85000 \n", + "3 164000 \n", + "4 0 \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01234
colorColorColorColorColorNaN
director_nameJames CameronGore VerbinskiSam MendesChristopher NolanDoug Walker
num_critic_for_reviews723302602813NaN
duration178169148164NaN
director_facebook_likes0563022000131
actor_3_facebook_likes855100016123000NaN
actor_2_nameJoel David MooreOrlando BloomRory KinnearChristian BaleRob Walker
actor_1_facebook_likes1000400001100027000131
gross7.60506e+083.09404e+082.00074e+084.48131e+08NaN
genresAction|Adventure|Fantasy|Sci-FiAction|Adventure|FantasyAction|Adventure|ThrillerAction|ThrillerDocumentary
actor_1_nameCCH PounderJohnny DeppChristoph WaltzTom HardyDoug Walker
movie_titleAvatarPirates of the Caribbean: At World's EndSpectreThe Dark Knight RisesStar Wars: Episode VII - The Force Awakens  ...
num_voted_users88620447122027586811443378
cast_total_facebook_likes48344835011700106759143
actor_3_nameWes StudiJack DavenportStephanie SigmanJoseph Gordon-LevittNaN
facenumber_in_poster00100
plot_keywordsavatar|future|marine|native|paraplegicgoddess|marriage ceremony|marriage proposal|pi...bomb|espionage|sequel|spy|terroristdeception|imprisonment|lawlessness|police offi...NaN
movie_imdb_linkhttp://www.imdb.com/title/tt0499549/?ref_=fn_t...http://www.imdb.com/title/tt0449088/?ref_=fn_t...http://www.imdb.com/title/tt2379713/?ref_=fn_t...http://www.imdb.com/title/tt1345836/?ref_=fn_t...http://www.imdb.com/title/tt5289954/?ref_=fn_t...
num_user_for_reviews305412389942701NaN
languageEnglishEnglishEnglishEnglishNaN
countryUSAUSAUKUSANaN
content_ratingPG-13PG-13PG-13PG-13NaN
budget2.37e+083e+082.45e+082.5e+08NaN
title_year2009200720152012NaN
actor_2_facebook_likes93650003932300012
imdb_score7.97.16.88.57.1
aspect_ratio1.782.352.352.35NaN
movie_facebook_likes330000850001640000
\n", + "
" + ], + "text/plain": [ + " 0 \\\n", + "color Color \n", + "director_name James Cameron \n", + "num_critic_for_reviews 723 \n", + "duration 178 \n", + "director_facebook_likes 0 \n", + "actor_3_facebook_likes 855 \n", + "actor_2_name Joel David Moore \n", + "actor_1_facebook_likes 1000 \n", + "gross 7.60506e+08 \n", + "genres Action|Adventure|Fantasy|Sci-Fi \n", + "actor_1_name CCH Pounder \n", + "movie_title Avatar  \n", + "num_voted_users 886204 \n", + "cast_total_facebook_likes 4834 \n", + "actor_3_name Wes Studi \n", + "facenumber_in_poster 0 \n", + "plot_keywords avatar|future|marine|native|paraplegic \n", + "movie_imdb_link http://www.imdb.com/title/tt0499549/?ref_=fn_t... \n", + "num_user_for_reviews 3054 \n", + "language English \n", + "country USA \n", + "content_rating PG-13 \n", + "budget 2.37e+08 \n", + "title_year 2009 \n", + "actor_2_facebook_likes 936 \n", + "imdb_score 7.9 \n", + "aspect_ratio 1.78 \n", + "movie_facebook_likes 33000 \n", + "\n", + " 1 \\\n", + "color Color \n", + "director_name Gore Verbinski \n", + "num_critic_for_reviews 302 \n", + "duration 169 \n", + "director_facebook_likes 563 \n", + "actor_3_facebook_likes 1000 \n", + "actor_2_name Orlando Bloom \n", + "actor_1_facebook_likes 40000 \n", + "gross 3.09404e+08 \n", + "genres Action|Adventure|Fantasy \n", + "actor_1_name Johnny Depp \n", + "movie_title Pirates of the Caribbean: At World's End  \n", + "num_voted_users 471220 \n", + "cast_total_facebook_likes 48350 \n", + "actor_3_name Jack Davenport \n", + "facenumber_in_poster 0 \n", + "plot_keywords goddess|marriage ceremony|marriage proposal|pi... \n", + "movie_imdb_link http://www.imdb.com/title/tt0449088/?ref_=fn_t... 
\n", + "num_user_for_reviews 1238 \n", + "language English \n", + "country USA \n", + "content_rating PG-13 \n", + "budget 3e+08 \n", + "title_year 2007 \n", + "actor_2_facebook_likes 5000 \n", + "imdb_score 7.1 \n", + "aspect_ratio 2.35 \n", + "movie_facebook_likes 0 \n", + "\n", + " 2 \\\n", + "color Color \n", + "director_name Sam Mendes \n", + "num_critic_for_reviews 602 \n", + "duration 148 \n", + "director_facebook_likes 0 \n", + "actor_3_facebook_likes 161 \n", + "actor_2_name Rory Kinnear \n", + "actor_1_facebook_likes 11000 \n", + "gross 2.00074e+08 \n", + "genres Action|Adventure|Thriller \n", + "actor_1_name Christoph Waltz \n", + "movie_title Spectre  \n", + "num_voted_users 275868 \n", + "cast_total_facebook_likes 11700 \n", + "actor_3_name Stephanie Sigman \n", + "facenumber_in_poster 1 \n", + "plot_keywords bomb|espionage|sequel|spy|terrorist \n", + "movie_imdb_link http://www.imdb.com/title/tt2379713/?ref_=fn_t... \n", + "num_user_for_reviews 994 \n", + "language English \n", + "country UK \n", + "content_rating PG-13 \n", + "budget 2.45e+08 \n", + "title_year 2015 \n", + "actor_2_facebook_likes 393 \n", + "imdb_score 6.8 \n", + "aspect_ratio 2.35 \n", + "movie_facebook_likes 85000 \n", + "\n", + " 3 \\\n", + "color Color \n", + "director_name Christopher Nolan \n", + "num_critic_for_reviews 813 \n", + "duration 164 \n", + "director_facebook_likes 22000 \n", + "actor_3_facebook_likes 23000 \n", + "actor_2_name Christian Bale \n", + "actor_1_facebook_likes 27000 \n", + "gross 4.48131e+08 \n", + "genres Action|Thriller \n", + "actor_1_name Tom Hardy \n", + "movie_title The Dark Knight Rises  \n", + "num_voted_users 1144337 \n", + "cast_total_facebook_likes 106759 \n", + "actor_3_name Joseph Gordon-Levitt \n", + "facenumber_in_poster 0 \n", + "plot_keywords deception|imprisonment|lawlessness|police offi... \n", + "movie_imdb_link http://www.imdb.com/title/tt1345836/?ref_=fn_t... 
\n", + "num_user_for_reviews 2701 \n", + "language English \n", + "country USA \n", + "content_rating PG-13 \n", + "budget 2.5e+08 \n", + "title_year 2012 \n", + "actor_2_facebook_likes 23000 \n", + "imdb_score 8.5 \n", + "aspect_ratio 2.35 \n", + "movie_facebook_likes 164000 \n", + "\n", + " 4 \n", + "color NaN \n", + "director_name Doug Walker \n", + "num_critic_for_reviews NaN \n", + "duration NaN \n", + "director_facebook_likes 131 \n", + "actor_3_facebook_likes NaN \n", + "actor_2_name Rob Walker \n", + "actor_1_facebook_likes 131 \n", + "gross NaN \n", + "genres Documentary \n", + "actor_1_name Doug Walker \n", + "movie_title Star Wars: Episode VII - The Force Awakens  ... \n", + "num_voted_users 8 \n", + "cast_total_facebook_likes 143 \n", + "actor_3_name NaN \n", + "facenumber_in_poster 0 \n", + "plot_keywords NaN \n", + "movie_imdb_link http://www.imdb.com/title/tt5289954/?ref_=fn_t... \n", + "num_user_for_reviews NaN \n", + "language NaN \n", + "country NaN \n", + "content_rating NaN \n", + "budget NaN \n", + "title_year NaN \n", + "actor_2_facebook_likes 12 \n", + "imdb_score 7.1 \n", + "aspect_ratio NaN \n", + "movie_facebook_likes 0 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head().T" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',\n", + " 'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',\n", + " 'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',\n", + " 'movie_title', 'num_voted_users', 'cast_total_facebook_likes',\n", + " 'actor_3_name', 'facenumber_in_poster', 'plot_keywords',\n", + " 'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',\n", + " 'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',\n", + " 'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],\n", + " 
dtype='object')" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "df2 = df.iloc[[2, 4, 9, 12, 13, 14, 20, 23, 25, 30, 34, 50, 79], :]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Create crosstab table" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
countryAustraliaCanadaNew ZealandUKUSA
director_name
Baz Luhrmann10000
Brett Ratner01000
David Yates00010
Gore Verbinski00002
Jon Favreau00010
Marc Forster00010
Peter Jackson00201
Sam Mendes00020
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 1 0 0 0 0\n", + "Brett Ratner 0 1 0 0 0\n", + "David Yates 0 0 0 1 0\n", + "Gore Verbinski 0 0 0 0 2\n", + "Jon Favreau 0 0 0 1 0\n", + "Marc Forster 0 0 0 1 0\n", + "Peter Jackson 0 0 2 0 1\n", + "Sam Mendes 0 0 0 2 0" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# simple usage\n", + "pd.crosstab(df2['director_name'], df2['country'])" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
countryAustraliaCanadaNew ZealandUKUSA
director
Baz Luhrmann10000
Brett Ratner01000
David Yates00010
Gore Verbinski00002
Jon Favreau00010
Marc Forster00010
Peter Jackson00201
Sam Mendes00020
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director \n", + "Baz Luhrmann 1 0 0 0 0\n", + "Brett Ratner 0 1 0 0 0\n", + "David Yates 0 0 0 1 0\n", + "Gore Verbinski 0 0 0 0 2\n", + "Jon Favreau 0 0 0 1 0\n", + "Marc Forster 0 0 0 1 0\n", + "Peter Jackson 0 0 2 0 1\n", + "Sam Mendes 0 0 0 2 0" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# change row and column names\n", + "pd.crosstab(df2['director_name'], df2['country'], rownames=['director'], colnames=['country'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Crosstab: normalize or show percentage per row or total" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
countryAustraliaCanadaNew ZealandUKUSA
director_name
Baz Luhrmann0.0833330.0000000.0000000.0000000.000000
Brett Ratner0.0000000.0833330.0000000.0000000.000000
David Yates0.0000000.0000000.0000000.0833330.000000
Gore Verbinski0.0000000.0000000.0000000.0000000.166667
Jon Favreau0.0000000.0000000.0000000.0833330.000000
Marc Forster0.0000000.0000000.0000000.0833330.000000
Peter Jackson0.0000000.0000000.1666670.0000000.083333
Sam Mendes0.0000000.0000000.0000000.1666670.000000
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 0.083333 0.000000 0.000000 0.000000 0.000000\n", + "Brett Ratner 0.000000 0.083333 0.000000 0.000000 0.000000\n", + "David Yates 0.000000 0.000000 0.000000 0.083333 0.000000\n", + "Gore Verbinski 0.000000 0.000000 0.000000 0.000000 0.166667\n", + "Jon Favreau 0.000000 0.000000 0.000000 0.083333 0.000000\n", + "Marc Forster 0.000000 0.000000 0.000000 0.083333 0.000000\n", + "Peter Jackson 0.000000 0.000000 0.166667 0.000000 0.083333\n", + "Sam Mendes 0.000000 0.000000 0.000000 0.166667 0.000000" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show percentage - global - normalize=True\n", + "pd.crosstab(df2['director_name'], df2['country'], normalize=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
countryAustraliaCanadaNew ZealandUKUSA
director_name
Baz Luhrmann1.00.00.0000000.00.000000
Brett Ratner0.01.00.0000000.00.000000
David Yates0.00.00.0000001.00.000000
Gore Verbinski0.00.00.0000000.01.000000
Jon Favreau0.00.00.0000001.00.000000
Marc Forster0.00.00.0000001.00.000000
Peter Jackson0.00.00.6666670.00.333333
Sam Mendes0.00.00.0000001.00.000000
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 1.0 0.0 0.000000 0.0 0.000000\n", + "Brett Ratner 0.0 1.0 0.000000 0.0 0.000000\n", + "David Yates 0.0 0.0 0.000000 1.0 0.000000\n", + "Gore Verbinski 0.0 0.0 0.000000 0.0 1.000000\n", + "Jon Favreau 0.0 0.0 0.000000 1.0 0.000000\n", + "Marc Forster 0.0 0.0 0.000000 1.0 0.000000\n", + "Peter Jackson 0.0 0.0 0.666667 0.0 0.333333\n", + "Sam Mendes 0.0 0.0 0.000000 1.0 0.000000" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show percentage - per index - normalize='index'\n", + "pd.crosstab(df2['director_name'], df2['country'], normalize='index')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
countryAustraliaCanadaNew ZealandUKUSAAll
director_name
Baz Luhrmann100001
Brett Ratner010001
David Yates000101
Gore Verbinski000022
Jon Favreau000101
Marc Forster000101
Peter Jackson002013
Sam Mendes000202
All1125312
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA All\n", + "director_name \n", + "Baz Luhrmann 1 0 0 0 0 1\n", + "Brett Ratner 0 1 0 0 0 1\n", + "David Yates 0 0 0 1 0 1\n", + "Gore Verbinski 0 0 0 0 2 2\n", + "Jon Favreau 0 0 0 1 0 1\n", + "Marc Forster 0 0 0 1 0 1\n", + "Peter Jackson 0 0 2 0 1 3\n", + "Sam Mendes 0 0 0 2 0 2\n", + "All 1 1 2 5 3 12" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show total - margins=True\n", + "pd.crosstab(df2['director_name'], df2['country'], margins=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
countryAustraliaCanadaNew ZealandUKUSAAll
director_name
Baz Luhrmann0.0833330.0000000.0000000.0000000.0000000.083333
Brett Ratner0.0000000.0833330.0000000.0000000.0000000.083333
David Yates0.0000000.0000000.0000000.0833330.0000000.083333
Gore Verbinski0.0000000.0000000.0000000.0000000.1666670.166667
Jon Favreau0.0000000.0000000.0000000.0833330.0000000.083333
Marc Forster0.0000000.0000000.0000000.0833330.0000000.083333
Peter Jackson0.0000000.0000000.1666670.0000000.0833330.250000
Sam Mendes0.0000000.0000000.0000000.1666670.0000000.166667
All0.0833330.0833330.1666670.4166670.2500001.000000
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA All\n", + "director_name \n", + "Baz Luhrmann 0.083333 0.000000 0.000000 0.000000 0.000000 0.083333\n", + "Brett Ratner 0.000000 0.083333 0.000000 0.000000 0.000000 0.083333\n", + "David Yates 0.000000 0.000000 0.000000 0.083333 0.000000 0.083333\n", + "Gore Verbinski 0.000000 0.000000 0.000000 0.000000 0.166667 0.166667\n", + "Jon Favreau 0.000000 0.000000 0.000000 0.083333 0.000000 0.083333\n", + "Marc Forster 0.000000 0.000000 0.000000 0.083333 0.000000 0.083333\n", + "Peter Jackson 0.000000 0.000000 0.166667 0.000000 0.083333 0.250000\n", + "Sam Mendes 0.000000 0.000000 0.000000 0.166667 0.000000 0.166667\n", + "All 0.083333 0.083333 0.166667 0.416667 0.250000 1.000000" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Combining totals and percentage\n", + "pd.crosstab(df2['director_name'], df2['country'], margins=True, normalize=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 1.000000 0.000000 0.000000 0.000000 0.000000\n", + "Brett Ratner 0.000000 1.000000 0.000000 0.000000 0.000000\n", + "David Yates 0.000000 0.000000 0.000000 1.000000 0.000000\n", + "Gore Verbinski 0.000000 0.000000 0.000000 0.000000 1.000000\n", + "Jon Favreau 0.000000 0.000000 0.000000 1.000000 0.000000\n", + "Marc Forster 0.000000 0.000000 0.000000 1.000000 0.000000\n", + "Peter Jackson 0.000000 0.000000 0.666667 0.000000 0.333333\n", + "Sam Mendes 0.000000 0.000000 0.000000 1.000000 0.000000\n", + "All 0.083333 0.083333 0.166667 0.416667 0.250000" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Combining totals and percentage per row\n", + "pd.crosstab(df2['director_name'], df2['country'], margins=True, normalize='index')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas crosstab multiple columns" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada \\\n", + "director_name genres \n", + "Baz Luhrmann Drama|Romance 1 0 \n", + "Brett Ratner Action|Adventure|Fantasy|Sci-Fi|Thriller 0 1 \n", + "David Yates Adventure|Family|Fantasy|Mystery 0 0 \n", + "Gore Verbinski Action|Adventure|Fantasy 0 0 \n", + " Action|Adventure|Western 0 0 \n", + "Jon Favreau Adventure|Drama|Family|Fantasy 0 0 \n", + "Marc Forster Action|Adventure 0 0 \n", + "Peter Jackson Action|Adventure|Drama|Romance 0 0 \n", + " Adventure|Fantasy 0 0 \n", + "Sam Mendes Action|Adventure|Thriller 0 0 \n", + "\n", + "country New Zealand UK USA \n", + "director_name genres \n", + "Baz Luhrmann Drama|Romance 0 0 0 \n", + "Brett Ratner Action|Adventure|Fantasy|Sci-Fi|Thriller 0 0 0 \n", + "David Yates Adventure|Family|Fantasy|Mystery 0 1 0 \n", + "Gore Verbinski Action|Adventure|Fantasy 0 0 1 \n", + " Action|Adventure|Western 0 0 1 \n", + "Jon Favreau Adventure|Drama|Family|Fantasy 0 1 0 \n", + "Marc Forster Action|Adventure 0 1 0 \n", + "Peter Jackson Action|Adventure|Drama|Romance 1 0 0 \n", + " Adventure|Fantasy 1 0 1 \n", + "Sam Mendes Action|Adventure|Thriller 0 2 0 " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.crosstab([df2['director_name'], df2['genres']], df2['country'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Simulate pandas crosstab with Group By" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " director_name country\n", + "director_name country \n", + "Baz Luhrmann Australia 1 1\n", + "Brett Ratner Canada 1 1\n", + "David Yates UK 1 1\n", + "Gore Verbinski USA 2 2\n", + "Jon Favreau UK 1 1\n", + "Marc Forster UK 1 1\n", + "Peter Jackson New Zealand 2 2\n", + " USA 1 1\n", + "Sam Mendes UK 2 2" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cols = ['director_name', 'country']\n", + "df2.groupby(cols)[cols].count()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pandas crosstab use values from another column" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA\n", + "director_name \n", + "Baz Luhrmann 7.3 NaN NaN NaN NaN\n", + "Brett Ratner NaN 6.8 NaN NaN NaN\n", + "David Yates NaN NaN NaN 7.5 NaN\n", + "Gore Verbinski NaN NaN NaN NaN 6.9\n", + "Jon Favreau NaN NaN NaN 7.8 NaN\n", + "Marc Forster NaN NaN NaN 6.7 NaN\n", + "Peter Jackson NaN NaN 7.35 NaN 7.9\n", + "Sam Mendes NaN NaN NaN 7.3 NaN" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "pd.crosstab(df2['director_name'], df2['country'], values=df2.imdb_score, aggfunc=np.average)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "country Australia Canada New Zealand UK USA All\n", + "director_name \n", + "Baz Luhrmann 7.3 NaN NaN NaN NaN 7.300000\n", + "Brett Ratner NaN 6.8 NaN NaN NaN 6.800000\n", + "David Yates NaN NaN NaN 7.50 NaN 7.500000\n", + "Gore Verbinski NaN NaN NaN NaN 6.900000 6.900000\n", + "Jon Favreau NaN NaN NaN 7.80 NaN 7.800000\n", + "Marc Forster NaN NaN NaN 6.70 NaN 6.700000\n", + "Peter Jackson NaN NaN 7.35 NaN 7.900000 7.533333\n", + "Sam Mendes NaN NaN NaN 7.30 NaN 7.300000\n", + "All 7.3 6.8 7.35 7.32 7.233333 7.258333" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "pd.crosstab(df2['director_name'], df2['country'], values=df2.imdb_score, aggfunc=np.average, margins=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 4d72bb6df7cd29ddd3de8fdbd725e03234004a4b Mon Sep 17 00:00:00 2001 From: softhints Date: Tue, 7 Jan 2020 10:14:12 +0200 Subject: [PATCH 42/76] Pandas - value_counts - multiple columns, all columns and bad data --- ..._columns%2C_all_columns_and_bad_data.ipynb | 1098 +++++++++++++++++ 1 file changed, 1098 insertions(+) create mode 100644 notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb diff --git a/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb b/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb new file mode 
100644 index 0000000..085e38f --- /dev/null +++ b/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb @@ -0,0 +1,1098 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 20. Pandas - value_counts - multiple columns, all columns and bad data" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df_movie = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df_resp = pd.read_csv(\"../csv/other_text_responses.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Pandas apply value_counts on multiple columns at once" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',\n", + " 'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',\n", + " 'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',\n", + " 'movie_title', 'num_voted_users', 'cast_total_facebook_likes',\n", + " 'actor_3_name', 'facenumber_in_poster', 'plot_keywords',\n", + " 'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',\n", + " 'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',\n", + " 'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],\n", + " dtype='object')" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_movie.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " color content_rating\n", + " Black and White 209.0 NaN\n", + "Approved NaN 55.0\n", + "Color 4815.0 NaN\n", + "G NaN 112.0\n", + "GP NaN 6.0\n", + "M NaN 5.0\n", + "NC-17 NaN 7.0\n", + "Not Rated NaN 116.0\n", + "PG NaN 701.0\n", + "PG-13 NaN 1461.0\n", + "Passed NaN 9.0\n", + "R NaN 2118.0\n", + "TV-14 NaN 30.0\n", + "TV-G NaN 10.0\n", + "TV-MA NaN 20.0\n", + "TV-PG NaN 13.0\n", + "TV-Y NaN 1.0\n", + "TV-Y7 NaN 1.0\n", + "Unrated NaN 62.0\n", + "X NaN 13.0" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_movie[['color', 'content_rating']].apply(pd.Series.value_counts)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Pandas apply value_counts on all columns" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q12_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "LinkedIn 39\n", + "Medium 38\n", + "Coursera 16\n", + "Linkedin 16\n", + "Books 14\n", + "Facebook 11\n", + "medium 11\n", + "linkedin 9\n", + "books 8\n", + "Newsletters 7\n", + "Name: Q12_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q13_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "mlcourse.ai 35\n", + "NPTEL 23\n", + "Youtube 22\n", + "Simplilearn 18\n", + "Pluralsight 17\n", + "Stepik 14\n", + "youtube 12\n", + "Data Science Academy 12\n", + "Springboard 11\n", + "Books 10\n", + "Name: Q13_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"----------------------------------------Q14_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Python 89\n", + "python 45\n", + "None 36\n", + "Matlab 28\n", + "none 22\n", + "R 13\n", + "SQL 13\n", + "MATLAB 11\n", + "matlab 11\n", + "Python 9\n", + "Name: Q14_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q14_Part_1_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Excel 865\n", + "Microsoft Excel 392\n", + "excel 263\n", + "MS Excel 67\n", + "Google Sheets 61\n", + "Google sheets 44\n", + "Microsoft excel 38\n", + "Excel 33\n", + "microsoft excel 27\n", + "EXCEL 25\n", + "Name: Q14_Part_1_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q14_Part_2_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "SAS 129\n", + "SPSS 116\n", + "R 60\n", + "spss 34\n", + "Spss 25\n", + "Python 21\n", + "Sas 18\n", + "Stata 15\n", + "python 14\n", + "R, Python 11\n", + "Name: Q14_Part_2_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q14_Part_3_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Tableau 260\n", + "Power BI 71\n", + "tableau 51\n", + "PowerBI 23\n", + "Salesforce 19\n", + "Tableau 16\n", + "Qlik 10\n", + "Power Bi 9\n", + "Spotfire 8\n", + "Power bi 6\n", + "Name: Q14_Part_3_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"----------------------------------------Q14_Part_4_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "JupyterLab 943\n", + "Jupyter 602\n", + "RStudio 516\n", + "Python 301\n", + "Jupyter Notebook 275\n", + "Rstudio 225\n", + "Jupyterlab 184\n", + "jupyter 183\n", + "Jupyter notebook 170\n", + "python 163\n", + "Name: Q14_Part_4_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q14_Part_5_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "AWS 203\n", + "GCP 134\n", + "Azure 87\n", + "aws 40\n", + "Aws 26\n", + "Google Colab 25\n", + "Databricks 17\n", + "gcp 15\n", + "Gcp 12\n", + "Colab 11\n", + "Name: Q14_Part_5_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q16_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Eclipse 47\n", + "IntelliJ 17\n", + "Intellij 14\n", + "eclipse 13\n", + "SAS 11\n", + "Google Colab 11\n", + "IntelliJ IDEA 10\n", + "Anaconda 9\n", + "Colab 9\n", + "Xcode 8\n", + "Name: Q16_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q17_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Databricks 43\n", + "databricks 12\n", + "Github 12\n", + "Anaconda 8\n", + "Domino Data Lab 5\n", + "Zeppelin 5\n", + "Domino 4\n", + "Jupyter Notebook 4\n", + "Anaconda 3\n", + "Databricks notebooks 3\n", + "Name: Q17_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "----------------------------------------Q18_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "C# 198\n", + "Scala 100\n", + "SAS 79\n", + "Julia 46\n", + "PHP 43\n", + "VBA 30\n", + "Go 27\n", + "Ruby 27\n", + "c# 27\n", + "Swift 20\n", + "Name: Q18_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q19_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Julia 22\n", + "Scala 9\n", + "C# 6\n", + "SAS 4\n", + "Swift 4\n", + "Octave 4\n", + "julia 2\n", + "mathematica 2\n", + "Rust 2\n", + "scala 2\n", + "Name: Q19_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q20_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Tableau 31\n", + "Excel 13\n", + "MATLAB 12\n", + "Power BI 10\n", + "Pandas 8\n", + "tableau 6\n", + "PowerBI 5\n", + "pandas 5\n", + "Dash 4\n", + "Spotfire 4\n", + "Name: Q20_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q21_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "FPGA 7\n", + "Laptop 4\n", + "Fpga 2\n", + "Google Colab 1\n", + "Just a laptop 1\n", + "Raspberry pi 3 and pi zero 1\n", + "dataloggers 1\n", + ".ljio 1\n", + "AWS Serverless (Glue, Athena, Lambda) 1\n", + "Neuromorphic hardware 1\n", + "Name: Q21_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"----------------------------------------Q24_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "SVM 32\n", + "Support Vector Machines 11\n", + "KNN 7\n", + "Clustering 6\n", + "Support Vector Machine 4\n", + "svm 4\n", + "SVM, KNN 4\n", + "SVMs 4\n", + "Support vector machine 3\n", + "Cluster Analysis 2\n", + "Name: Q24_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q25_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "DataRobot 12\n", + "catalyst 7\n", + "Catalyst 6\n", + "fastai 2\n", + "sklearn 2\n", + "Microsoft ML 2\n", + "Datarobot 2\n", + "Watson Studio 1\n", + "Bigartm 1\n", + "Watson Machine Learning 1\n", + "Name: Q25_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q26_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Inverse Reinforcement Learning, Geometrical Computer Vision 1\n", + "Time based deep learning methods, e.g. LSTM, I3D, etc 1\n", + "Pose estimation, object detection, etc. 1\n", + "Super Resolution 1\n", + "triplet loss 1\n", + "Wavenet 1\n", + "Triplet loss (FaceNet) 1\n", + "Azure Computer Vision API 1\n", + "everytime daily constantly, never stoping, never ceasing to fail with measurable risk and damage to never stop knowing the perfect limit of our capacity and perfection. Cancerous attitude towards ourselves but victorious for our comformists. 
1\n", + "Text Classification 1\n", + "Name: Q26_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q27_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "SpaCy 2\n", + "Making your own 1\n", + "Retrofitting 1\n", + "LSTM 1\n", + "svm 1\n", + "Which of the following natural language processing (NLP) methods do you use on a regular basis? (Select all that apply) - Other - Text 1\n", + ". 1\n", + "Python libraries 1\n", + "InferSent 1\n", + "SCDV 1\n", + "Name: Q27_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q28_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Catboost 24\n", + "CatBoost 14\n", + "catboost 13\n", + "h2o 11\n", + "H2O 10\n", + "MATLAB 9\n", + "Chainer 6\n", + "Catalyst 4\n", + "Caffe 4\n", + "MXNet 4\n", + "Name: Q28_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q29_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Digital Ocean 5\n", + "DigitalOcean 4\n", + "Tencent Cloud 3\n", + "Databricks 2\n", + "OVH 2\n", + "DataRobot 2\n", + "SAS Cloud 2\n", + "RStudio cloud 2\n", + "paperspace 2\n", + "Private cloud 2\n", + "Name: Q29_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q2_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "non-binary 4\n", + "Attack Helicopter 2\n", 
+ "bionicle 2\n", + "What is your gender? - Prefer to self-describe - Text 1\n", + "Male 1\n", + "agender 1\n", + "male 1\n", + "Supermacho 1\n", + "Transfrogaria. 1\n", + "Male to female transgender 1\n", + "Name: Q2_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q30_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "IBM Cloud 7\n", + "Databricks 6\n", + "AWS SageMaker 4\n", + "AWS EMR 4\n", + "AWS S3 3\n", + "Azure Databricks 3\n", + "IBM Watson 3\n", + "Bigquery 2\n", + "Azure Batch 2\n", + "AWS Fargate 2\n", + "Name: Q30_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q31_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Snowflake 13\n", + "SAS 6\n", + "DataRobot 4\n", + "Hadoop 4\n", + "Cloudera 4\n", + "Spark 4\n", + "IBM Cloud Pak for Data 3\n", + "Splunk 3\n", + "Apache Spark 3\n", + "IBM Watson 2\n", + "Name: Q31_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q32_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "DataRobot 15\n", + "Knime 11\n", + "KNIME 8\n", + "IBM Watson Studio 7\n", + "MATLAB 4\n", + "Alteryx 4\n", + "IBM Watson 3\n", + "H2O 3\n", + "Watson Machine Learning 3\n", + "IBM Cloud 3\n", + "Name: Q32_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q33_OTHER_TEXT---------------------------------------- - " + ] + 
}, + { + "data": { + "text/plain": [ + "IBM AutoAI 6\n", + "Prevision.io 4\n", + "H2O AutoML 3\n", + "H20 AutoML 2\n", + "prevision.io 2\n", + "H2O.ai AutoML 2\n", + "SAS 2\n", + "RapidMiner AutoML 1\n", + "IBM SPSS Modeler 1\n", + "chocolate 1\n", + "Name: Q33_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q34_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Snowflake 18\n", + "DB2 17\n", + "MongoDB 15\n", + "Teradata 12\n", + "IBM DB2 7\n", + "Mongo 6\n", + "SAP HANA 6\n", + "SAS 5\n", + "MariaDB 5\n", + "IBM Db2 4\n", + "Name: Q34_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q5_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Professor 43\n", + "Machine Learning Engineer 32\n", + "Consultant 19\n", + "Teacher 19\n", + "Lecturer 14\n", + "CEO 13\n", + "CTO 13\n", + "Engineer 12\n", + "Solution Architect 11\n", + "Director 11\n", + "Name: Q5_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------------------------Q9_OTHER_TEXT---------------------------------------- - " + ] + }, + { + "data": { + "text/plain": [ + "Software Development 1\n", + "Define and Explore new uses for machine learning 1\n", + "Teach machine learning methods and applications 1\n", + "Etl on data 1\n", + "Build applications that turn my data scientist colleagues jobs easier 1\n", + "exploring potential datasets 1\n", + ". 
1\n", + "Process improvement, stakeholder management 1\n", + "student 1\n", + "teaching data analytics 1\n", + "Name: Q9_OTHER_TEXT, dtype: int64" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# List top values per question\n", + "for col in df_resp.columns:\n", + " print('-' * 40 + col + '-' * 40 , end=' - ')\n", + " display(df_resp[col].value_counts().head(10))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Pandas apply value_counts on column with bad data" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "import difflib \n", + "\n", + "correct_values = {}\n", + "words = df_resp.Q14_Part_3_TEXT.value_counts(ascending=True).index\n", + "\n", + "for keyword in words:\n", + " similar = difflib.get_close_matches(keyword, words, n=20, cutoff=0.6)\n", + " for x in similar:\n", + " correct_values[x] = keyword\n", + " \n", + "df_resp[\"corr\"] = df_resp[\"Q14_Part_3_TEXT\"].map(correct_values)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Tableau 345\n", + "Power BI 137\n", + "Salesforce 43\n", + "Qlik 27\n", + "Spotfire 17\n", + " ... \n", + "QV 1\n", + "SQL DATA BASE 1\n", + "Power BI, SSRS 1\n", + "Tablea, Netezza, SQL server 1\n", + "For trading 1\n", + "Name: corr, Length: 177, dtype: int64" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_resp[\"corr\"].value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Tableau 260\n", + "Power BI 71\n", + "tableau 51\n", + "PowerBI 23\n", + "Salesforce 19\n", + " ... 
\n", + "Tableau, Salesforce 1\n", + "Tableau, powerbi 1\n", + "Tableau and Power BI 1\n", + "MySQL and Sisense 1\n", + "Alteryx 1\n", + "Name: Q14_Part_3_TEXT, Length: 339, dtype: int64" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_resp[\"Q14_Part_3_TEXT\"].value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From e1642dfa9e0d069821ac4be221b3c87ce51ce9ca Mon Sep 17 00:00:00 2001 From: softhints Date: Tue, 7 Jan 2020 10:17:46 +0200 Subject: [PATCH 43/76] Pandas_compare_columns_in_two_Dataframes --- ..._columns%2C_all_columns_and_bad_data.ipynb | 330 +++++++++--------- 1 file changed, 165 insertions(+), 165 deletions(-) diff --git a/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb b/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb index 085e38f..9ca588d 100644 --- a/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb +++ b/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb @@ -9,7 +9,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ @@ -19,7 +19,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -36,7 +36,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 3, 
"metadata": {}, "outputs": [ { @@ -53,7 +53,7 @@ " dtype='object')" ] }, - "execution_count": 9, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } @@ -64,7 +64,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -221,7 +221,7 @@ "X NaN 13.0" ] }, - "execution_count": 13, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -239,7 +239,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -252,16 +252,16 @@ { "data": { "text/plain": [ - "LinkedIn 39\n", - "Medium 38\n", - "Coursera 16\n", - "Linkedin 16\n", - "Books 14\n", - "Facebook 11\n", - "medium 11\n", - "linkedin 9\n", - "books 8\n", - "Newsletters 7\n", + "LinkedIn 39\n", + "Medium 38\n", + "Coursera 16\n", + "Linkedin 16\n", + "Books 14\n", + "Facebook 11\n", + "medium 11\n", + "linkedin 9\n", + "books 8\n", + "Data Science Central 7\n", "Name: Q12_OTHER_TEXT, dtype: int64" ] }, @@ -284,10 +284,10 @@ "Simplilearn 18\n", "Pluralsight 17\n", "Stepik 14\n", - "youtube 12\n", "Data Science Academy 12\n", + "youtube 12\n", "Springboard 11\n", - "Books 10\n", + "Codecademy 10\n", "Name: Q13_OTHER_TEXT, dtype: int64" ] }, @@ -311,8 +311,8 @@ "none 22\n", "R 13\n", "SQL 13\n", - "MATLAB 11\n", "matlab 11\n", + "MATLAB 11\n", "Python 9\n", "Name: Q14_OTHER_TEXT, dtype: int64" ] @@ -356,16 +356,16 @@ { "data": { "text/plain": [ - "SAS 129\n", - "SPSS 116\n", - "R 60\n", - "spss 34\n", - "Spss 25\n", - "Python 21\n", - "Sas 18\n", - "Stata 15\n", - "python 14\n", - "R, Python 11\n", + "SAS 129\n", + "SPSS 116\n", + "R 60\n", + "spss 34\n", + "Spss 25\n", + "Python 21\n", + "Sas 18\n", + "Stata 15\n", + "python 14\n", + "sas 11\n", "Name: Q14_Part_2_TEXT, dtype: int64" ] }, @@ -486,16 +486,16 @@ { "data": { "text/plain": [ - "Databricks 43\n", - "databricks 12\n", - "Github 12\n", - "Anaconda 8\n", - "Domino Data Lab 5\n", - "Zeppelin 5\n", - "Domino 
4\n", - "Jupyter Notebook 4\n", - "Anaconda 3\n", - "Databricks notebooks 3\n", + "Databricks 43\n", + "databricks 12\n", + "Github 12\n", + "Anaconda 8\n", + "Zeppelin 5\n", + "Domino Data Lab 5\n", + "Domino 4\n", + "Jupyter Notebook 4\n", + "DataBricks 3\n", + "github 3\n", "Name: Q17_OTHER_TEXT, dtype: int64" ] }, @@ -541,13 +541,13 @@ "Julia 22\n", "Scala 9\n", "C# 6\n", - "SAS 4\n", "Swift 4\n", "Octave 4\n", + "SAS 4\n", "julia 2\n", + "scala 2\n", "mathematica 2\n", "Rust 2\n", - "scala 2\n", "Name: Q19_OTHER_TEXT, dtype: int64" ] }, @@ -570,10 +570,10 @@ "Power BI 10\n", "Pandas 8\n", "tableau 6\n", - "PowerBI 5\n", "pandas 5\n", - "Dash 4\n", + "PowerBI 5\n", "Spotfire 4\n", + "Dash 4\n", "Name: Q20_OTHER_TEXT, dtype: int64" ] }, @@ -590,16 +590,16 @@ { "data": { "text/plain": [ - "FPGA 7\n", - "Laptop 4\n", - "Fpga 2\n", - "Google Colab 1\n", - "Just a laptop 1\n", - "Raspberry pi 3 and pi zero 1\n", - "dataloggers 1\n", - ".ljio 1\n", - "AWS Serverless (Glue, Athena, Lambda) 1\n", - "Neuromorphic hardware 1\n", + "FPGA 7\n", + "Laptop 4\n", + "Fpga 2\n", + "Pen and Paper 1\n", + "Ms hpc 1\n", + "NVIDIA Jetson, Intel Movidius 1\n", + "Cloud stacks with sets of all three above 1\n", + "Hadoop cluster 1\n", + "PC 1\n", + "SSD 1\n", "Name: Q21_OTHER_TEXT, dtype: int64" ] }, @@ -620,12 +620,12 @@ "Support Vector Machines 11\n", "KNN 7\n", "Clustering 6\n", + "SVMs 4\n", "Support Vector Machine 4\n", - "svm 4\n", "SVM, KNN 4\n", - "SVMs 4\n", + "svm 4\n", "Support vector machine 3\n", - "Cluster Analysis 2\n", + "Collaborative Filtering 2\n", "Name: Q24_OTHER_TEXT, dtype: int64" ] }, @@ -642,16 +642,16 @@ { "data": { "text/plain": [ - "DataRobot 12\n", - "catalyst 7\n", - "Catalyst 6\n", - "fastai 2\n", - "sklearn 2\n", - "Microsoft ML 2\n", - "Datarobot 2\n", - "Watson Studio 1\n", - "Bigartm 1\n", - "Watson Machine Learning 1\n", + "DataRobot 12\n", + "catalyst 7\n", + "Catalyst 6\n", + "Microsoft ML 2\n", + "sklearn 2\n", + "fastai 2\n", + "Datarobot 2\n", 
+ "Writing quite a bit of my own automation 1\n", + "Mostly Deep Learning Libraries & scikit learn. Also recently started using MLFlow for experiment tracking 1\n", + "Automated Distributed-Learning mechanism using Google AI Platform. 1\n", "Name: Q25_OTHER_TEXT, dtype: int64" ] }, @@ -668,15 +668,15 @@ { "data": { "text/plain": [ - "Inverse Reinforcement Learning, Geometrical Computer Vision 1\n", - "Time based deep learning methods, e.g. LSTM, I3D, etc 1\n", - "Pose estimation, object detection, etc. 1\n", - "Super Resolution 1\n", - "triplet loss 1\n", - "Wavenet 1\n", - "Triplet loss (FaceNet) 1\n", - "Azure Computer Vision API 1\n", + "GLCM wavelet 1\n", + "text processing 1\n", + "openCV 1\n", + "3D objects classification, segmentation 1\n", "everytime daily constantly, never stoping, never ceasing to fail with measurable risk and damage to never stop knowing the perfect limit of our capacity and perfection. Cancerous attitude towards ourselves but victorious for our comformists. 1\n", + "OpenCV 1\n", + "Custom Built Tools 1\n", + "OpenCV with GPUs 1\n", + "Handwriting Recognize tools 1\n", "Text Classification 1\n", "Name: Q26_OTHER_TEXT, dtype: int64" ] @@ -694,16 +694,16 @@ { "data": { "text/plain": [ - "SpaCy 2\n", - "Making your own 1\n", - "Retrofitting 1\n", - "LSTM 1\n", - "svm 1\n", - "Which of the following natural language processing (NLP) methods do you use on a regular basis? (Select all that apply) - Other - Text 1\n", - ". 
1\n", - "Python libraries 1\n", - "InferSent 1\n", - "SCDV 1\n", + "SpaCy 2\n", + "Am learning this 1\n", + "Flair 1\n", + "Spss nlp 1\n", + "O 1\n", + "Record linkage, clustering, edit distance 1\n", + "OCR 1\n", + "svm 1\n", + "Stopwords, Lemmatization, TF-IDF, BoW 1\n", + "SCDV 1\n", "Name: Q27_OTHER_TEXT, dtype: int64" ] }, @@ -727,9 +727,9 @@ "H2O 10\n", "MATLAB 9\n", "Chainer 6\n", - "Catalyst 4\n", + "Spacy 4\n", "Caffe 4\n", - "MXNet 4\n", + "SAS 4\n", "Name: Q28_OTHER_TEXT, dtype: int64" ] }, @@ -749,13 +749,13 @@ "Digital Ocean 5\n", "DigitalOcean 4\n", "Tencent Cloud 3\n", - "Databricks 2\n", "OVH 2\n", - "DataRobot 2\n", - "SAS Cloud 2\n", + "Google Colab 2\n", + "Private cloud 2\n", "RStudio cloud 2\n", + "Databricks 2\n", "paperspace 2\n", - "Private cloud 2\n", + "DataRobot 2\n", "Name: Q29_OTHER_TEXT, dtype: int64" ] }, @@ -772,16 +772,16 @@ { "data": { "text/plain": [ - "non-binary 4\n", - "Attack Helicopter 2\n", - "bionicle 2\n", - "What is your gender? - Prefer to self-describe - Text 1\n", - "Male 1\n", - "agender 1\n", - "male 1\n", - "Supermacho 1\n", - "Transfrogaria. 
1\n", - "Male to female transgender 1\n", + "non-binary 4\n", + "Attack Helicopter 2\n", + "bionicle 2\n", + "Unicorn 1\n", + "half femal-half male whole animal 1\n", + "MALE 1\n", + "Lvl 129 Dust Devil 1\n", + "genderfluid helicopter 1\n", + "Indescribable 1\n", + "chakka 1\n", "Name: Q2_OTHER_TEXT, dtype: int64" ] }, @@ -802,12 +802,12 @@ "Databricks 6\n", "AWS SageMaker 4\n", "AWS EMR 4\n", + "IBM Watson 3\n", "AWS S3 3\n", "Azure Databricks 3\n", - "IBM Watson 3\n", - "Bigquery 2\n", - "Azure Batch 2\n", - "AWS Fargate 2\n", + "Athena 2\n", + "Google BigQuery 2\n", + "Watson 2\n", "Name: Q30_OTHER_TEXT, dtype: int64" ] }, @@ -826,14 +826,14 @@ "text/plain": [ "Snowflake 13\n", "SAS 6\n", - "DataRobot 4\n", + "Spark 4\n", "Hadoop 4\n", + "DataRobot 4\n", "Cloudera 4\n", - "Spark 4\n", + "Apache Spark 3\n", "IBM Cloud Pak for Data 3\n", "Splunk 3\n", - "Apache Spark 3\n", - "IBM Watson 2\n", + "Vertica 2\n", "Name: Q31_OTHER_TEXT, dtype: int64" ] }, @@ -854,12 +854,12 @@ "Knime 11\n", "KNIME 8\n", "IBM Watson Studio 7\n", - "MATLAB 4\n", "Alteryx 4\n", + "MATLAB 4\n", "IBM Watson 3\n", - "H2O 3\n", + "R 3\n", + "Datarobot 3\n", "Watson Machine Learning 3\n", - "IBM Cloud 3\n", "Name: Q32_OTHER_TEXT, dtype: int64" ] }, @@ -876,16 +876,16 @@ { "data": { "text/plain": [ - "IBM AutoAI 6\n", - "Prevision.io 4\n", - "H2O AutoML 3\n", - "H20 AutoML 2\n", - "prevision.io 2\n", - "H2O.ai AutoML 2\n", - "SAS 2\n", - "RapidMiner AutoML 1\n", - "IBM SPSS Modeler 1\n", - "chocolate 1\n", + "IBM AutoAI 6\n", + "Prevision.io 4\n", + "H2O AutoML 3\n", + "H2O.ai AutoML 2\n", + "SAS 2\n", + "H20 AutoML 2\n", + "prevision.io 2\n", + "H2O auto ML 1\n", + "MATLAB Classification Learner 1\n", + "IBM SPSS Modeler 1\n", "Name: Q33_OTHER_TEXT, dtype: int64" ] }, @@ -907,11 +907,11 @@ "MongoDB 15\n", "Teradata 12\n", "IBM DB2 7\n", - "Mongo 6\n", "SAP HANA 6\n", - "SAS 5\n", + "Mongo 6\n", "MariaDB 5\n", - "IBM Db2 4\n", + "SAS 5\n", + "Hadoop 4\n", "Name: Q34_OTHER_TEXT, dtype: int64" ] 
}, @@ -930,14 +930,14 @@ "text/plain": [ "Professor 43\n", "Machine Learning Engineer 32\n", - "Consultant 19\n", "Teacher 19\n", + "Consultant 19\n", "Lecturer 14\n", "CEO 13\n", "CTO 13\n", "Engineer 12\n", - "Solution Architect 11\n", "Director 11\n", + "Manager 11\n", "Name: Q5_OTHER_TEXT, dtype: int64" ] }, @@ -954,16 +954,16 @@ { "data": { "text/plain": [ - "Software Development 1\n", - "Define and Explore new uses for machine learning 1\n", - "Teach machine learning methods and applications 1\n", - "Etl on data 1\n", - "Build applications that turn my data scientist colleagues jobs easier 1\n", - "exploring potential datasets 1\n", - ". 1\n", - "Process improvement, stakeholder management 1\n", - "student 1\n", - "teaching data analytics 1\n", + "Farm twitter likes 1\n", + "RD 1\n", + "Build infrastructure around ML and data analyses. 1\n", + "Use of ML to do research 1\n", + "Currently I have not given an opportunity to work with ML 1\n", + "Software Development 1\n", + "BI developer 1\n", + "Teach Machine Learning 1\n", + "teaching 1\n", + "Explaining what I do to the team and stakeholders. 1\n", "Name: Q9_OTHER_TEXT, dtype: int64" ] }, @@ -987,7 +987,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ @@ -1006,27 +1006,27 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "Tableau 345\n", - "Power BI 137\n", - "Salesforce 43\n", - "Qlik 27\n", - "Spotfire 17\n", - " ... \n", - "QV 1\n", - "SQL DATA BASE 1\n", - "Power BI, SSRS 1\n", - "Tablea, Netezza, SQL server 1\n", - "For trading 1\n", - "Name: corr, Length: 177, dtype: int64" + "Tableau 345\n", + "Power BI 137\n", + "Salesforce 43\n", + "Qlik 27\n", + "Spotfire 17\n", + " ... 
\n", + "Bespoke in-house solution 1\n", + "Sqldeveloper 1\n", + "Saiku BI 1\n", + "Qlik Sense 1\n", + "visualisation with the help the tableau is so easy 1\n", + "Name: corr, Length: 171, dtype: int64" ] }, - "execution_count": 21, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -1037,27 +1037,27 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "Tableau 260\n", - "Power BI 71\n", - "tableau 51\n", - "PowerBI 23\n", - "Salesforce 19\n", - " ... \n", - "Tableau, Salesforce 1\n", - "Tableau, powerbi 1\n", - "Tableau and Power BI 1\n", - "MySQL and Sisense 1\n", - "Alteryx 1\n", + "Tableau 260\n", + "Power BI 71\n", + "tableau 51\n", + "PowerBI 23\n", + "Salesforce 19\n", + " ... \n", + "Qlik and Microsoft products 1\n", + "Adobe analytics, power bi 1\n", + "OBIEE, Tableau 1\n", + "Sql scripting and BI softwares 1\n", + "Google data studio 1\n", "Name: Q14_Part_3_TEXT, Length: 339, dtype: int64" ] }, - "execution_count": 22, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } From 0e40dfd3a411d65dc8df0093293a22439556ab14 Mon Sep 17 00:00:00 2001 From: softhints Date: Tue, 7 Jan 2020 12:12:45 +0200 Subject: [PATCH 44/76] Pandas_compare_columns_in_two_Dataframes --- ..._columns%2C_all_columns_and_bad_data.ipynb | 582 +++++++++++++----- 1 file changed, 428 insertions(+), 154 deletions(-) diff --git a/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb b/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb index 9ca588d..48f8734 100644 --- a/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb +++ b/notebooks/pandas/20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb @@ -66,6 +66,249 @@ "cell_type": "code", "execution_count": 4, "metadata": {}, + "outputs": [ + { + "data": { + 
"text/html": [ + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0.00 NaN NaN NaN NaN \n", + "1.00 NaN NaN 43.0 NaN \n", + "1.18 NaN NaN NaN NaN \n", + "1.20 NaN NaN NaN NaN \n", + "1.33 NaN NaN NaN NaN \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0.00 907.0 89.0 NaN \n", + "1.00 NaN NaN NaN \n", + "1.18 NaN NaN NaN \n", + "1.20 NaN NaN NaN \n", + "1.33 NaN NaN NaN \n", + "\n", + " actor_1_facebook_likes gross genres ... num_user_for_reviews \\\n", + "0.00 26.0 NaN NaN ... NaN \n", + "1.00 NaN NaN NaN ... 51.0 \n", + "1.18 NaN NaN NaN ... NaN \n", + "1.20 NaN NaN NaN ... NaN \n", + "1.33 NaN NaN NaN ... NaN \n", + "\n", + " language country content_rating budget title_year \\\n", + "0.00 NaN NaN NaN NaN NaN \n", + "1.00 NaN NaN NaN NaN NaN \n", + "1.18 NaN NaN NaN NaN NaN \n", + "1.20 NaN NaN NaN NaN NaN \n", + "1.33 NaN NaN NaN NaN NaN \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \n", + "0.00 55.0 NaN NaN 2181.0 \n", + "1.00 NaN NaN NaN NaN \n", + "1.18 NaN NaN 1.0 NaN \n", + "1.20 NaN NaN 1.0 NaN \n", + "1.33 NaN NaN 68.0 NaN \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_movie.apply(pd.Series.value_counts).head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(37410, 28)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_movie.apply(pd.Series.value_counts).shape" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, "outputs": [ { "data": { @@ -221,7 +464,7 @@ "X NaN 13.0" ] }, - "execution_count": 4, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -239,7 +482,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 7, "metadata": {}, 
"outputs": [ { @@ -254,11 +497,11 @@ "text/plain": [ "LinkedIn 39\n", "Medium 38\n", - "Coursera 16\n", "Linkedin 16\n", + "Coursera 16\n", "Books 14\n", - "Facebook 11\n", "medium 11\n", + "Facebook 11\n", "linkedin 9\n", "books 8\n", "Data Science Central 7\n", @@ -287,7 +530,7 @@ "Data Science Academy 12\n", "youtube 12\n", "Springboard 11\n", - "Codecademy 10\n", + "Books 10\n", "Name: Q13_OTHER_TEXT, dtype: int64" ] }, @@ -311,8 +554,8 @@ "none 22\n", "R 13\n", "SQL 13\n", - "matlab 11\n", "MATLAB 11\n", + "matlab 11\n", "Python 9\n", "Name: Q14_OTHER_TEXT, dtype: int64" ] @@ -356,16 +599,16 @@ { "data": { "text/plain": [ - "SAS 129\n", - "SPSS 116\n", - "R 60\n", - "spss 34\n", - "Spss 25\n", - "Python 21\n", - "Sas 18\n", - "Stata 15\n", - "python 14\n", - "sas 11\n", + "SAS 129\n", + "SPSS 116\n", + "R 60\n", + "spss 34\n", + "Spss 25\n", + "Python 21\n", + "Sas 18\n", + "Stata 15\n", + "python 14\n", + "R, Python 11\n", "Name: Q14_Part_2_TEXT, dtype: int64" ] }, @@ -391,7 +634,7 @@ "Qlik 10\n", "Power Bi 9\n", "Spotfire 8\n", - "Power bi 6\n", + "SAP 6\n", "Name: Q14_Part_3_TEXT, dtype: int64" ] }, @@ -467,8 +710,8 @@ "SAS 11\n", "Google Colab 11\n", "IntelliJ IDEA 10\n", - "Anaconda 9\n", "Colab 9\n", + "Anaconda 9\n", "Xcode 8\n", "Name: Q16_OTHER_TEXT, dtype: int64" ] @@ -490,12 +733,12 @@ "databricks 12\n", "Github 12\n", "Anaconda 8\n", - "Zeppelin 5\n", "Domino Data Lab 5\n", + "Zeppelin 5\n", "Domino 4\n", "Jupyter Notebook 4\n", - "DataBricks 3\n", - "github 3\n", + "Anaconda 3\n", + "Jupyter 3\n", "Name: Q17_OTHER_TEXT, dtype: int64" ] }, @@ -518,9 +761,9 @@ "Julia 46\n", "PHP 43\n", "VBA 30\n", - "Go 27\n", "Ruby 27\n", "c# 27\n", + "Go 27\n", "Swift 20\n", "Name: Q18_OTHER_TEXT, dtype: int64" ] @@ -541,13 +784,13 @@ "Julia 22\n", "Scala 9\n", "C# 6\n", + "SAS 4\n", "Swift 4\n", "Octave 4\n", - "SAS 4\n", - "julia 2\n", "scala 2\n", - "mathematica 2\n", "Rust 2\n", + "mathematica 2\n", + "julia 2\n", "Name: Q19_OTHER_TEXT, dtype: int64" ] }, @@ 
-572,8 +815,8 @@ "tableau 6\n", "pandas 5\n", "PowerBI 5\n", - "Spotfire 4\n", "Dash 4\n", + "Spotfire 4\n", "Name: Q20_OTHER_TEXT, dtype: int64" ] }, @@ -590,16 +833,16 @@ { "data": { "text/plain": [ - "FPGA 7\n", - "Laptop 4\n", - "Fpga 2\n", - "Pen and Paper 1\n", - "Ms hpc 1\n", - "NVIDIA Jetson, Intel Movidius 1\n", - "Cloud stacks with sets of all three above 1\n", - "Hadoop cluster 1\n", - "PC 1\n", - "SSD 1\n", + "FPGA 7\n", + "Laptop 4\n", + "Fpga 2\n", + "Planning to use GPUs 1\n", + "spark databricks on AWS 1\n", + "Edge neurocomputing chips like Intel's NCS. 1\n", + "but i wana to use gpu 1\n", + "Intel NCS2 1\n", + "Parallel comp with MPI 1\n", + "paperspace uses GPU's I believe... I prefer them figuring that part out 1\n", "Name: Q21_OTHER_TEXT, dtype: int64" ] }, @@ -620,12 +863,12 @@ "Support Vector Machines 11\n", "KNN 7\n", "Clustering 6\n", - "SVMs 4\n", + "svm 4\n", "Support Vector Machine 4\n", "SVM, KNN 4\n", - "svm 4\n", + "SVMs 4\n", "Support vector machine 3\n", - "Collaborative Filtering 2\n", + "KMeans 2\n", "Name: Q24_OTHER_TEXT, dtype: int64" ] }, @@ -642,16 +885,16 @@ { "data": { "text/plain": [ - "DataRobot 12\n", - "catalyst 7\n", - "Catalyst 6\n", - "Microsoft ML 2\n", - "sklearn 2\n", - "fastai 2\n", - "Datarobot 2\n", - "Writing quite a bit of my own automation 1\n", - "Mostly Deep Learning Libraries & scikit learn. Also recently started using MLFlow for experiment tracking 1\n", - "Automated Distributed-Learning mechanism using Google AI Platform. 
1\n", + "DataRobot 12\n", + "catalyst 7\n", + "Catalyst 6\n", + "Microsoft ML 2\n", + "Datarobot 2\n", + "sklearn 2\n", + "fastai 2\n", + "Weka 1\n", + "Stepwise regression 1\n", + "SAS, ScykitLearn 1\n", "Name: Q25_OTHER_TEXT, dtype: int64" ] }, @@ -668,16 +911,16 @@ { "data": { "text/plain": [ - "GLCM wavelet 1\n", - "text processing 1\n", - "openCV 1\n", - "3D objects classification, segmentation 1\n", - "everytime daily constantly, never stoping, never ceasing to fail with measurable risk and damage to never stop knowing the perfect limit of our capacity and perfection. Cancerous attitude towards ourselves but victorious for our comformists. 1\n", - "OpenCV 1\n", - "Custom Built Tools 1\n", - "OpenCV with GPUs 1\n", "Handwriting Recognize tools 1\n", - "Text Classification 1\n", + "Custom Built Tools 1\n", + "everytime daily constantly, never stoping, never ceasing to fail with measurable risk and damage to never stop knowing the perfect limit of our capacity and perfection. Cancerous attitude towards ourselves but victorious for our comformists. 
1\n", + "Anomaly detection on videos 1\n", + "SSD-Keras 1\n", + "Fast.ai 1\n", + "Wavenet 1\n", + "openCV 1\n", + "GIS 1\n", + "text processing 1\n", "Name: Q26_OTHER_TEXT, dtype: int64" ] }, @@ -694,16 +937,16 @@ { "data": { "text/plain": [ - "SpaCy 2\n", - "Am learning this 1\n", - "Flair 1\n", - "Spss nlp 1\n", - "O 1\n", - "Record linkage, clustering, edit distance 1\n", - "OCR 1\n", - "svm 1\n", - "Stopwords, Lemmatization, TF-IDF, BoW 1\n", - "SCDV 1\n", + "SpaCy 2\n", + "Text mining by R and Python libraries only 1\n", + "Retrofitting 1\n", + "OWL 1\n", + "Flair 1\n", + "Making your own 1\n", + "Stopwords, Lemmatization, TF-IDF, BoW 1\n", + "fastai 1\n", + "svm 1\n", + "FastAI 1\n", "Name: Q27_OTHER_TEXT, dtype: int64" ] }, @@ -727,9 +970,9 @@ "H2O 10\n", "MATLAB 9\n", "Chainer 6\n", - "Spacy 4\n", + "MXNet 4\n", + "Catalyst 4\n", "Caffe 4\n", - "SAS 4\n", "Name: Q28_OTHER_TEXT, dtype: int64" ] }, @@ -749,13 +992,13 @@ "Digital Ocean 5\n", "DigitalOcean 4\n", "Tencent Cloud 3\n", - "OVH 2\n", + "DataRobot 2\n", "Google Colab 2\n", "Private cloud 2\n", - "RStudio cloud 2\n", - "Databricks 2\n", "paperspace 2\n", - "DataRobot 2\n", + "SAS Cloud 2\n", + "Databricks 2\n", + "OVH 2\n", "Name: Q29_OTHER_TEXT, dtype: int64" ] }, @@ -772,16 +1015,16 @@ { "data": { "text/plain": [ - "non-binary 4\n", - "Attack Helicopter 2\n", - "bionicle 2\n", - "Unicorn 1\n", - "half femal-half male whole animal 1\n", - "MALE 1\n", - "Lvl 129 Dust Devil 1\n", - "genderfluid helicopter 1\n", - "Indescribable 1\n", - "chakka 1\n", + "non-binary 4\n", + "Attack Helicopter 2\n", + "bionicle 2\n", + "T-rex shaped meteor made out of cheese 1\n", + "Pharoah 1\n", + "queer 1\n", + "Lvl 129 Dust Devil 1\n", + "What is your gender? 
- Prefer to self-describe - Text 1\n", + "genderfluid helicopter 1\n", + "none 1\n", "Name: Q2_OTHER_TEXT, dtype: int64" ] }, @@ -800,14 +1043,14 @@ "text/plain": [ "IBM Cloud 7\n", "Databricks 6\n", - "AWS SageMaker 4\n", "AWS EMR 4\n", - "IBM Watson 3\n", + "AWS SageMaker 4\n", "AWS S3 3\n", "Azure Databricks 3\n", - "Athena 2\n", - "Google BigQuery 2\n", - "Watson 2\n", + "IBM Watson 3\n", + "Inhouse 2\n", + "AWS Fargate 2\n", + "OpenShift 2\n", "Name: Q30_OTHER_TEXT, dtype: int64" ] }, @@ -827,13 +1070,13 @@ "Snowflake 13\n", "SAS 6\n", "Spark 4\n", + "Cloudera 4\n", "Hadoop 4\n", "DataRobot 4\n", - "Cloudera 4\n", - "Apache Spark 3\n", "IBM Cloud Pak for Data 3\n", "Splunk 3\n", - "Vertica 2\n", + "Apache Spark 3\n", + "Tableau 2\n", "Name: Q31_OTHER_TEXT, dtype: int64" ] }, @@ -850,16 +1093,16 @@ { "data": { "text/plain": [ - "DataRobot 15\n", - "Knime 11\n", - "KNIME 8\n", - "IBM Watson Studio 7\n", - "Alteryx 4\n", - "MATLAB 4\n", - "IBM Watson 3\n", - "R 3\n", - "Datarobot 3\n", - "Watson Machine Learning 3\n", + "DataRobot 15\n", + "Knime 11\n", + "KNIME 8\n", + "IBM Watson Studio 7\n", + "Alteryx 4\n", + "MATLAB 4\n", + "IBM Cloud 3\n", + "H2O 3\n", + "Datarobot 3\n", + "IBM Watson 3\n", "Name: Q32_OTHER_TEXT, dtype: int64" ] }, @@ -876,16 +1119,16 @@ { "data": { "text/plain": [ - "IBM AutoAI 6\n", - "Prevision.io 4\n", - "H2O AutoML 3\n", - "H2O.ai AutoML 2\n", - "SAS 2\n", - "H20 AutoML 2\n", - "prevision.io 2\n", - "H2O auto ML 1\n", - "MATLAB Classification Learner 1\n", - "IBM SPSS Modeler 1\n", + "IBM AutoAI 6\n", + "Prevision.io 4\n", + "H2O AutoML 3\n", + "H20 AutoML 2\n", + "H2O.ai AutoML 2\n", + "SAS 2\n", + "prevision.io 2\n", + "Which automated machine learning tools (or partial AutoML tools) do you use on a regular basis? 
(Select all that apply) - Other - Text 1\n", + "Watson ML 1\n", + "custom 1\n", "Name: Q33_OTHER_TEXT, dtype: int64" ] }, @@ -907,8 +1150,8 @@ "MongoDB 15\n", "Teradata 12\n", "IBM DB2 7\n", - "SAP HANA 6\n", "Mongo 6\n", + "SAP HANA 6\n", "MariaDB 5\n", "SAS 5\n", "Hadoop 4\n", @@ -933,11 +1176,11 @@ "Teacher 19\n", "Consultant 19\n", "Lecturer 14\n", - "CEO 13\n", "CTO 13\n", + "CEO 13\n", "Engineer 12\n", - "Director 11\n", - "Manager 11\n", + "Mechanical Engineer 11\n", + "Solution Architect 11\n", "Name: Q5_OTHER_TEXT, dtype: int64" ] }, @@ -954,16 +1197,16 @@ { "data": { "text/plain": [ - "Farm twitter likes 1\n", - "RD 1\n", - "Build infrastructure around ML and data analyses. 1\n", - "Use of ML to do research 1\n", - "Currently I have not given an opportunity to work with ML 1\n", - "Software Development 1\n", - "BI developer 1\n", - "Teach Machine Learning 1\n", - "teaching 1\n", - "Explaining what I do to the team and stakeholders. 1\n", + "i'm professor 1\n", + "I am a Technical Project Manager and I involve with customer product owner and my teams to analyse and suggest business decisions. 1\n", + "Architecture 1\n", + "Model methodology development 1\n", + "Produce data driven research 1\n", + "human-centered data science research 1\n", + "\"> 1\n", + "Analyze business systems and processes; recommend solutions. 
1\n", + "Support a product that provides analytics & ML libraries 1\n", + "Conceptualize workflows and design experiments 1\n", "Name: Q9_OTHER_TEXT, dtype: int64" ] }, @@ -978,6 +1221,16 @@ " display(df_resp[col].value_counts().head(10))" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TypeError: unhashable type: 'dict'\n", + "df = pd.DataFrame({'a':[1,2,3], 'b':[{'c':1}, {'d':3}, {'c':5, 'd':6}], 'c':[[1],[2],[3]]})" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -987,7 +1240,28 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Power BI', 'PowerBI', 'Power Bi']" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import difflib \n", + "difflib.get_close_matches('Power BI', ['Power BI', 'tableau', 'PowerBI', 'Power Bi','Salesforce', 'Tableau ', 'Qlik', 'Power bi'], n=3, cutoff=0.6)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ @@ -1006,27 +1280,27 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "Tableau 345\n", - "Power BI 137\n", - "Salesforce 43\n", - "Qlik 27\n", - "Spotfire 17\n", - " ... \n", - "Bespoke in-house solution 1\n", - "Sqldeveloper 1\n", - "Saiku BI 1\n", - "Qlik Sense 1\n", - "visualisation with the help the tableau is so easy 1\n", - "Name: corr, Length: 171, dtype: int64" + "Tableau 345\n", + "Power BI 137\n", + "Salesforce 43\n", + "Qlik 27\n", + "Spotfire 17\n", + " ... 
\n", + "tableau which is very fast and easy to analyse 1\n", + "We use Tableau to analyse through histograms,bargraphs and many more tools in Tableau 1\n", + "Izenda, Excel, XtraReports 1\n", + "ssrs 1\n", + "XLcubed 1\n", + "Name: corr, Length: 179, dtype: int64" ] }, - "execution_count": 7, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -1037,27 +1311,27 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "Tableau 260\n", - "Power BI 71\n", - "tableau 51\n", - "PowerBI 23\n", - "Salesforce 19\n", - " ... \n", - "Qlik and Microsoft products 1\n", - "Adobe analytics, power bi 1\n", - "OBIEE, Tableau 1\n", - "Sql scripting and BI softwares 1\n", - "Google data studio 1\n", + "Tableau 260\n", + "Power BI 71\n", + "tableau 51\n", + "PowerBI 23\n", + "Salesforce 19\n", + " ... \n", + "Domo 1\n", + "MySQL Client, Tableau 1\n", + "Datastudio 1\n", + "Abinitio 1\n", + "XLcubed 1\n", "Name: Q14_Part_3_TEXT, Length: 339, dtype: int64" ] }, - "execution_count": 8, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } From faafc20de467f9442b406b05f518554b7537ec98 Mon Sep 17 00:00:00 2001 From: softhints Date: Fri, 10 Jan 2020 11:27:26 +0200 Subject: [PATCH 45/76] 21. pandas-dataframe-sampling-rows-or-columns.ipynb --- ...s-dataframe-sampling-rows-or-columns.ipynb | 3681 +++++++++++++++++ 1 file changed, 3681 insertions(+) create mode 100644 notebooks/pandas/21. pandas-dataframe-sampling-rows-or-columns.ipynb diff --git a/notebooks/pandas/21. pandas-dataframe-sampling-rows-or-columns.ipynb b/notebooks/pandas/21. pandas-dataframe-sampling-rows-or-columns.ipynb new file mode 100644 index 0000000..35e9f2b --- /dev/null +++ b/notebooks/pandas/21. pandas-dataframe-sampling-rows-or-columns.ipynb @@ -0,0 +1,3681 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 21. 
Pandas - Random Sample of a subset of a dataframe" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Random sampling of rows, columns from DataFrame with sample()" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2525 Color Joel Schumacher 106.0 98.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2525 541.0 71.0 David Murray \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "2525 214.0 1569918.0 Biography|Crime|Drama|Thriller ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "2525 113.0 English Ireland R 17000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "2525 2003.0 96.0 6.9 2.35 \n", + "\n", + " movie_facebook_likes \n", + "2525 0 \n", + "\n", + "[1 rows x 28 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Default behavior of sample()\n", + "df.sample()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "535 Color Dennis Dugan 179.0 102.0 \n", + "2987 Color Fred Schepisi 61.0 109.0 \n", + "1475 Color David Koepp 248.0 91.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "535 221.0 4000.0 Adam Sandler \n", + "2987 40.0 794.0 Ray Winstone \n", + "1475 192.0 346.0 Dania Ramirez \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "535 12000.0 162001186.0 Comedy ... \n", + "2987 5000.0 2326407.0 Drama ... \n", + "1475 23000.0 20275446.0 Action|Crime|Thriller ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "535 311.0 English USA PG-13 80000000.0 \n", + "2987 99.0 English UK R 12000000.0 \n", + "1475 178.0 English USA PG-13 35000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "535 2010.0 11000.0 6.0 1.85 \n", + "2987 2001.0 1000.0 7.0 2.35 \n", + "1475 2012.0 1000.0 6.5 2.35 \n", + "\n", + " movie_facebook_likes \n", + "535 12000 \n", + "2987 305 \n", + "1475 20000 \n", + "\n", + "[3 rows x 28 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# return n rows\n", + "df.sample(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colornum_critic_for_reviewstitle_year
0Color723.02009.0
1Color302.02007.0
2Color602.02015.0
3Color813.02012.0
4NaNNaNNaN
\n", + "
" + ], + "text/plain": [ + " color num_critic_for_reviews title_year\n", + "0 Color 723.0 2009.0\n", + "1 Color 302.0 2007.0\n", + "2 Color 602.0 2015.0\n", + "3 Color 813.0 2012.0\n", + "4 NaN NaN NaN" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sample n columns instead of rows: axis=1\n", + "df.sample(3, axis=1).head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
2389ColorTerry Gilliam156.0118.00.0551.0Michael Jeter40000.010562387.0Adventure|Comedy|Drama...648.0EnglishUSAR18500000.01998.0693.07.72.3515000
4233Black and WhiteTay Garnett7.0119.010.0275.0Greer Garson509.0NaNDrama...29.0EnglishUSAPassed2160000.01945.0284.07.51.3768
4737ColorGreg Harrison46.086.07.017.0Ari Gold328.01114943.0Drama|Music...74.0EnglishUSAR500000.02000.027.06.51.850
3717ColorMike Flanagan336.0104.059.0202.0Rory Cochrane972.027689474.0Horror|Mystery...339.0EnglishUSAR5000000.02013.0407.06.52.3523000
1854ColorGarry Marshall200.0113.00.0307.0Common22000.054540525.0Comedy|Romance...134.0EnglishUSAPG-1356000000.02011.0988.05.71.8520000
\n", + "

5 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2389 Color Terry Gilliam 156.0 118.0 \n", + "4233 Black and White Tay Garnett 7.0 119.0 \n", + "4737 Color Greg Harrison 46.0 86.0 \n", + "3717 Color Mike Flanagan 336.0 104.0 \n", + "1854 Color Garry Marshall 200.0 113.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2389 0.0 551.0 Michael Jeter \n", + "4233 10.0 275.0 Greer Garson \n", + "4737 7.0 17.0 Ari Gold \n", + "3717 59.0 202.0 Rory Cochrane \n", + "1854 0.0 307.0 Common \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "2389 40000.0 10562387.0 Adventure|Comedy|Drama ... \n", + "4233 509.0 NaN Drama ... \n", + "4737 328.0 1114943.0 Drama|Music ... \n", + "3717 972.0 27689474.0 Horror|Mystery ... \n", + "1854 22000.0 54540525.0 Comedy|Romance ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "2389 648.0 English USA R 18500000.0 \n", + "4233 29.0 English USA Passed 2160000.0 \n", + "4737 74.0 English USA R 500000.0 \n", + "3717 339.0 English USA R 5000000.0 \n", + "1854 134.0 English USA PG-13 56000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "2389 1998.0 693.0 7.7 2.35 \n", + "4233 1945.0 284.0 7.5 1.37 \n", + "4737 2000.0 27.0 6.5 1.85 \n", + "3717 2013.0 407.0 6.5 2.35 \n", + "1854 2011.0 988.0 5.7 1.85 \n", + "\n", + " movie_facebook_likes \n", + "2389 15000 \n", + "4233 68 \n", + "4737 0 \n", + "3717 23000 \n", + "1854 20000 \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sample a fraction of the rows: frac\n", + "df.sample(frac=0.001)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
4232ColorHayley Cloake11.090.00.0306.0Kayla Ewell676.0NaNThriller...9.0EnglishUSAR2200000.02008.0399.04.3NaN77
4877ColorTom Seidman4.098.03.0104.0Derek Brandon337.0NaNDrama|Family...10.0EnglishUSAPG250000.02010.0168.06.2NaN0
3399ColorMike Leigh248.0129.0608.0386.0Imelda Staunton1000.03205244.0Comedy|Drama...141.0EnglishUKPG-1310000000.02010.0579.07.32.350
\n", + "

3 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "4232 Color Hayley Cloake 11.0 90.0 \n", + "4877 Color Tom Seidman 4.0 98.0 \n", + "3399 Color Mike Leigh 248.0 129.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "4232 0.0 306.0 Kayla Ewell \n", + "4877 3.0 104.0 Derek Brandon \n", + "3399 608.0 386.0 Imelda Staunton \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "4232 676.0 NaN Thriller ... \n", + "4877 337.0 NaN Drama|Family ... \n", + "3399 1000.0 3205244.0 Comedy|Drama ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "4232 9.0 English USA R 2200000.0 \n", + "4877 10.0 English USA PG 250000.0 \n", + "3399 141.0 English UK PG-13 10000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "4232 2008.0 399.0 4.3 NaN \n", + "4877 2010.0 168.0 6.2 NaN \n", + "3399 2010.0 579.0 7.3 2.35 \n", + "\n", + " movie_facebook_likes \n", + "4232 77 \n", + "4877 0 \n", + "3399 0 \n", + "\n", + "[3 rows x 28 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sample with seed\n", + "df.sample(n=3, random_state=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. np.random.choice" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
893ColorGabriele Muccino202.0123.0125.0835.0Rosario Dawson10000.069951824.0Drama|Romance...599.0EnglishUSAPG-1355000000.02008.03000.07.72.3526000
818ColorBrian Levant57.090.032.0809.0Mark Addy1000.035231365.0Comedy|Family|Romance|Sci-Fi...85.0EnglishUSAPG60000000.02000.0891.03.61.85500
460ColorSimon West139.0123.0165.0744.0Monica Potter12000.0101087161.0Action|Crime|Thriller...339.0EnglishUSAR75000000.01997.0878.06.82.350
772ColorClint Eastwood306.0134.016000.0204.0Morgan Freeman13000.037479778.0Biography|Drama|History|Sport...259.0EnglishUSAPG-1360000000.02009.011000.07.42.3523000
269ColorLen Wiseman354.0129.0235.0297.0Jonathan Sadowski13000.0134520804.0Action|Adventure|Thriller...782.0EnglishUSAPG-13110000000.02007.0300.07.22.350
\n", + "

5 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "893 Color Gabriele Muccino 202.0 123.0 \n", + "818 Color Brian Levant 57.0 90.0 \n", + "460 Color Simon West 139.0 123.0 \n", + "772 Color Clint Eastwood 306.0 134.0 \n", + "269 Color Len Wiseman 354.0 129.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "893 125.0 835.0 Rosario Dawson \n", + "818 32.0 809.0 Mark Addy \n", + "460 165.0 744.0 Monica Potter \n", + "772 16000.0 204.0 Morgan Freeman \n", + "269 235.0 297.0 Jonathan Sadowski \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "893 10000.0 69951824.0 Drama|Romance ... \n", + "818 1000.0 35231365.0 Comedy|Family|Romance|Sci-Fi ... \n", + "460 12000.0 101087161.0 Action|Crime|Thriller ... \n", + "772 13000.0 37479778.0 Biography|Drama|History|Sport ... \n", + "269 13000.0 134520804.0 Action|Adventure|Thriller ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "893 599.0 English USA PG-13 55000000.0 \n", + "818 85.0 English USA PG 60000000.0 \n", + "460 339.0 English USA R 75000000.0 \n", + "772 259.0 English USA PG-13 60000000.0 \n", + "269 782.0 English USA PG-13 110000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "893 2008.0 3000.0 7.7 2.35 \n", + "818 2000.0 891.0 3.6 1.85 \n", + "460 1997.0 878.0 6.8 2.35 \n", + "772 2009.0 11000.0 7.4 2.35 \n", + "269 2007.0 300.0 7.2 2.35 \n", + "\n", + " movie_facebook_likes \n", + "893 26000 \n", + "818 500 \n", + "460 0 \n", + "772 23000 \n", + "269 0 \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# numpy choice for DataFrame sampling\n", + "import numpy as np\n", + "chosen_idx = np.random.choice(1000, replace=False, size=5)\n", + "df.iloc[chosen_idx]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Random sample of rows based on column values" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Color - " + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
2693ColorAlbert Brooks97.097.0745.0745.0Bradley Whitford12000.011614236.0Comedy...140.0EnglishUSAPG-1315000000.01999.0821.05.61.37251
1613ColorAlexander Payne217.0125.0729.0322.0June Squibb442.065010106.0Comedy|Drama...612.0EnglishUSAR30000000.02002.0344.07.21.850
698ColorLawrence Kasdan40.0212.0759.0812.0Catherine O'Hara2000.025052000.0Adventure|Biography|Crime|Drama|Western...145.0EnglishUSAPG-1363000000.01994.0925.06.62.350
\n", + "

3 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2693 Color Albert Brooks 97.0 97.0 \n", + "1613 Color Alexander Payne 217.0 125.0 \n", + "698 Color Lawrence Kasdan 40.0 212.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2693 745.0 745.0 Bradley Whitford \n", + "1613 729.0 322.0 June Squibb \n", + "698 759.0 812.0 Catherine O'Hara \n", + "\n", + " actor_1_facebook_likes gross \\\n", + "2693 12000.0 11614236.0 \n", + "1613 442.0 65010106.0 \n", + "698 2000.0 25052000.0 \n", + "\n", + " genres ... num_user_for_reviews \\\n", + "2693 Comedy ... 140.0 \n", + "1613 Comedy|Drama ... 612.0 \n", + "698 Adventure|Biography|Crime|Drama|Western ... 145.0 \n", + "\n", + " language country content_rating budget title_year \\\n", + "2693 English USA PG-13 15000000.0 1999.0 \n", + "1613 English USA R 30000000.0 2002.0 \n", + "698 English USA PG-13 63000000.0 1994.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \n", + "2693 821.0 5.6 1.37 251 \n", + "1613 344.0 7.2 1.85 0 \n", + "698 925.0 6.6 2.35 0 \n", + "\n", + "[3 rows x 28 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " Black and White - " + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
4786Black and WhiteLloyd Bacon65.089.024.045.0Dick Powell610.02300000.0Comedy|Musical|Romance...97.0EnglishUSAUnrated439000.01933.0105.07.71.37439
3983Black and WhiteJohn Schlesinger88.0113.0154.077.0Barnard Hughes183.0NaNDrama...334.0EnglishUSAX3600000.01969.089.07.91.850
479Black and WhiteNaN31.025.0NaN474.0Agnes Moorehead1000.0NaNComedy|Family|Fantasy...71.0EnglishUSATV-GNaNNaN960.07.64.000
\n", + "

3 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "4786 Black and White Lloyd Bacon 65.0 89.0 \n", + "3983 Black and White John Schlesinger 88.0 113.0 \n", + "479 Black and White NaN 31.0 25.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "4786 24.0 45.0 Dick Powell \n", + "3983 154.0 77.0 Barnard Hughes \n", + "479 NaN 474.0 Agnes Moorehead \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "4786 610.0 2300000.0 Comedy|Musical|Romance ... \n", + "3983 183.0 NaN Drama ... \n", + "479 1000.0 NaN Comedy|Family|Fantasy ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "4786 97.0 English USA Unrated 439000.0 \n", + "3983 334.0 English USA X 3600000.0 \n", + "479 71.0 English USA TV-G NaN \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "4786 1933.0 105.0 7.7 1.37 \n", + "3983 1969.0 89.0 7.9 1.85 \n", + "479 NaN 960.0 7.6 4.00 \n", + "\n", + " movie_facebook_likes \n", + "4786 439 \n", + "3983 0 \n", + "479 0 \n", + "\n", + "[3 rows x 28 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# conditional DataFrame sampling few values - separate display \n", + "col = 'color'\n", + "for typ in list(df[col].dropna().unique()):\n", + " print(typ, end=' - ')\n", + " display(df[df[col] == typ].sample(3))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['PG-13', 'PG', 'G', 'R', 'TV-14', 'TV-PG', 'TV-MA', 'TV-G', 'Not Rated', 'Unrated', 'Approved', 'TV-Y', 'NC-17', 'X', 'TV-Y7', 'GP', 'Passed', 'M']\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
3758ColorBrian Dannelly121.092.012.0797.0Patrick Fugit3000.08786715.0Comedy|Drama...324.0EnglishUSAPG-135000000.02004.0835.06.91.850
64ColorAndrew Adamson284.0150.080.082.0Kiran Shah1000.0291709845.0Adventure|Family|Fantasy...1463.0EnglishUSAPG180000000.02005.0190.06.92.350
4725ColorJoe Camp5.086.024.0142.0Peter Breck407.039552600.0Adventure|Family|Romance...36.0EnglishUSAG500000.01974.0189.06.11.85816
2955ColorFranklin J. Schaffner55.0125.076.0801.0Anne Meara1000.0NaNDrama|Thriller...123.0EnglishUKR12000000.01978.0837.07.01.850
3509ColorNaN11.060.0NaN652.0Ashley Scott10000.0NaNAction|Drama|Mystery|Sci-Fi...160.0EnglishUSATV-14NaNNaN794.07.41.330
4803ColorNaN11.022.0NaN6.0Ron Lynch59.0NaNAnimation|Comedy|Drama...82.0EnglishUSATV-PGNaNNaN11.08.21.33526
826ColorNaN46.030.0NaN479.0Kristin Davis962.0NaNComedy|Romance...238.0EnglishUSATV-MANaNNaN722.07.01.330
3880ColorKenny Ortega57.098.0197.0578.0Corbin Bleu755.0NaNComedy|Drama|Family|Music|Musical|Romance...726.0EnglishUSATV-G4200000.02006.0632.05.21.330
4328Black and WhiteOrson Welles90.092.00.018.0Everett Sloane1000.07927.0Crime|Drama|Film-Noir|Mystery|Thriller...175.0EnglishUSANot Rated2300000.01947.029.07.71.370
4997ColorDavid Gordon Green75.090.0234.015.0Eddie Rouse552.0241816.0Drama...76.0EnglishUSAUnrated42000.02000.061.07.52.35451
4497Black and WhiteWalter Lang7.083.09.051.0Nigel Bruce94.0NaNDrama|Family|Fantasy...25.0EnglishUSAApprovedNaN1940.062.06.51.37548
1265ColorNaN3.030.0NaN12.0Melissa Altro51.0NaNAnimation|Comedy|Family...43.0EnglishCanadaTV-YNaNNaN21.07.41.33301
5025ColorJohn Waters73.0108.00.0105.0Mink Stole462.0180483.0Comedy|Crime|Horror...183.0EnglishUSANC-1710000.01972.0143.06.11.370
3559ColorBrian De Palma121.0104.00.0517.0David Margulies754.031899000.0Mystery|Romance|Thriller...201.0EnglishUSAX6500000.01980.0567.07.12.350
1972ColorNaN7.030.0NaN265.0Jennifer Hale971.0NaNAction|Animation|Comedy|Family|Fantasy|Sci-Fi...60.0EnglishUSATV-Y7NaNNaN918.07.24.00581
4529ColorDouglas Trumbull87.089.0136.042.0Ron Rifkin844.0NaNDrama|Sci-Fi...199.0EnglishUSAGP1000000.01972.0184.06.71.850
4812Black and WhiteHarry Beaumont36.0100.04.04.0Bessie Love77.02808000.0Musical|Romance...71.0EnglishUSAPassed379000.01929.028.06.31.37167
3584ColorGeorge Roy Hill130.0110.0131.0399.0Ted Cassidy640.0102308900.0Biography|Crime|Drama|Western...309.0EnglishUSAM6000000.01969.0566.08.12.350
\n", + "

18 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews \\\n", + "3758 Color Brian Dannelly 121.0 \n", + "64 Color Andrew Adamson 284.0 \n", + "4725 Color Joe Camp 5.0 \n", + "2955 Color Franklin J. Schaffner 55.0 \n", + "3509 Color NaN 11.0 \n", + "4803 Color NaN 11.0 \n", + "826 Color NaN 46.0 \n", + "3880 Color Kenny Ortega 57.0 \n", + "4328 Black and White Orson Welles 90.0 \n", + "4997 Color David Gordon Green 75.0 \n", + "4497 Black and White Walter Lang 7.0 \n", + "1265 Color NaN 3.0 \n", + "5025 Color John Waters 73.0 \n", + "3559 Color Brian De Palma 121.0 \n", + "1972 Color NaN 7.0 \n", + "4529 Color Douglas Trumbull 87.0 \n", + "4812 Black and White Harry Beaumont 36.0 \n", + "3584 Color George Roy Hill 130.0 \n", + "\n", + " duration director_facebook_likes actor_3_facebook_likes \\\n", + "3758 92.0 12.0 797.0 \n", + "64 150.0 80.0 82.0 \n", + "4725 86.0 24.0 142.0 \n", + "2955 125.0 76.0 801.0 \n", + "3509 60.0 NaN 652.0 \n", + "4803 22.0 NaN 6.0 \n", + "826 30.0 NaN 479.0 \n", + "3880 98.0 197.0 578.0 \n", + "4328 92.0 0.0 18.0 \n", + "4997 90.0 234.0 15.0 \n", + "4497 83.0 9.0 51.0 \n", + "1265 30.0 NaN 12.0 \n", + "5025 108.0 0.0 105.0 \n", + "3559 104.0 0.0 517.0 \n", + "1972 30.0 NaN 265.0 \n", + "4529 89.0 136.0 42.0 \n", + "4812 100.0 4.0 4.0 \n", + "3584 110.0 131.0 399.0 \n", + "\n", + " actor_2_name actor_1_facebook_likes gross \\\n", + "3758 Patrick Fugit 3000.0 8786715.0 \n", + "64 Kiran Shah 1000.0 291709845.0 \n", + "4725 Peter Breck 407.0 39552600.0 \n", + "2955 Anne Meara 1000.0 NaN \n", + "3509 Ashley Scott 10000.0 NaN \n", + "4803 Ron Lynch 59.0 NaN \n", + "826 Kristin Davis 962.0 NaN \n", + "3880 Corbin Bleu 755.0 NaN \n", + "4328 Everett Sloane 1000.0 7927.0 \n", + "4997 Eddie Rouse 552.0 241816.0 \n", + "4497 Nigel Bruce 94.0 NaN \n", + "1265 Melissa Altro 51.0 NaN \n", + "5025 Mink Stole 462.0 180483.0 \n", + "3559 David Margulies 754.0 31899000.0 \n", + "1972 Jennifer Hale 971.0 NaN \n", + "4529 Ron Rifkin 844.0 
NaN \n", + "4812 Bessie Love 77.0 2808000.0 \n", + "3584 Ted Cassidy 640.0 102308900.0 \n", + "\n", + " genres ... num_user_for_reviews \\\n", + "3758 Comedy|Drama ... 324.0 \n", + "64 Adventure|Family|Fantasy ... 1463.0 \n", + "4725 Adventure|Family|Romance ... 36.0 \n", + "2955 Drama|Thriller ... 123.0 \n", + "3509 Action|Drama|Mystery|Sci-Fi ... 160.0 \n", + "4803 Animation|Comedy|Drama ... 82.0 \n", + "826 Comedy|Romance ... 238.0 \n", + "3880 Comedy|Drama|Family|Music|Musical|Romance ... 726.0 \n", + "4328 Crime|Drama|Film-Noir|Mystery|Thriller ... 175.0 \n", + "4997 Drama ... 76.0 \n", + "4497 Drama|Family|Fantasy ... 25.0 \n", + "1265 Animation|Comedy|Family ... 43.0 \n", + "5025 Comedy|Crime|Horror ... 183.0 \n", + "3559 Mystery|Romance|Thriller ... 201.0 \n", + "1972 Action|Animation|Comedy|Family|Fantasy|Sci-Fi ... 60.0 \n", + "4529 Drama|Sci-Fi ... 199.0 \n", + "4812 Musical|Romance ... 71.0 \n", + "3584 Biography|Crime|Drama|Western ... 309.0 \n", + "\n", + " language country content_rating budget title_year \\\n", + "3758 English USA PG-13 5000000.0 2004.0 \n", + "64 English USA PG 180000000.0 2005.0 \n", + "4725 English USA G 500000.0 1974.0 \n", + "2955 English UK R 12000000.0 1978.0 \n", + "3509 English USA TV-14 NaN NaN \n", + "4803 English USA TV-PG NaN NaN \n", + "826 English USA TV-MA NaN NaN \n", + "3880 English USA TV-G 4200000.0 2006.0 \n", + "4328 English USA Not Rated 2300000.0 1947.0 \n", + "4997 English USA Unrated 42000.0 2000.0 \n", + "4497 English USA Approved NaN 1940.0 \n", + "1265 English Canada TV-Y NaN NaN \n", + "5025 English USA NC-17 10000.0 1972.0 \n", + "3559 English USA X 6500000.0 1980.0 \n", + "1972 English USA TV-Y7 NaN NaN \n", + "4529 English USA GP 1000000.0 1972.0 \n", + "4812 English USA Passed 379000.0 1929.0 \n", + "3584 English USA M 6000000.0 1969.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \n", + "3758 835.0 6.9 1.85 0 \n", + "64 190.0 6.9 2.35 0 \n", + "4725 189.0 6.1 
1.85 816 \n", + "2955 837.0 7.0 1.85 0 \n", + "3509 794.0 7.4 1.33 0 \n", + "4803 11.0 8.2 1.33 526 \n", + "826 722.0 7.0 1.33 0 \n", + "3880 632.0 5.2 1.33 0 \n", + "4328 29.0 7.7 1.37 0 \n", + "4997 61.0 7.5 2.35 451 \n", + "4497 62.0 6.5 1.37 548 \n", + "1265 21.0 7.4 1.33 301 \n", + "5025 143.0 6.1 1.37 0 \n", + "3559 567.0 7.1 2.35 0 \n", + "1972 918.0 7.2 4.00 581 \n", + "4529 184.0 6.7 1.85 0 \n", + "4812 28.0 6.3 1.37 167 \n", + "3584 566.0 8.1 2.35 0 \n", + "\n", + "[18 rows x 28 columns]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# conditional DataFrame sampling many values - grouped display \n", + "col = 'content_rating'\n", + "sample = []\n", + "\n", + "variants = list(df[col].dropna().unique())\n", + "print(variants)\n", + "\n", + "for typ in variants:\n", + " sample.append(df[df[col] == typ].sample())\n", + "pd.concat(sample)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Dataframe sampling with numpy and weights" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...languagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likesweights
2425Black and WhiteMartin Scorsese151.0121.017000.0356.0Cathy Moriarty22000.045250.0Biography|Drama|Sport...EnglishUSAR18000000.01980.0394.08.31.8501.0
1125Black and WhiteOliver Stone83.0212.00.0805.0Bob Hoskins12000.013560960.0Biography|Drama|History...EnglishUSAR50000000.01995.05000.07.12.359151.0
3539NaNRichard Rich2.045.024.029.0Kate Higgins122.0NaNAction|Adventure|Animation|Comedy|Drama|Family......NaNUSANaN7000000.02014.035.06.0NaN411.0
2944Black and WhiteMartin Campbell400.0144.0258.0834.0Tobias Menzies6000.0167007184.0Action|Adventure|Thriller...EnglishUKPG-13150000000.02006.01000.08.02.3501.0
4359Black and WhiteStanley Kubrick192.095.00.0277.0Slim Pickens654.0NaNComedy...EnglishUSAPG1800000.01964.0575.08.51.66180001.0
\n", + "

5 rows × 29 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2425 Black and White Martin Scorsese 151.0 121.0 \n", + "1125 Black and White Oliver Stone 83.0 212.0 \n", + "3539 NaN Richard Rich 2.0 45.0 \n", + "2944 Black and White Martin Campbell 400.0 144.0 \n", + "4359 Black and White Stanley Kubrick 192.0 95.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2425 17000.0 356.0 Cathy Moriarty \n", + "1125 0.0 805.0 Bob Hoskins \n", + "3539 24.0 29.0 Kate Higgins \n", + "2944 258.0 834.0 Tobias Menzies \n", + "4359 0.0 277.0 Slim Pickens \n", + "\n", + " actor_1_facebook_likes gross \\\n", + "2425 22000.0 45250.0 \n", + "1125 12000.0 13560960.0 \n", + "3539 122.0 NaN \n", + "2944 6000.0 167007184.0 \n", + "4359 654.0 NaN \n", + "\n", + " genres ... language country \\\n", + "2425 Biography|Drama|Sport ... English USA \n", + "1125 Biography|Drama|History ... English USA \n", + "3539 Action|Adventure|Animation|Comedy|Drama|Family... ... NaN USA \n", + "2944 Action|Adventure|Thriller ... English UK \n", + "4359 Comedy ... 
English USA \n", + "\n", + " content_rating budget title_year actor_2_facebook_likes \\\n", + "2425 R 18000000.0 1980.0 394.0 \n", + "1125 R 50000000.0 1995.0 5000.0 \n", + "3539 NaN 7000000.0 2014.0 35.0 \n", + "2944 PG-13 150000000.0 2006.0 1000.0 \n", + "4359 PG 1800000.0 1964.0 575.0 \n", + "\n", + " imdb_score aspect_ratio movie_facebook_likes weights \n", + "2425 8.3 1.85 0 1.0 \n", + "1125 7.1 2.35 915 1.0 \n", + "3539 6.0 NaN 41 1.0 \n", + "2944 8.0 2.35 0 1.0 \n", + "4359 8.5 1.66 18000 1.0 \n", + "\n", + "[5 rows x 29 columns]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# excluding 'Color' values by applying weights 0 - Color and 1 - rest\n", + "df['weights'] = np.where(df['color'] == 'Color', .0, 1)\n", + "df.sample(frac=.001, weights='weights')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...languagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likesweights
3083ColorFloyd Mutrux5.099.011.0327.0Kelli Williams665.0125169.0Comedy|Drama...EnglishUSAR10500000.01994.0446.06.4NaN1191.0
3435ColorGeorge Tillman Jr.34.0115.088.0890.0Mekhi Phifer1000.043490057.0Comedy|Drama...EnglishUSAR7500000.01997.01000.06.91.855081.0
3940ColorRenny Harlin68.0102.0212.0195.0Lane Smith10000.0354704.0Crime|Drama|Horror|Thriller...EnglishUSAR1300000.01987.0633.05.91.853141.0
5006ColorDamir CaticNaN89.02.00.0Ron Gelner5.0NaNHorror...EnglishUSANot Rated60000.02013.00.05.4NaN481.0
1872ColorMichael Hoffman85.0118.097.0437.0Gerald McRaney775.026761283.0Drama|Romance...EnglishUSAPG-1326000000.02014.0523.06.72.35190001.0
\n", + "

5 rows × 29 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "3083 Color Floyd Mutrux 5.0 99.0 \n", + "3435 Color George Tillman Jr. 34.0 115.0 \n", + "3940 Color Renny Harlin 68.0 102.0 \n", + "5006 Color Damir Catic NaN 89.0 \n", + "1872 Color Michael Hoffman 85.0 118.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "3083 11.0 327.0 Kelli Williams \n", + "3435 88.0 890.0 Mekhi Phifer \n", + "3940 212.0 195.0 Lane Smith \n", + "5006 2.0 0.0 Ron Gelner \n", + "1872 97.0 437.0 Gerald McRaney \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "3083 665.0 125169.0 Comedy|Drama ... \n", + "3435 1000.0 43490057.0 Comedy|Drama ... \n", + "3940 10000.0 354704.0 Crime|Drama|Horror|Thriller ... \n", + "5006 5.0 NaN Horror ... \n", + "1872 775.0 26761283.0 Drama|Romance ... \n", + "\n", + " language country content_rating budget title_year \\\n", + "3083 English USA R 10500000.0 1994.0 \n", + "3435 English USA R 7500000.0 1997.0 \n", + "3940 English USA R 1300000.0 1987.0 \n", + "5006 English USA Not Rated 60000.0 2013.0 \n", + "1872 English USA PG-13 26000000.0 2014.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \\\n", + "3083 446.0 6.4 NaN 119 \n", + "3435 1000.0 6.9 1.85 508 \n", + "3940 633.0 5.9 1.85 314 \n", + "5006 0.0 5.4 NaN 48 \n", + "1872 523.0 6.7 2.35 19000 \n", + "\n", + " weights \n", + "3083 1.0 \n", + "3435 1.0 \n", + "3940 1.0 \n", + "5006 1.0 \n", + "1872 1.0 \n", + "\n", + "[5 rows x 29 columns]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Including only 'Color' values by applying weights 1 - Color and 0 - rest\n", + "df['weights'] = np.where(df['color'] == 'Color', 1, 0.0)\n", + "df.sample(frac=.001, weights='weights')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 1, 0, 
1, 1, 1, 1, 1, 1, 0])" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.where(df['color'] == 'Color', 1, 0)[270:280]" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Color', 'Color', ' Black and White', 'Color', 'Color', 'Color', 'Color', 'Color', 'Color', nan]\n" + ] + } + ], + "source": [ + "print(list(df['color'][270:280]))" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2153 PG\n", + "2238 PG-13\n", + "836 PG-13\n", + "4725 G\n", + "3205 PG\n", + "3388 PG-13\n", + "152 PG-13\n", + "2574 PG-13\n", + "2724 PG-13\n", + "1854 PG-13\n", + "Name: content_rating, dtype: object" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Including/excluding list of values - with equal probability\n", + "df['weights'] = np.where(df['content_rating'].isin(['PG-13', 'PG', 'G']), 1, 0)\n", + "df.sample(frac=.002, weights='weights')['content_rating']" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PG-13 1461\n", + "PG 701\n", + "G 112\n", + "Name: content_rating, dtype: int64" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['content_rating'].isin(['PG-13', 'PG', 'G'])]['content_rating'].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Pandas sample rows by group" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...languagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likesweights
color
Black and White2734Black and WhiteFritz Lang260.0145.0756.018.0Gustav Fröhlich136.026435.0Drama|Sci-Fi...GermanGermanyNot Rated6000000.01927.023.08.31.33120000
4741Black and WhiteMorgan J. Freeman17.086.0204.0474.0Heather Matarazzo659.0334041.0Crime|Drama|Romance...EnglishUSAR500000.01997.0529.06.51.85510
1979Black and WhiteNeil Jordan44.0133.0277.08000.0Liam Neeson25000.011030963.0Biography|Drama|Thriller|War...EnglishUKR28000000.01996.014000.07.11.8500
Color3436ColorStanley Tong62.089.07.036.0Anita Mui186.032333860.0Action|Comedy...CantoneseHong KongR7500000.01995.0147.06.72.3500
734ColorOliver Stone171.0156.00.01000.0Dennis Quaid14000.075530832.0Drama|Sport...EnglishUSAR55000000.01999.02000.06.82.3500
2336ColorAndrew Jarecki140.0101.046.0902.0Kirsten Dunst33000.0578382.0Crime|Drama|Mystery|Romance|Thriller...EnglishUSARNaN2010.04000.06.31.8500
\n", + "

6 rows × 29 columns

\n", + "
" + ], + "text/plain": [ + " color director_name \\\n", + "color \n", + " Black and White 2734 Black and White Fritz Lang \n", + " 4741 Black and White Morgan J. Freeman \n", + " 1979 Black and White Neil Jordan \n", + "Color 3436 Color Stanley Tong \n", + " 734 Color Oliver Stone \n", + " 2336 Color Andrew Jarecki \n", + "\n", + " num_critic_for_reviews duration \\\n", + "color \n", + " Black and White 2734 260.0 145.0 \n", + " 4741 17.0 86.0 \n", + " 1979 44.0 133.0 \n", + "Color 3436 62.0 89.0 \n", + " 734 171.0 156.0 \n", + " 2336 140.0 101.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes \\\n", + "color \n", + " Black and White 2734 756.0 18.0 \n", + " 4741 204.0 474.0 \n", + " 1979 277.0 8000.0 \n", + "Color 3436 7.0 36.0 \n", + " 734 0.0 1000.0 \n", + " 2336 46.0 902.0 \n", + "\n", + " actor_2_name actor_1_facebook_likes gross \\\n", + "color \n", + " Black and White 2734 Gustav Fröhlich 136.0 26435.0 \n", + " 4741 Heather Matarazzo 659.0 334041.0 \n", + " 1979 Liam Neeson 25000.0 11030963.0 \n", + "Color 3436 Anita Mui 186.0 32333860.0 \n", + " 734 Dennis Quaid 14000.0 75530832.0 \n", + " 2336 Kirsten Dunst 33000.0 578382.0 \n", + "\n", + " genres ... language \\\n", + "color ... \n", + " Black and White 2734 Drama|Sci-Fi ... German \n", + " 4741 Crime|Drama|Romance ... English \n", + " 1979 Biography|Drama|Thriller|War ... English \n", + "Color 3436 Action|Comedy ... Cantonese \n", + " 734 Drama|Sport ... English \n", + " 2336 Crime|Drama|Mystery|Romance|Thriller ... 
English \n", + "\n", + " country content_rating budget title_year \\\n", + "color \n", + " Black and White 2734 Germany Not Rated 6000000.0 1927.0 \n", + " 4741 USA R 500000.0 1997.0 \n", + " 1979 UK R 28000000.0 1996.0 \n", + "Color 3436 Hong Kong R 7500000.0 1995.0 \n", + " 734 USA R 55000000.0 1999.0 \n", + " 2336 USA R NaN 2010.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "color \n", + " Black and White 2734 23.0 8.3 1.33 \n", + " 4741 529.0 6.5 1.85 \n", + " 1979 14000.0 7.1 1.85 \n", + "Color 3436 147.0 6.7 2.35 \n", + " 734 2000.0 6.8 2.35 \n", + " 2336 4000.0 6.3 1.85 \n", + "\n", + " movie_facebook_likes weights \n", + "color \n", + " Black and White 2734 12000 0 \n", + " 4741 51 0 \n", + " 1979 0 0 \n", + "Color 3436 0 0 \n", + " 734 0 0 \n", + " 2336 0 0 \n", + "\n", + "[6 rows x 29 columns]" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('color').apply(lambda x: x.sample(n=3))" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...languagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likesweights
0Black and WhiteSteven Zaillian127.0128.0234.0581.0Anthony Hopkins14000.07221458.0Drama|Thriller...EnglishGermanyPG-1355000000.02006.012000.06.21.8501
1Black and WhiteRandy Moore143.090.013.0432.0Lee Armstrong977.0169719.0Fantasy|Horror...EnglishUSANot RatedNaN2013.0511.05.21.8500
2Black and WhiteTodd Haynes231.0135.0162.0228.0Heath Ledger23000.04001121.0Biography|Drama|Music...EnglishUSAR20000000.02007.013000.07.02.3500
3ColorMartin Brest94.0105.0102.0383.0Ronny Cox901.0234760500.0Action|Comedy|Crime...EnglishUSAR14000000.01984.0605.07.31.8500
4ColorDario Argento76.0120.0930.0433.0Adrienne Barbeau982.0349618.0Horror...EnglishItalyR9000000.01990.0602.06.11.853750
5ColorTyler Perry36.0113.00.0256.0Mary J. Blige607.051697449.0Comedy|Drama...EnglishUSAPG-1313000000.02009.0269.04.11.8510001
\n", + "

6 rows × 29 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Black and White Steven Zaillian 127.0 128.0 \n", + "1 Black and White Randy Moore 143.0 90.0 \n", + "2 Black and White Todd Haynes 231.0 135.0 \n", + "3 Color Martin Brest 94.0 105.0 \n", + "4 Color Dario Argento 76.0 120.0 \n", + "5 Color Tyler Perry 36.0 113.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 234.0 581.0 Anthony Hopkins \n", + "1 13.0 432.0 Lee Armstrong \n", + "2 162.0 228.0 Heath Ledger \n", + "3 102.0 383.0 Ronny Cox \n", + "4 930.0 433.0 Adrienne Barbeau \n", + "5 0.0 256.0 Mary J. Blige \n", + "\n", + " actor_1_facebook_likes gross genres ... language \\\n", + "0 14000.0 7221458.0 Drama|Thriller ... English \n", + "1 977.0 169719.0 Fantasy|Horror ... English \n", + "2 23000.0 4001121.0 Biography|Drama|Music ... English \n", + "3 901.0 234760500.0 Action|Comedy|Crime ... English \n", + "4 982.0 349618.0 Horror ... English \n", + "5 607.0 51697449.0 Comedy|Drama ... 
English \n", + "\n", + " country content_rating budget title_year actor_2_facebook_likes \\\n", + "0 Germany PG-13 55000000.0 2006.0 12000.0 \n", + "1 USA Not Rated NaN 2013.0 511.0 \n", + "2 USA R 20000000.0 2007.0 13000.0 \n", + "3 USA R 14000000.0 1984.0 605.0 \n", + "4 Italy R 9000000.0 1990.0 602.0 \n", + "5 USA PG-13 13000000.0 2009.0 269.0 \n", + "\n", + " imdb_score aspect_ratio movie_facebook_likes weights \n", + "0 6.2 1.85 0 1 \n", + "1 5.2 1.85 0 0 \n", + "2 7.0 2.35 0 0 \n", + "3 7.3 1.85 0 0 \n", + "4 6.1 1.85 375 0 \n", + "5 4.1 1.85 1000 1 \n", + "\n", + "[6 rows x 29 columns]" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('color').apply(lambda x: x.sample(n=3)).reset_index(drop=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Bonus: get first and last rows of DataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...languagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likesweights
0ColorJames Cameron723.0178.00.0855.0Joel David Moore1000.0760505847.0Action|Adventure|Fantasy|Sci-Fi...EnglishUSAPG-13237000000.02009.0936.07.91.78330001
1ColorGore Verbinski302.0169.0563.01000.0Orlando Bloom40000.0309404152.0Action|Adventure|Fantasy...EnglishUSAPG-13300000000.02007.05000.07.12.3501
\n", + "

2 rows × 29 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "\n", + " language country content_rating budget title_year \\\n", + "0 English USA PG-13 237000000.0 2009.0 \n", + "1 English USA PG-13 300000000.0 2007.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \\\n", + "0 936.0 7.9 1.78 33000 \n", + "1 5000.0 7.1 2.35 0 \n", + "\n", + " weights \n", + "0 1 \n", + "1 1 \n", + "\n", + "[2 rows x 29 columns]" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...languagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likesweights
5041ColorDaniel Hsia14.0100.00.0489.0Daniel Henney946.010443.0Comedy|Drama|Romance...EnglishUSAPG-13NaN2012.0719.06.32.356601
5042ColorJon Gunn43.090.016.016.0Brian Herzlinger86.085222.0Documentary...EnglishUSAPG1100.02004.023.06.61.854561
\n", + "

2 rows × 29 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "5041 Color Daniel Hsia 14.0 100.0 \n", + "5042 Color Jon Gunn 43.0 90.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "5041 0.0 489.0 Daniel Henney \n", + "5042 16.0 16.0 Brian Herzlinger \n", + "\n", + " actor_1_facebook_likes gross genres ... language \\\n", + "5041 946.0 10443.0 Comedy|Drama|Romance ... English \n", + "5042 86.0 85222.0 Documentary ... English \n", + "\n", + " country content_rating budget title_year actor_2_facebook_likes \\\n", + "5041 USA PG-13 NaN 2012.0 719.0 \n", + "5042 USA PG 1100.0 2004.0 23.0 \n", + "\n", + " imdb_score aspect_ratio movie_facebook_likes weights \n", + "5041 6.3 2.35 660 1 \n", + "5042 6.6 1.85 456 1 \n", + "\n", + "[2 rows x 29 columns]" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.tail(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...languagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likesweights
0ColorJames Cameron723.0178.00.0855.0Joel David Moore1000.0760505847.0Action|Adventure|Fantasy|Sci-Fi...EnglishUSAPG-13237000000.02009.0936.07.91.78330001
1ColorGore Verbinski302.0169.0563.01000.0Orlando Bloom40000.0309404152.0Action|Adventure|Fantasy...EnglishUSAPG-13300000000.02007.05000.07.12.3501
5041ColorDaniel Hsia14.0100.00.0489.0Daniel Henney946.010443.0Comedy|Drama|Romance...EnglishUSAPG-13NaN2012.0719.06.32.356601
5042ColorJon Gunn43.090.016.016.0Brian Herzlinger86.085222.0Documentary...EnglishUSAPG1100.02004.023.06.61.854561
\n", + "

4 rows × 29 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "5041 Color Daniel Hsia 14.0 100.0 \n", + "5042 Color Jon Gunn 43.0 90.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "5041 0.0 489.0 Daniel Henney \n", + "5042 16.0 16.0 Brian Herzlinger \n", + "\n", + " actor_1_facebook_likes gross genres \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy \n", + "5041 946.0 10443.0 Comedy|Drama|Romance \n", + "5042 86.0 85222.0 Documentary \n", + "\n", + " ... language country content_rating budget title_year \\\n", + "0 ... English USA PG-13 237000000.0 2009.0 \n", + "1 ... English USA PG-13 300000000.0 2007.0 \n", + "5041 ... English USA PG-13 NaN 2012.0 \n", + "5042 ... English USA PG 1100.0 2004.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \\\n", + "0 936.0 7.9 1.78 33000 \n", + "1 5000.0 7.1 2.35 0 \n", + "5041 719.0 6.3 2.35 660 \n", + "5042 23.0 6.6 1.85 456 \n", + "\n", + " weights \n", + "0 1 \n", + "1 1 \n", + "5041 1 \n", + "5042 1 \n", + "\n", + "[4 rows x 29 columns]" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# combine head and tail variant 1\n", + "rows = 2\n", + "pd.concat([df.head(rows), df.tail(rows)])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", 
+ "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 462d9c343ed9c2140120243731cd9e57bebfa968 Mon Sep 17 00:00:00 2001 From: softhints Date: Mon, 13 Jan 2020 14:45:49 +0200 Subject: [PATCH 46/76] pandas-random-sample-of-a-subset-of-a-dataframe-rows-or-columns --- ...ow-to-filter-results-of-value_counts.ipynb | 1019 +++++++++++++++++ 1 file changed, 1019 insertions(+) create mode 100644 notebooks/pandas/22.pandas-how-to-filter-results-of-value_counts.ipynb diff --git a/notebooks/pandas/22.pandas-how-to-filter-results-of-value_counts.ipynb b/notebooks/pandas/22.pandas-how-to-filter-results-of-value_counts.ipynb new file mode 100644 index 0000000..4fb69be --- /dev/null +++ b/notebooks/pandas/22.pandas-how-to-filter-results-of-value_counts.ipynb @@ -0,0 +1,1019 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 22. Pandas How to filter results of value_counts?" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
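<table>
<thead>
<tr><th></th><th>color</th><th>director_name</th><th>num_critic_for_reviews</th><th>duration</th><th>director_facebook_likes</th><th>actor_3_facebook_likes</th><th>actor_2_name</th><th>actor_1_facebook_likes</th><th>gross</th><th>genres</th><th>...</th><th>num_user_for_reviews</th><th>language</th><th>country</th><th>content_rating</th><th>budget</th><th>title_year</th><th>actor_2_facebook_likes</th><th>imdb_score</th><th>aspect_ratio</th><th>movie_facebook_likes</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>Color</td><td>James Cameron</td><td>723.0</td><td>178.0</td><td>0.0</td><td>855.0</td><td>Joel David Moore</td><td>1000.0</td><td>760505847.0</td><td>Action|Adventure|Fantasy|Sci-Fi</td><td>...</td><td>3054.0</td><td>English</td><td>USA</td><td>PG-13</td><td>237000000.0</td><td>2009.0</td><td>936.0</td><td>7.9</td><td>1.78</td><td>33000</td></tr>
<tr><th>1</th><td>Color</td><td>Gore Verbinski</td><td>302.0</td><td>169.0</td><td>563.0</td><td>1000.0</td><td>Orlando Bloom</td><td>40000.0</td><td>309404152.0</td><td>Action|Adventure|Fantasy</td><td>...</td><td>1238.0</td><td>English</td><td>USA</td><td>PG-13</td><td>300000000.0</td><td>2007.0</td><td>5000.0</td><td>7.1</td><td>2.35</td><td>0</td></tr>
</tbody>
</table>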
\n", + "

2 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "0 3054.0 English USA PG-13 237000000.0 \n", + "1 1238.0 English USA PG-13 300000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "\n", + "[2 rows x 28 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sample of the data\n", + "df.head(2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. How value counts works" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 English\n", + "1 English\n", + "2 English\n", + "3 English\n", + "4 NaN\n", + " ... 
\n", + "5038 English\n", + "5039 English\n", + "5040 English\n", + "5041 English\n", + "5042 English\n", + "Name: language, Length: 5043, dtype: object" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "col = 'language'\n", + "df[col]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "English 4704\n", + "French 73\n", + "Spanish 40\n", + "Hindi 28\n", + "Mandarin 26\n", + "German 19\n", + "Japanese 18\n", + "Russian 11\n", + "Cantonese 11\n", + "Italian 11\n", + "Portuguese 8\n", + "Korean 8\n", + "Arabic 5\n", + "Hebrew 5\n", + "Swedish 5\n", + "Danish 5\n", + "Persian 4\n", + "Norwegian 4\n", + "Polish 4\n", + "Dutch 4\n", + "Chinese 3\n", + "Thai 3\n", + "Icelandic 2\n", + "Dari 2\n", + "Zulu 2\n", + "None 2\n", + "Romanian 2\n", + "Aboriginal 2\n", + "Indonesian 2\n", + "Panjabi 1\n", + "Kazakh 1\n", + "Kannada 1\n", + "Aramaic 1\n", + "Urdu 1\n", + "Dzongkha 1\n", + "Czech 1\n", + "Tamil 1\n", + "Bosnian 1\n", + "Telugu 1\n", + "Hungarian 1\n", + "Filipino 1\n", + "Mongolian 1\n", + "Slovenian 1\n", + "Greek 1\n", + "Vietnamese 1\n", + "Maya 1\n", + "Swahili 1\n", + "Name: language, dtype: int64" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dtype('int64')" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].value_counts().dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([4704, 73, 40, 28, 26, 19, 18, 11, 11, 11, 8,\n", + " 8, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3,\n", + " 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1,\n", + " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", + " 1, 
1, 1])" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].value_counts().values" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['English', 'French', 'Spanish', 'Hindi', 'Mandarin', 'German',\n", + " 'Japanese', 'Russian', 'Cantonese', 'Italian', 'Portuguese', 'Korean',\n", + " 'Arabic', 'Hebrew', 'Swedish', 'Danish', 'Persian', 'Norwegian',\n", + " 'Polish', 'Dutch', 'Chinese', 'Thai', 'Icelandic', 'Dari', 'Zulu',\n", + " 'None', 'Romanian', 'Aboriginal', 'Indonesian', 'Panjabi', 'Kazakh',\n", + " 'Kannada', 'Aramaic', 'Urdu', 'Dzongkha', 'Czech', 'Tamil', 'Bosnian',\n", + " 'Telugu', 'Hungarian', 'Filipino', 'Mongolian', 'Slovenian', 'Greek',\n", + " 'Vietnamese', 'Maya', 'Swahili'],\n", + " dtype='object')" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].value_counts().index" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Filter value_counts with isin" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2388 Chinese\n", + "2740 Thai\n", + "3022 Chinese\n", + "3311 Thai\n", + "3427 Chinese\n", + "3659 Thai\n", + "Name: language, dtype: object" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts()==3].index)].language" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 True\n", + "1 True\n", + "2 True\n", + "3 True\n", + "4 False\n", + " ... 
\n", + "5038 True\n", + "5039 True\n", + "5040 True\n", + "5041 True\n", + "5042 True\n", + "Name: language, Length: 5043, dtype: bool" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[col].isin(df[col].value_counts().index)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "English False\n", + "French False\n", + "Spanish False\n", + "Hindi False\n", + "Mandarin False\n", + "German False\n", + "Japanese False\n", + "Russian False\n", + "Cantonese False\n", + "Italian False\n", + "Portuguese False\n", + "Korean False\n", + "Arabic False\n", + "Hebrew False\n", + "Swedish False\n", + "Danish False\n", + "Persian False\n", + "Norwegian False\n", + "Polish False\n", + "Dutch False\n", + "Chinese True\n", + "Thai True\n", + "Icelandic False\n", + "Dari False\n", + "Zulu False\n", + "None False\n", + "Romanian False\n", + "Aboriginal False\n", + "Indonesian False\n", + "Panjabi False\n", + "Kazakh False\n", + "Kannada False\n", + "Aramaic False\n", + "Urdu False\n", + "Dzongkha False\n", + "Czech False\n", + "Tamil False\n", + "Bosnian False\n", + "Telugu False\n", + "Hungarian False\n", + "Filipino False\n", + "Mongolian False\n", + "Slovenian False\n", + "Greek False\n", + "Vietnamese False\n", + "Maya False\n", + "Swahili False\n", + "Name: language, dtype: bool" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['language'].value_counts()==3" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Chinese 3\n", + "Thai 3\n", + "Name: language, dtype: int64" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['language'].value_counts()[df['language'].value_counts()==3].index" + ] + }, + { + "cell_type": "code", + 
"execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 False\n", + "3 False\n", + "4 False\n", + " ... \n", + "5038 False\n", + "5039 False\n", + "5040 False\n", + "5041 False\n", + "5042 False\n", + "Name: language, Length: 5043, dtype: bool" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['language'].isin(df['language'].value_counts()[df['language'].value_counts()==1].index)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "2388 Chinese\n", + "2740 Thai\n", + "3022 Chinese\n", + "3311 Thai\n", + "3427 Chinese\n", + "3659 Thai\n", + "Name: language, dtype: object" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts()==3].index)]['language']" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "English 4704\n", + "French 73\n", + "Spanish 40\n", + "Hindi 28\n", + "Mandarin 26\n", + "German 19\n", + "Japanese 18\n", + "Russian 11\n", + "Cantonese 11\n", + "Italian 11\n", + "Name: language, dtype: int64" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['language'].value_counts()[df['language'].value_counts()> 10]" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 English\n", + "1 English\n", + "2 English\n", + "3 English\n", + "5 English\n", + " ... 
\n", + "5038 English\n", + "5039 English\n", + "5040 English\n", + "5041 English\n", + "5042 English\n", + "Name: language, Length: 4941, dtype: object" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts() > 10].index)].language" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Chinese', 'Thai'], dtype=object)" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts() == 3].index)].language.unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Use group by and lambda to simulate filter on value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
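<table>
<thead>
<tr><th></th><th>color</th><th>director_name</th><th>num_critic_for_reviews</th><th>duration</th><th>director_facebook_likes</th><th>actor_3_facebook_likes</th><th>actor_2_name</th><th>actor_1_facebook_likes</th><th>gross</th><th>genres</th><th>...</th><th>num_user_for_reviews</th><th>language</th><th>country</th><th>content_rating</th><th>budget</th><th>title_year</th><th>actor_2_facebook_likes</th><th>imdb_score</th><th>aspect_ratio</th><th>movie_facebook_likes</th></tr>
</thead>
<tbody>
<tr><th>2388</th><td>Color</td><td>Danny Pang</td><td>23.0</td><td>107.0</td><td>15.0</td><td>27.0</td><td>Angelica Lee</td><td>82.0</td><td>NaN</td><td>Action</td><td>...</td><td>4.0</td><td>Chinese</td><td>China</td><td>PG-13</td><td>NaN</td><td>2013.0</td><td>39.0</td><td>5.7</td><td>1.85</td><td>124</td></tr>
<tr><th>2740</th><td>Color</td><td>Tony Jaa</td><td>110.0</td><td>110.0</td><td>0.0</td><td>7.0</td><td>Petchtai Wongkamlao</td><td>64.0</td><td>102055.0</td><td>Action</td><td>...</td><td>72.0</td><td>Thai</td><td>Thailand</td><td>R</td><td>300000000.0</td><td>2008.0</td><td>45.0</td><td>6.2</td><td>2.35</td><td>0</td></tr>
<tr><th>3022</th><td>Color</td><td>Mabel Cheung</td><td>6.0</td><td>130.0</td><td>3.0</td><td>2.0</td><td>Ching Wan Lau</td><td>215.0</td><td>NaN</td><td>Drama</td><td>...</td><td>6.0</td><td>Chinese</td><td>China</td><td>NaN</td><td>12000000.0</td><td>2015.0</td><td>27.0</td><td>6.2</td><td>2.35</td><td>4</td></tr>
<tr><th>3311</th><td>Color</td><td>Chatrichalerm Yukol</td><td>31.0</td><td>300.0</td><td>6.0</td><td>6.0</td><td>Chatchai Plengpanich</td><td>7.0</td><td>454255.0</td><td>Action|Adventure|Drama|History|War</td><td>...</td><td>47.0</td><td>Thai</td><td>Thailand</td><td>R</td><td>400000000.0</td><td>2001.0</td><td>6.0</td><td>6.6</td><td>1.85</td><td>124</td></tr>
<tr><th>3427</th><td>Color</td><td>Dennie Gordon</td><td>11.0</td><td>114.0</td><td>29.0</td><td>11.0</td><td>Ruby Lin</td><td>163.0</td><td>50000.0</td><td>Action|Adventure|Comedy|Romance</td><td>...</td><td>2.0</td><td>Chinese</td><td>China</td><td>NaN</td><td>NaN</td><td>2013.0</td><td>20.0</td><td>5.1</td><td>2.35</td><td>81</td></tr>
<tr><th>3659</th><td>Color</td><td>Prachya Pinkaew</td><td>112.0</td><td>111.0</td><td>64.0</td><td>380.0</td><td>Nathan Jones</td><td>778.0</td><td>11905519.0</td><td>Action|Crime|Drama|Thriller</td><td>...</td><td>214.0</td><td>Thai</td><td>Thailand</td><td>R</td><td>200000000.0</td><td>2005.0</td><td>635.0</td><td>7.1</td><td>1.85</td><td>0</td></tr>
</tbody>
</table>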
\n", + "

6 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "2388 Color Danny Pang 23.0 107.0 \n", + "2740 Color Tony Jaa 110.0 110.0 \n", + "3022 Color Mabel Cheung 6.0 130.0 \n", + "3311 Color Chatrichalerm Yukol 31.0 300.0 \n", + "3427 Color Dennie Gordon 11.0 114.0 \n", + "3659 Color Prachya Pinkaew 112.0 111.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "2388 15.0 27.0 Angelica Lee \n", + "2740 0.0 7.0 Petchtai Wongkamlao \n", + "3022 3.0 2.0 Ching Wan Lau \n", + "3311 6.0 6.0 Chatchai Plengpanich \n", + "3427 29.0 11.0 Ruby Lin \n", + "3659 64.0 380.0 Nathan Jones \n", + "\n", + " actor_1_facebook_likes gross genres \\\n", + "2388 82.0 NaN Action \n", + "2740 64.0 102055.0 Action \n", + "3022 215.0 NaN Drama \n", + "3311 7.0 454255.0 Action|Adventure|Drama|History|War \n", + "3427 163.0 50000.0 Action|Adventure|Comedy|Romance \n", + "3659 778.0 11905519.0 Action|Crime|Drama|Thriller \n", + "\n", + " ... num_user_for_reviews language country content_rating \\\n", + "2388 ... 4.0 Chinese China PG-13 \n", + "2740 ... 72.0 Thai Thailand R \n", + "3022 ... 6.0 Chinese China NaN \n", + "3311 ... 47.0 Thai Thailand R \n", + "3427 ... 2.0 Chinese China NaN \n", + "3659 ... 
214.0 Thai Thailand R \n", + "\n", + " budget title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "2388 NaN 2013.0 39.0 5.7 1.85 \n", + "2740 300000000.0 2008.0 45.0 6.2 2.35 \n", + "3022 12000000.0 2015.0 27.0 6.2 2.35 \n", + "3311 400000000.0 2001.0 6.0 6.6 1.85 \n", + "3427 NaN 2013.0 20.0 5.1 2.35 \n", + "3659 200000000.0 2005.0 635.0 7.1 1.85 \n", + "\n", + " movie_facebook_likes \n", + "2388 124 \n", + "2740 0 \n", + "3022 4 \n", + "3311 124 \n", + "3427 81 \n", + "3659 0 \n", + "\n", + "[6 rows x 28 columns]" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('language').filter(lambda x: len(x) == 3)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2388 Chinese\n", + "2740 Thai\n", + "3022 Chinese\n", + "3311 Thai\n", + "3427 Chinese\n", + "3659 Thai\n", + "Name: language, dtype: object" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('language').filter(lambda x: len(x) == 3)['language']" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Chinese', 'Thai'], dtype=object)" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('language').filter(lambda x: len(x) == 3)['language'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Bonus: Which is faster?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "100 loops, best of 3: 10.9 ms per loop\n" + ] + } + ], + "source": [ + "%timeit df.groupby('language').filter(lambda x: len(x) == 3)['language']" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "100 loops, best of 3: 3.19 ms per loop\n" + ] + } + ], + "source": [ + "%timeit df[df['language'].isin(df['language'].value_counts()[df['language'].value_counts()==3].index)]['language']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 597823658edffb34fef3086885d899bae3ed658b Mon Sep 17 00:00:00 2001 From: softhints Date: Mon, 13 Jan 2020 14:46:01 +0200 Subject: [PATCH 47/76] pandas-random-sample-of-a-subset-of-a-dataframe-rows-or-columns --- 22__pandas-how-to-filter-results-of-value_counts.patch | 0 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 22__pandas-how-to-filter-results-of-value_counts.patch diff --git a/22__pandas-how-to-filter-results-of-value_counts.patch b/22__pandas-how-to-filter-results-of-value_counts.patch new file mode 100644 index 0000000..e69de29 From 6ffe304c3db255cb3dc6bf7480fb05fbd180d093 Mon Sep 17 00:00:00 2001 From: softhints Date: Tue, 14 Jan 2020 13:03:40 +0200 Subject: [PATCH 48/76] 23.pandas-typeerror-unhashable-type-list-dict.ipynb --- 
...-typeerror-unhashable-type-list-dict.ipynb | 1056 +++++++++++++++++ 1 file changed, 1056 insertions(+) create mode 100644 notebooks/pandas/23.pandas-typeerror-unhashable-type-list-dict.ipynb diff --git a/notebooks/pandas/23.pandas-typeerror-unhashable-type-list-dict.ipynb b/notebooks/pandas/23.pandas-typeerror-unhashable-type-list-dict.ipynb new file mode 100644 index 0000000..281a81d --- /dev/null +++ b/notebooks/pandas/23.pandas-typeerror-unhashable-type-list-dict.ipynb @@ -0,0 +1,1056 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 23. Pandas TypeError: unhashable type: 'list'/'dict'\n", + "\n", + "Topics\n", + "\n", + "* apply value_counts for list/dict column\n", + "* value_counts for list column\n", + "* identify list/dict columns\n", + "* `TypeError: unhashable type: 'dict'`\n", + "* `TypeError: unhashable type: 'list'`\n", + "* Correct way to expand list column" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "pd.set_option('display.max_colwidth', -1)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame({'col1': [1, 2], 'col2': [[0.5, 0.1], [0.75, 0.25]],'col3': [{0:'a', 1:'b'}, {0:'c', 1:'d'}]})" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
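<table>
<thead>
<tr><th></th><th>col1</th><th>col2</th><th>col3</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>1</td><td>[0.5, 0.1]</td><td>{0: 'a', 1: 'b'}</td></tr>
<tr><th>1</th><td>2</td><td>[0.75, 0.25]</td><td>{0: 'c', 1: 'd'}</td></tr>
</tbody>
</table>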
\n", + "
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 1 [0.5, 0.1] {0: 'a', 1: 'b'}\n", + "1 2 [0.75, 0.25] {0: 'c', 1: 'd'}" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. TypeError: unhashable type: 'list'/'dict'" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "unhashable type: 'list'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# TypeError: unhashable type: 'list'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcol2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/base.py\u001b[0m in \u001b[0;36mvalue_counts\u001b[0;34m(self, normalize, sort, ascending, bins, dropna)\u001b[0m\n\u001b[1;32m 1390\u001b[0m \u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1391\u001b[0m \u001b[0mbins\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mbins\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1392\u001b[0;31m 
\u001b[0mdropna\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdropna\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1393\u001b[0m )\n\u001b[1;32m 1394\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36mvalue_counts\u001b[0;34m(values, sort, ascending, normalize, bins, dropna)\u001b[0m\n\u001b[1;32m 755\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 756\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 757\u001b[0;31m \u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcounts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_value_counts_arraylike\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 758\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 759\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mIndex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36m_value_counts_arraylike\u001b[0;34m(values, dropna)\u001b[0m\n\u001b[1;32m 800\u001b[0m \u001b[0;31m# TODO: handle uint8\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 801\u001b[0m \u001b[0mf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhtable\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m\"value_count_{dtype}\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mndtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 802\u001b[0;31m \u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcounts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 803\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 804\u001b[0m \u001b[0mmask\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0misna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_func_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.value_count_object\u001b[0;34m()\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_func_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.value_count_object\u001b[0;34m()\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'" + ] + } + ], + "source": [ + "# TypeError: unhashable type: 'list'\n", + "df.col2.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "unhashable type: 'dict'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# TypeError: unhashable type: 'dict'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m 
\u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcol3\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/base.py\u001b[0m in \u001b[0;36mvalue_counts\u001b[0;34m(self, normalize, sort, ascending, bins, dropna)\u001b[0m\n\u001b[1;32m 1390\u001b[0m \u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1391\u001b[0m \u001b[0mbins\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mbins\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1392\u001b[0;31m \u001b[0mdropna\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdropna\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1393\u001b[0m )\n\u001b[1;32m 1394\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36mvalue_counts\u001b[0;34m(values, sort, ascending, normalize, bins, dropna)\u001b[0m\n\u001b[1;32m 755\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 756\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 757\u001b[0;31m \u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcounts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_value_counts_arraylike\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 758\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 759\u001b[0m 
\u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mIndex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36m_value_counts_arraylike\u001b[0;34m(values, dropna)\u001b[0m\n\u001b[1;32m 800\u001b[0m \u001b[0;31m# TODO: handle uint8\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 801\u001b[0m \u001b[0mf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhtable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"value_count_{dtype}\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mndtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 802\u001b[0;31m \u001b[0mkeys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcounts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 803\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 804\u001b[0m \u001b[0mmask\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0misna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_func_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.value_count_object\u001b[0;34m()\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_func_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.value_count_object\u001b[0;34m()\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: unhashable type: 
'dict'" + ] + } + ], + "source": [ + "# TypeError: unhashable type: 'dict'\n", + "df.col3.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.groupby('col3').transform({'col1': [min], 'col2': max})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. How to detect if column contains list or dict" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "col1 int64 \n", + "col2 object\n", + "col3 object\n", + "dtype: object" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "col1 False\n", + "col2 True \n", + "col3 False\n", + "dtype: bool" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# detect list columns\n", + "df.applymap(lambda x: isinstance(x, list)).all()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "col1 False\n", + "col2 False\n", + "col3 True \n", + "dtype: bool" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# detect dict columns\n", + "df.applymap(lambda x: isinstance(x, dict)).all()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "col1 False\n", + "col2 True \n", + "col3 True \n", + "dtype: bool" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# detect dict or list columns\n", + "df.applymap(lambda x: isinstance(x, dict) or isinstance(x, list)).all()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + 
"## 3.1 Convert the column to string and apply value_counts" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0.75, 0.25] 1\n", + "[0.5, 0.1] 1\n", + "Name: col2, dtype: int64" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['col2'].astype('str').value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{0: 'c', 1: 'd'} 1\n", + "{0: 'a', 1: 'b'} 1\n", + "Name: col3, dtype: int64" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['col3'].astype('str').value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.2 Convert the column to string and use group by" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "unhashable type: 'dict'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# TypeError: unhashable type: 'dict'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m 
\u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcol3\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnotna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroupby\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'col3'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcount\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/generic.py\u001b[0m in \u001b[0;36mcount\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1594\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1595\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_data_to_aggregate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1596\u001b[0;31m \u001b[0mids\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mngroups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrouper\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroup_info\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1597\u001b[0m \u001b[0mmask\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mids\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1598\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32mpandas/_libs/properties.pyx\u001b[0m in \u001b[0;36mpandas._libs.properties.CachedProperty.__get__\u001b[0;34m()\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/ops.py\u001b[0m in 
\u001b[0;36mgroup_info\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 294\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mcache_readonly\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 295\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mgroup_info\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 296\u001b[0;31m \u001b[0mcomp_ids\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mobs_group_ids\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_compressed_labels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 297\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 298\u001b[0m \u001b[0mngroups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobs_group_ids\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/ops.py\u001b[0m in \u001b[0;36m_get_compressed_labels\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 310\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 311\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_compressed_labels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 312\u001b[0;31m \u001b[0mall_labels\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mping\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlabels\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mping\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroupings\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 313\u001b[0m \u001b[0;32mif\u001b[0m 
\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_labels\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 314\u001b[0m \u001b[0mgroup_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_group_index\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_labels\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxnull\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/ops.py\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 310\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 311\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_compressed_labels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 312\u001b[0;31m \u001b[0mall_labels\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mping\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlabels\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mping\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroupings\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 313\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_labels\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 314\u001b[0m \u001b[0mgroup_index\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mget_group_index\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall_labels\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxnull\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/grouper.py\u001b[0m in \u001b[0;36mlabels\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 395\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mlabels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 396\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_labels\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 397\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_labels\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 398\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_labels\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 399\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/groupby/grouper.py\u001b[0m in \u001b[0;36m_make_labels\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 419\u001b[0m \u001b[0muniques\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrouper\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult_index\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 420\u001b[0m 
\u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 421\u001b[0;31m \u001b[0mlabels\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muniques\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0malgorithms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfactorize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrouper\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 422\u001b[0m \u001b[0muniques\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mIndex\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0muniques\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 423\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_labels\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlabels\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/util/_decorators.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 206\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 207\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnew_arg_name\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnew_arg_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 208\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 209\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 210\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36mfactorize\u001b[0;34m(values, sort, order, na_sentinel, size_hint)\u001b[0m\n\u001b[1;32m 670\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 671\u001b[0m labels, uniques = _factorize_array(\n\u001b[0;32m--> 672\u001b[0;31m \u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mna_sentinel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mna_sentinel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msize_hint\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msize_hint\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mna_value\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mna_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 673\u001b[0m )\n\u001b[1;32m 674\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/algorithms.py\u001b[0m in \u001b[0;36m_factorize_array\u001b[0;34m(values, na_sentinel, size_hint, na_value)\u001b[0m\n\u001b[1;32m 506\u001b[0m \u001b[0mtable\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mhash_klass\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msize_hint\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 507\u001b[0m uniques, labels = table.factorize(\n\u001b[0;32m--> 508\u001b[0;31m \u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mna_sentinel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mna_sentinel\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mna_value\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mna_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 509\u001b[0m )\n\u001b[1;32m 510\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.factorize\u001b[0;34m()\u001b[0m\n", + "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable._unique\u001b[0;34m()\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'dict'" + ] + } + ], + "source": [ + "# TypeError: unhashable type: 'dict'\n", + "df[df.col3.notna()].groupby(['col3']).count()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
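<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>col1</th>\n", + "      <th>col3</th>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>col2</th>\n", + "      <th></th>\n", + "      <th></th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>[0.5, 0.1]</th>\n", + "      <td>1</td>\n", + "      <td>1</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>[0.75, 0.25]</th>\n", + "      <td>1</td>\n", + "      <td>1</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>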
" + ], + "text/plain": [ + " col1 col3\n", + "col2 \n", + "[0.5, 0.1] 1 1 \n", + "[0.75, 0.25] 1 1 " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.col2.notna()].astype('str').groupby(['col2']).count()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
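<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>col1</th>\n", + "      <th>col2</th>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>col3</th>\n", + "      <th></th>\n", + "      <th></th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>{0: 'a', 1: 'b'}</th>\n", + "      <td>1</td>\n", + "      <td>1</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>{0: 'c', 1: 'd'}</th>\n", + "      <td>1</td>\n", + "      <td>1</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>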
" + ], + "text/plain": [ + " col1 col2\n", + "col3 \n", + "{0: 'a', 1: 'b'} 1 1 \n", + "{0: 'c', 1: 'd'} 1 1 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.col3.notna()].astype('str').groupby(['col3']).count()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Convert list/dict column to tuple" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0.5, 0.1) 1\n", + "(0.75, 0.25) 1\n", + "Name: col2, dtype: int64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# for list\n", + "df['col2'].apply(tuple).value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0, 1) 2\n", + "Name: col3, dtype: int64" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# for dict\n", + "df['col3'].apply(tuple).value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. 
Expand the list column" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.75 1\n", + "0.50 1\n", + "Name: 0, dtype: int64" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.apply(pd.Series)[0].value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.10 1\n", + "0.25 1\n", + "Name: 1, dtype: int64" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.apply(pd.Series)[1].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. List column mixed: scalar values and list items" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame({'col1': [1, 2], 'col2': [[0.5], 3],'col3': [{0:'a', 1:'b'}, {0:'c', 1:'d'}]})" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
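<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>col1</th>\n", + "      <th>col2</th>\n", + "      <th>col3</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0</th>\n", + "      <td>1</td>\n", + "      <td>[0.5]</td>\n", + "      <td>{0: 'a', 1: 'b'}</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>1</th>\n", + "      <td>2</td>\n", + "      <td>3</td>\n", + "      <td>{0: 'c', 1: 'd'}</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>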
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 1 [0.5] {0: 'a', 1: 'b'}\n", + "1 2 3 {0: 'c', 1: 'd'}" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3.0 1\n", + "0.5 1\n", + "Name: col2, dtype: int64" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.applymap(lambda x: x[0] if isinstance(x, list) else x)['col2'].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus Step #1: Correct way to expand list column" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame({'col1': [1, 2], 'col2': [[0.5, 0.1], [0.75, 0.25]],'col3': [{0:'a', 1:'b'}, {0:'c', 1:'d'}]})" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
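<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>col1</th>\n", + "      <th>col2</th>\n", + "      <th>col3</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0</th>\n", + "      <td>1</td>\n", + "      <td>[0.5, 0.1]</td>\n", + "      <td>{0: 'a', 1: 'b'}</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>1</th>\n", + "      <td>2</td>\n", + "      <td>[0.75, 0.25]</td>\n", + "      <td>{0: 'c', 1: 'd'}</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>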
" + ], + "text/plain": [ + " col1 col2 col3\n", + "0 1 [0.5, 0.1] {0: 'a', 1: 'b'}\n", + "1 2 [0.75, 0.25] {0: 'c', 1: 'd'}" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 NaN\n", + "1 NaN\n", + "Name: col2, dtype: float64" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.str.split(',', expand=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
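<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>0</th>\n", + "      <th>1</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0</th>\n", + "      <td>[0.5</td>\n", + "      <td> 0.1]</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>1</th>\n", + "      <td>[0.75</td>\n", + "      <td> 0.25]</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>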
" + ], + "text/plain": [ + " 0 1\n", + "0 [0.5 0.1] \n", + "1 [0.75 0.25]" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.astype('str').str.split(',', expand=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
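<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>0</th>\n", + "      <th>1</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0</th>\n", + "      <td>0.50</td>\n", + "      <td>0.10</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>1</th>\n", + "      <td>0.75</td>\n", + "      <td>0.25</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>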
" + ], + "text/plain": [ + " 0 1\n", + "0 0.50 0.10\n", + "1 0.75 0.25" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.col2.apply(pd.Series)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "df[['l1', 'l2']] = df.col2.apply(pd.Series)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
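<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th></th>\n", + "      <th>col1</th>\n", + "      <th>col2</th>\n", + "      <th>col3</th>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>l1</th>\n", + "      <th>l2</th>\n", + "      <th></th>\n", + "      <th></th>\n", + "      <th></th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0.50</th>\n", + "      <th>0.10</th>\n", + "      <td>1</td>\n", + "      <td>[0.5, 0.1]</td>\n", + "      <td>{0: 'a', 1: 'b'}</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>0.75</th>\n", + "      <th>0.25</th>\n", + "      <td>2</td>\n", + "      <td>[0.75, 0.25]</td>\n", + "      <td>{0: 'c', 1: 'd'}</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>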
" + ], + "text/plain": [ + " col1 col2 col3\n", + "l1 l2 \n", + "0.50 0.10 1 [0.5, 0.1] {0: 'a', 1: 'b'}\n", + "0.75 0.25 2 [0.75, 0.25] {0: 'c', 1: 'd'}" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.set_index(['l1', 'l2'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From f780890b9004693be96110711d6f8d037e337e6b Mon Sep 17 00:00:00 2001 From: softhints Date: Sun, 26 Jan 2020 11:54:00 +0200 Subject: [PATCH 49/76] Think Python: Chapter 5 Conditionals and recursion --- ...hapter_5__Conditionals_and_recursion.ipynb | 1682 +++++++++++++++++ 1 file changed, 1682 insertions(+) create mode 100644 notebooks/Books/Think Python/Chapter_5__Conditionals_and_recursion.ipynb diff --git a/notebooks/Books/Think Python/Chapter_5__Conditionals_and_recursion.ipynb b/notebooks/Books/Think Python/Chapter_5__Conditionals_and_recursion.ipynb new file mode 100644 index 0000000..73f5439 --- /dev/null +++ b/notebooks/Books/Think Python/Chapter_5__Conditionals_and_recursion.ipynb @@ -0,0 +1,1682 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 5 Conditionals and recursion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* Modulus operator\n", + "* Boolean expressions\n", + "* Logical operators\n", + "* Conditional and Alternative execution\n", + "* Chained and 
Nested conditionals\n", + "* Recursion and Infinite recursion\n", + "* Keyboard input\n", + "* Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.1 Floor division and modulus" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The main topic of this chapter is the if statement, which\n", + "executes different code depending on the state of the program.\n", + "But first I want to introduce two new operators: floor division\n", + "and modulus." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The floor division operator, //, divides\n", + "two numbers and rounds down to an integer. For example, suppose the\n", + "run time of a movie is 105 minutes. You might want to know how\n", + "long that is in hours. Conventional division\n", + "returns a floating-point number:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.75" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "minutes = 105\n", + "minutes / 60" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But we don’t normally write hours with decimal points. 
Floor\n", + "division returns the integer number of hours, rounding down:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "minutes = 105\n", + "hours = minutes // 60\n", + "hours" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get the remainder, you could subtract off one hour in minutes:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "45" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "remainder = minutes - hours * 60\n", + "remainder" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An alternative is to use the modulus operator, %, which\n", + "divides two numbers and returns the remainder." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "45" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "remainder = minutes % 60\n", + "remainder" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The modulus operator is more useful than it seems. For\n", + "example, you can check whether one number is divisible by another—if\n", + "x % y is zero, then x is divisible by y.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Also, you can extract the right-most digit\n", + "or digits from a number. For example, x % 10 yields the\n", + "right-most digit of x (in base 10). Similarly x % 100\n", + "yields the last two digits." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you are using Python 2, division works differently. 
The\n", + "division operator, /, performs floor division if both\n", + "operands are integers, and floating-point division if either\n", + "operand is a float.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.2 Boolean expressions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A boolean expression is an expression that is either true\n", + "or false. The following examples use the \n", + "operator ==, which compares two operands and produces\n", + "True if they are equal and False otherwise:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "5 == 5" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "5 == 6" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "True and False are special\n", + "values that belong to the type bool; they are not strings:\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "bool" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(True)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "bool" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(False)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 9, + "metadata": {}, + 
"output_type": "execute_result" + } + ], + "source": [ + "type('True') ## Question?" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type('true')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The == operator is one of the relational operators; the\n", + "others are:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x != y # x is not equal to y\n", + "x > y # x is greater than y\n", + "x < y # x is less than y\n", + "x >= y # x is greater than or equal to y\n", + "x <= y # x is less than or equal to y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Although these operations are probably familiar to you, the Python\n", + "symbols are different from the mathematical symbols. A common error\n", + "is to use a single equal sign (=) instead of a double equal sign\n", + "(==). Remember that = is an assignment operator and\n", + "== is a relational operator. There is no such thing as\n", + "=< or =>.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.3 Logical operators" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are three logical operators: and, or, and not. The semantics (meaning) of these operators is\n", + "similar to their meaning in English. 
For example,\n", + "x > 0 and x < 10 is true only if x is greater than 0\n", + "and less than 10.\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = 5\n", + "x > 0 and x < 10" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = 15\n", + "x > 0 and x < 10" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "n%2 == 0 or n%3 == 0 is true if either or both of the\n", + "conditions is true, that is, if the number is divisible by 2 or\n", + "3." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "False True False\n", + "False False True\n", + "True True True\n", + "False False False\n" + ] + } + ], + "source": [ + "for n in [4,9,6, 7]:\n", + " print(n%2 == 0 and n%3 == 0, n%2 == 0, n%3 == 0 )" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True True False\n", + "True False True\n", + "True True True\n", + "False False False\n" + ] + } + ], + "source": [ + "for n in [4,9,6,7]:\n", + " print(n%2 == 0 or n%3 == 0, n%2 == 0, n%3 == 0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, the not operator negates a boolean\n", + "expression, so not (x > y) is true if x > y is false,\n", + "that is, if x is less than or equal to y." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "not True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Strictly speaking, the operands of the logical operators should be\n", + "boolean expressions, but Python is not very strict.\n", + "Any nonzero number is interpreted as True:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "42 and True" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "0 and True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This flexibility can be useful, but there are some subtleties to\n", + "it that might be confusing. You might want to avoid it (unless\n", + "you know what you are doing)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Bonus: Boolean algebra and Truth table\n", + "\n", + "* https://en.wikipedia.org/wiki/Boolean_algebra\n", + "* https://en.wikipedia.org/wiki/Truth_table" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.4 Conditional execution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "In order to write useful programs, we almost always need the ability\n", + "to check conditions and change the behavior of the program\n", + "accordingly. Conditional statements give us this ability. 
The\n", + "simplest form is the if statement:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x is positive\n" + ] + } + ], + "source": [ + "x = 42\n", + "if x > 0:\n", + " print('x is positive')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1 is positive\n", + "4 is positive\n" + ] + } + ], + "source": [ + "for x in [1, -2, 4]: ## Question?\n", + " if x > 0:\n", + " print(f'{x} is positive')" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "'>' not supported between instances of 'str' and 'int'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m'5'\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: '>' not supported between instances of 'str' and 'int'" + ] + } + ], + "source": [ + "'5' > 0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The boolean expression after if is\n", + "called the condition. If it is true, the indented\n", + "statement runs. If not, nothing happens.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "if statements have the same structure as function definitions:\n", + "a header followed by an indented body. Statements like this are\n", + "called compound statements." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is no limit on the number of statements that can appear in\n", + "the body, but there has to be at least one.\n", + "Occasionally, it is useful to have a body with no statements (usually\n", + "as a placeholder for code you haven’t written yet). In that\n", + "case, you can use the pass statement, which does nothing.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = -42\n", + "if x < 0:\n", + "    pass # TODO: need to handle negative values!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.5 Alternative execution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A second form of the if statement is “alternative execution”,\n", + "in which there are two possibilities and the condition determines\n", + "which one runs. The syntax looks like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x is even\n" + ] + } + ], + "source": [ + "if x % 2 == 0:\n", + "    print('x is even')\n", + "else:\n", + "    print('x is odd')" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1 is odd\n", + "-2 is even\n", + "4 is even\n" + ] + } + ], + "source": [ + "# f-strings (string interpolation)\n", + "\n", + "for x in [1, -2, 4]:\n", + "    if x % 2 == 0:\n", + "        print(f'{x} is even')\n", + "    else:\n", + "        print(f'{x} is odd')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the remainder when x is divided by 2 is 0, then we know that\n", + "x is even, and the program displays an appropriate message. 
If\n", + "the condition is false, the second set of statements runs.\n", + "Since the condition must be true or false, exactly one of the\n", + "alternatives will run. The alternatives are called branches, because they are branches in the flow of execution.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.6 Chained conditionals" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sometimes there are more than two possibilities and we need more than\n", + "two branches. One way to express a computation like that is a chained conditional:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x is less than y\n" + ] + } + ], + "source": [ + "y = 42\n", + "if x < y:\n", + "    print('x is less than y')\n", + "elif x > y:\n", + "    print('x is greater than y')\n", + "else:\n", + "    print('x and y are equal')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "elif is an abbreviation of “else if”. Again, exactly one\n", + "branch will run. There is no limit on the number of elif statements. If there is an else clause, it has to be\n", + "at the end, but there doesn’t have to be one.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if choice == 'a':\n", + "    draw_a()\n", + "elif choice == 'b':\n", + "    draw_b()\n", + "elif choice == 'c':\n", + "    draw_c()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Each condition is checked in order. If the first is false,\n", + "the next is checked, and so on. If one of them is\n", + "true, the corresponding branch runs and the statement\n", + "ends. Even if more than one condition is true, only the\n", + "first true branch runs. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "100\n" + ] + } + ], + "source": [ + "if x < 100: ## Question: What will be the output?\n", + "    print('100')\n", + "elif x < 101:\n", + "    print('101')\n", + "elif x < 102:\n", + "    print('102')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.7 Nested conditionals" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One conditional can also be nested within another. We could have\n", + "written the example in the previous section like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if x == y:\n", + "    print('x and y are equal')\n", + "else:\n", + "    if x < y:\n", + "        print('x is less than y')\n", + "    else:\n", + "        print('x is greater than y')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The outer conditional contains two branches. The\n", + "first branch contains a simple statement. The second branch\n", + "contains another if statement, which has two branches of its\n", + "own. Those two branches are both simple statements,\n", + "although they could have been conditional statements as well." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Although the indentation of the statements makes the structure\n", + "apparent, nested conditionals become difficult to read very\n", + "quickly. It is a good idea to avoid them when you can." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Logical operators often provide a way to simplify nested conditional\n", + "statements. 
For example, we can rewrite the following code using a\n", + "single conditional:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 0 < x:\n", + "    if x < 10:\n", + "        print('x is a positive single-digit number.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The print statement runs only if we make it past both\n", + "conditionals, so we can get the same effect with the and operator:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 0 < x and x < 10:\n", + "    print('x is a positive single-digit number.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this kind of condition, Python provides a more concise option:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 0 < x < 10:\n", + "    print('x is a positive single-digit number.') ## Question?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.8 Recursion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is legal for one function to call another;\n", + "it is also legal for a function to call itself. 
It may not be obvious\n", + "why that is a good thing, but it turns out to be one of the most\n", + "magical things a program can do.\n", + "For example, look at the following function:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "def countdown(n):\n", + "    if n <= 0:\n", + "        print('Blastoff!')\n", + "    else:\n", + "        print(n)\n", + "        countdown(n-1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If n is 0 or negative, it outputs the word, “Blastoff!”\n", + "Otherwise, it outputs n and then calls a function named countdown (itself), passing n-1 as an argument."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happens if we call this function like this?" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n", + "2\n", + "1\n", + "Blastoff!\n" + ] + } + ], + "source": [ + "countdown(3) ## Question?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The execution of countdown begins with n=3, and since\n", + "n is greater than 0, it outputs the value 3, and then calls itself..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The calls with n=2 and n=1 do the same. The call with n=0 outputs “Blastoff!” and returns; then the calls that got n=1, n=2, and n=3 return in turn." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And then you’re back in __main__. So, the\n", + "total output is the four lines shown above: 3, 2, 1, Blastoff!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A function that calls itself is recursive; the process of\n", + "executing it is called recursion.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As another example, we can write a function that prints a\n", + "string n times." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "def print_n(s, n):\n", + "    if n <= 0:\n", + "        return\n", + "    print(s)\n", + "    print_n(s, n-1)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "s\n", + "s\n" + ] + } + ], + "source": [ + "print_n('s', 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If n <= 0, the return statement exits the function. 
The\n", + "flow of execution immediately returns to the caller, and the remaining\n", + "lines of the function don’t run.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The rest of the function is similar to countdown: it displays\n", + "s and then calls itself to display s n−1 additional\n", + "times. So the number of lines of output is 1 + (n - 1), which\n", + "adds up to n." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For simple examples like this, it is probably easier to use a for loop. But we will see examples later that are hard to write\n", + "with a for loop and easy to write with recursion, so it is\n", + "good to start early.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.9 Stack diagrams for recursive functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In Section 3.9, we used a stack diagram to represent\n", + "the state of a program during a function call. The same kind of\n", + "diagram can help interpret a recursive function." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Every time a function gets called, Python creates a\n", + "frame to contain the function’s local variables and parameters.\n", + "For a recursive function, there might be more than one frame on the\n", + "stack at the same time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 5.1 shows a stack diagram for countdown called with\n", + "n = 3." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As usual, the top of the stack is the frame for __main__.\n", + "It is empty because we did not create any variables in \n", + "__main__ or pass any arguments to it.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The four countdown frames have different values for the\n", + "parameter n. 
The bottom of the stack, where n=0, is\n", + "called the base case. It does not make a recursive call, so\n", + "there are no more frames." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, draw a stack diagram for print_n called with\n", + "s = 'Hello' and n=2.\n", + "Then write a function called do_n that takes a function\n", + "object and a number, n, as arguments, and that calls\n", + "the given function n times." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.10 Infinite recursion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If a recursion never reaches a base case, it goes on making\n", + "recursive calls forever, and the program never terminates. This is\n", + "known as infinite recursion, and it is generally not\n", + "a good idea. Here is a minimal program with an infinite recursion:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "def recurse():\n", + " recurse()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In most programming environments, a program with infinite recursion\n", + "does not really run forever. 
Python reports an error\n", + "message when the maximum recursion depth is reached:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "ename": "RecursionError", + "evalue": "maximum recursion depth exceeded", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mRecursionError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m## Question?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mrecurse\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "... last 1 frames repeated, from the frame below ...\n", + "\u001b[0;32m\u001b[0m in \u001b[0;36mrecurse\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mrecurse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mRecursionError\u001b[0m: maximum recursion depth exceeded" + ] + } + ], + "source": [ + "recurse() ## Question?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This traceback is a little bigger than the one we saw in the\n", + "previous chapter. 
When the error occurs, there are 1000\n", + "recurse frames on the stack!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you encounter an infinite recursion by accident, review\n", + "your function to confirm that there is a base case that does not\n", + "make a recursive call. And if there is a base case, check whether\n", + "you are guaranteed to reach it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.11 Keyboard input" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The programs we have written so far accept no input from the user.\n", + "They just do the same thing every time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Python provides a built-in function called input that\n", + "stops the program and\n", + "waits for the user to type something. When the user presses Return or Enter, the program resumes and input\n", + "returns what the user typed as a string. In Python 2, the same\n", + "function is called raw_input.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x\n" + ] + } + ], + "source": [ + "text = input()" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'x'" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Before getting input from the user, it is a good idea to print a\n", + "prompt telling the user what to type. 
input can take a\n", + "prompt as an argument:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "What...is your name?\n", + "x\n" + ] + } + ], + "source": [ + "name = input('What...is your name?\\n')" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'x'" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "name" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The sequence \\n at the end of the prompt represents a newline, which is a special character that causes a line break.\n", + "That’s why the user’s input appears below the prompt. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you expect the user to type an integer, you can try to convert\n", + "the return value to int with int(speed). 
Note that input itself always returns a string:" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "What...is the airspeed velocity of an unladen swallow?\n", + "100\n" + ] + } + ], + "source": [ + "prompt = 'What...is the airspeed velocity of an unladen swallow?\\n'\n", + "speed = input(prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'100'" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "speed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "But if the user types something other than a string of digits,\n", + "you get an error when you convert it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "speed = input(prompt)\n", + "# Prompt: What...is the airspeed velocity of an unladen swallow?\n", + 
"# Typed: What do you mean, an African or a European swallow?\n", + "int(speed)\n", + "# ValueError: invalid literal for int() with base 10" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "We will see how to handle this kind of error later.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(speed) ## Question?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.12 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When a syntax or runtime error occurs, the error message contains\n", + "a lot of information, but it can be overwhelming. The most\n", + "useful parts are usually what kind of error it was and where it occurred." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Syntax errors are usually easy to find, but there are a few\n", + "gotchas. Whitespace errors can be tricky because spaces and\n", + "tabs are invisible and we are used to ignoring them.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [], + "source": [ + "x = 5 ## Question?\n", + "y = 6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In this example, the problem appears when the second line is indented by\n", + "one space (try adding one). The error message then points to y, which is\n", + "misleading. In general, error messages indicate where the problem was\n", + "discovered, but the actual error might be earlier in the code,\n", + "sometimes on a previous line.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The same is true of runtime errors. 
Suppose you are trying\n", + "to compute a signal-to-noise ratio in decibels. The formula\n", + "is SNRdb = 10 log10 (Psignal / Pnoise). In Python,\n", + "you might write something like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "math domain error", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mnoise_power\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m10\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mratio\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msignal_power\u001b[0m \u001b[0;34m//\u001b[0m \u001b[0mnoise_power\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0mdecibels\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m10\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlog10\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mratio\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdecibels\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mValueError\u001b[0m: math domain error" + ] + } + ], + "source": [ + "import math\n", + "signal_power = 9\n", + "noise_power = 10\n", + "ratio = signal_power // noise_power\n", + "decibels = 10 * math.log10(ratio)\n", + "print(decibels)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "When you run this program, you get an exception:\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The error message indicates 
line 5, but there is nothing\n", + "wrong with that line. To find the real error, it might be\n", + "useful to print the value of ratio, which turns out to\n", + "be 0. The problem is in line 4, which uses floor division\n", + "instead of floating-point division.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should take the time to read error messages carefully, but don’t\n", + "assume that everything they say is correct." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5.13 Glossary" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 06f086acad6d4035bf5cd1180285c5a1e45e29f4 Mon Sep 17 00:00:00 2001 From: softhints Date: Sat, 1 Feb 2020 10:51:27 +0200 Subject: [PATCH 50/76] Chapter_6__Fruitful_functions --- .../Chapter_6__Fruitful_functions.ipynb | 1555 +++++++++++++++++ 1 file changed, 1555 insertions(+) create mode 100644 notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb diff --git a/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb b/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb new file mode 100644 index 0000000..500e7e4 --- /dev/null +++ b/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb @@ -0,0 +1,1555 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 6  Fruitful functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "* Return values\n", + "* Incremental development\n", + "* Composition\n", + "* Boolean functions\n", + "* More recursion\n", + "* Leap of faith" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.1 Return values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Many of the Python functions we have used, such as the math\n", + "functions, produce return values. But the functions we’ve written\n", + "are all void: they have an effect, like printing a value\n", + "or moving a turtle, but they don’t have a return value. In\n", + "this chapter you will learn to write fruitful functions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def print_str(s):\n", + " print(s)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def print(s):\n", + " print(s)\n", + "print(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def double_int(i):\n", + " return i * 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Calling the function generates a return\n", + "value, which we usually assign to a variable or use as part of an\n", + "expression." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "e = math.exp(1.0)\n", + "height = radius * math.sin(radians)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "math.exp(1.0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import math \n", + "math.sin(radians)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The functions we have written so far are void. Speaking casually,\n", + "they have no return value; more precisely,\n", + "their return value is None." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def print_me(s):\n", + " print(s)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print_me('x')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = print_me('x')\n", + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this chapter, we are (finally) going to write fruitful functions.\n", + "The first example is area, which returns the area of a circle\n", + "with the given radius:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def area(radius):\n", + " a = math.pi * radius**2\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "We have seen the return statement before, but in a fruitful\n", + "function the return statement includes\n", + "an expression. 
This statement means: “Return immediately from\n", + "this function and use the following expression as a return value.”\n", + "The expression can be arbitrarily complicated, so we could\n", + "have written this function more concisely:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def area(radius):\n", + "    return math.pi * radius**2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "On the other hand, temporary variables like a can make\n", + "debugging easier.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sometimes it is useful to have multiple return statements, one in each\n", + "branch of a conditional:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def absolute_value(x):\n", + "    if x < 0:\n", + "        return -x\n", + "    else:\n", + "        return x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Since these return statements are in an alternative conditional,\n", + "only one runs." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As soon as a return statement runs, the function\n", + "terminates without executing any subsequent statements.\n", + "Code that appears after a return statement, or any other place\n", + "the flow of execution can never reach, is called dead code.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def area_x(radius):\n", + "    return 0\n", + "    print('x') # dead code: never reached" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "area_x(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In a fruitful function, it is a good idea to ensure\n", + "that every possible path through the program hits a\n", + "return statement. 
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def absolute_value(x):\n", + "    if x < 0:\n", + "        return -x\n", + "    if x > 0:\n", + "        return x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This function is incorrect because if x happens to be 0,\n", + "neither condition is true, and the function ends without hitting a\n", + "return statement. If the flow of execution gets to the end\n", + "of a function, the return value is None, which is not\n", + "the absolute value of 0.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(absolute_value(0))\n", + "# Output: None" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "By the way, Python provides a built-in function called \n", + "abs that computes absolute values.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an **exercise**, write a compare function that\n", + "takes two values, x and y, and returns 1 if x > y,\n", + "0 if x == y, and -1 if x < y.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Bonus**: You can return more than one value from a function by returning a tuple or a list." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "def area_y(radius):\n", + "    return 0, 1, 3" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0, 1, 3)" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "area_y(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.2 Incremental development" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: Have a clear \n", + "* use case\n", + "* 
specifications\n", + "* test case with a known result:\n", + "\n", + "> I chose these values so that the horizontal distance is 3 and the\n", + "vertical distance is 4; that way, the result is 5, the hypotenuse \n", + "of a 3-4-5 triangle." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you write larger functions, you might find yourself\n", + "spending more time debugging." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To deal with increasingly complex programs,\n", + "you might want to try a process called\n", + "incremental development. The goal of incremental development\n", + "is to avoid long debugging sessions by adding and testing only\n", + "a small amount of code at a time.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an example, suppose you want to find the distance between two\n", + "points, given by the coordinates $(x_1, y_1)$ and $(x_2, y_2)$.\n", + "By the Pythagorean theorem, the distance is:\n", + "\n", + "$$\\mathrm{distance} = \\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first step is to consider what a distance function should\n", + "look like in Python. In other words, what are the inputs (parameters)\n", + "and what is the output (return value)?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this case, the inputs are two points, which you can represent\n", + "using four numbers. The return value is the distance represented by\n", + "a floating-point value." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Immediately you can write an outline of the function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def distance(x1, y1, x2, y2):\n", + " return 0.0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Obviously, this version doesn’t compute distances; it always returns\n", + "zero. But it is syntactically correct, and it runs, which means that\n", + "you can test it before you make it more complicated." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To test the new function, call it with sample arguments:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "distance(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "I chose these values so that the horizontal distance is 3 and the\n", + "vertical distance is 4; that way, the result is 5, the hypotenuse \n", + "of a 3-4-5 triangle. When testing a function, it is\n", + "useful to know the right answer.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "At this point we have confirmed that the function is syntactically\n", + "correct, and we can start adding code to the body.\n", + "A reasonable next step is to find the differences\n", + "x2 − x1 and y2 − y1. The next version stores those values in\n", + "temporary variables and prints them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def distance(x1, y1, x2, y2):\n", + " dx = x2 - x1\n", + " dy = y2 - y1\n", + " print('dx is', dx)\n", + " print('dy is', dy)\n", + " return 0.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "distance(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the function is working, it should display dx is 3 and \n", + "dy is 4. If so, we know that the function is getting the right\n", + "arguments and performing the first computation correctly. If not,\n", + "there are only a few lines to check." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we compute the sum of squares of dx and dy:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def distance(x1, y1, x2, y2):\n", + " dx = x2 - x1\n", + " dy = y2 - y1\n", + " dsquared = dx**2 + dy**2\n", + " print('dsquared is: ', dsquared)\n", + " return 0.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "distance(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Again, you would run the program at this stage and check the output\n", + "(which should be 25).\n", + "Finally, you can use math.sqrt to compute and return the result:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def distance(x1, y1, x2, y2):\n", + " dx = x2 - x1\n", + " dy = y2 - y1\n", + " dsquared = dx**2 + dy**2\n", + " result = math.sqrt(dsquared)\n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "distance(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "\n", + "If that works correctly, you are done. Otherwise, you might\n", + "want to print the value of result before the return\n", + "statement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The final version of the function doesn’t display anything when it\n", + "runs; it only returns a value. The print statements we wrote\n", + "are useful for debugging, but once you get the function working, you\n", + "should remove them. Code like that is called scaffolding\n", + "because it is helpful for building the program but is not part of the\n", + "final product.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you start out, you should add only a line or two of code at a\n", + "time. As you gain more experience, you might find yourself writing\n", + "and debugging bigger chunks. Either way, incremental development\n", + "can save you a lot of debugging time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The key aspects of the process are:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, use incremental development to write a function\n", + "called hypotenuse that returns the length of the hypotenuse of a\n", + "right triangle given the lengths of the other two legs as arguments.\n", + "Record each stage of the development process as you go.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.3 Composition" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you should expect by now, you can call one function from within\n", + "another. As an example, we’ll write a function that takes two points,\n", + "the center of the circle and a point on the perimeter, and computes\n", + "the area of the circle." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Assume that the center point is stored in the variables xc and\n", + "yc, and the perimeter point is in xp and yp. The\n", + "first step is to find the radius of the circle, which is the distance\n", + "between the two points. We just wrote a function, distance, that does that:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "radius = distance(1, 2, 4, 6)\n", + "radius" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The next step is to find the area of a circle with that radius;\n", + "we just wrote that, too:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result = area(radius)\n", + "result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Encapsulating these steps in a function, we get:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def circle_area(xc, yc, xp, yp): # 1, 2, 4, 6\n", + " radius = distance(xc, yc, xp, yp) # 5\n", + " result = area(radius) # 78.539\n", + " return result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The temporary variables radius and result are useful for\n", + "development and debugging, but once the program is working, we can\n", + "make it more concise by composing the function calls:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def circle_area(xc, yc, xp, yp):\n", + " return area(distance(xc, yc, xp, yp))\n", + "circle_area(1, 2, 4, 6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.4 Boolean functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Functions can return booleans, which is often convenient for 
hiding\n", + "complicated tests inside functions. \n", + "For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_divisible(x, y):\n", + " if x % y == 0:\n", + " return True\n", + " else:\n", + " return False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "It is common to give boolean functions names that sound like yes/no\n", + "questions; is_divisible returns either True or False\n", + "to indicate whether x is divisible by y." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is an example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "is_divisible(6, 4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "is_divisible(6, 3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "is_divisible(0, 0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result of the == operator is a boolean, so we can write the\n", + "function more concisely by returning it directly:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_divisible(x, y):\n", + " return x % y == 0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Boolean functions are often used in conditional statements:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if is_divisible(x, y):\n", + " print('x is divisible by y')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "It might be tempting to write something like:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 
is_divisible(x, y) == True:\n", + " print('x is divisible by y')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "But the extra comparison is unnecessary." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, write a function is_between(x, y, z) that\n", + "returns True if x ≤ y ≤ z or False otherwise." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.5 More recursion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have only covered a small subset of Python, but you might\n", + "be interested to know that this subset is a complete\n", + "programming language, which means that anything that can be\n", + "computed can be expressed in this language. Any program ever written\n", + "could be rewritten using only the language features you have learned\n", + "so far (actually, you would need a few commands to control devices\n", + "like the mouse, disks, etc., but that’s all)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Proving that claim is a nontrivial exercise first accomplished by Alan\n", + "Turing, one of the first computer scientists (some would argue that he\n", + "was a mathematician, but a lot of early computer scientists started as\n", + "mathematicians). Accordingly, it is known as the Turing Thesis.\n", + "For a more complete (and accurate) discussion of the Turing Thesis,\n", + "I recommend Michael Sipser’s book Introduction to the\n", + "Theory of Computation." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To give you an idea of what you can do with the tools you have learned\n", + "so far, we’ll evaluate a few recursively defined mathematical\n", + "functions. A recursive definition is similar to a circular\n", + "definition, in the sense that the definition contains a reference to\n", + "the thing being defined. 
A truly circular definition is not very\n", + "useful:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you saw that definition in the dictionary, you might be annoyed. On\n", + "the other hand, if you looked up the definition of the factorial\n", + "function, denoted with the symbol !, you might get something like\n", + "this:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "$$0! = 1$$\n", + "\n", + "$$n! = n \\cdot (n-1)!$$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This definition says that the factorial of 0 is 1, and the factorial\n", + "of any other value, n, is n multiplied by the factorial of n−1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So 3! is 3 times 2!, which is 2 times 1!, which is 1 times\n", + "0!. Putting it all together, 3! equals 3 times 2 times 1 times 1,\n", + "which is 6.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you can write a recursive definition of something, you can\n", + "write a Python program to evaluate it. The first step is to decide\n", + "what the parameters should be. 
In this case it should be clear\n", + "that factorial takes an integer:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + "    pass  # placeholder body; filled in over the next steps" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the argument happens to be 0, all we have to do is return 1:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + "    if n == 0:\n", + "        return 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Otherwise, and this is the interesting part, we have to make a\n", + "recursive call to find the factorial of n−1 and then multiply it by\n", + "n:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + "    if n == 0:\n", + "        return 1\n", + "    else:\n", + "        recurse = factorial(n-1)\n", + "        result = n * recurse\n", + "        return result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The flow of execution for this program is similar to the flow of countdown in Section 5.8. If we call factorial\n", + "with the value 3:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since 3 is not 0, we take the second branch and calculate the factorial\n", + "of n-1..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The return value (2) is multiplied by n, which is 3, and the result, 6,\n", + "becomes the return value of the function call that started the whole\n", + "process.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 6.1 shows what the stack diagram looks like for\n", + "this sequence of function calls." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The return values are shown being passed back up the stack. 
In each\n", + "frame, the return value is the value of result, which is the\n", + "product of n and recurse.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the last frame, the local\n", + "variables recurse and result do not exist, because\n", + "the branch that creates them does not run." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.6 Leap of faith\n", + "\n", + "#### flow of execution vs Leap of faith" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Following the flow of execution is one way to read programs, but\n", + "it can quickly become overwhelming. An\n", + "alternative is what I call the “leap of faith”. When you come to a\n", + "function call, instead of following the flow of execution, you assume that the function works correctly and returns the right\n", + "result." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In fact, you are already practicing this leap of faith when you use\n", + "built-in functions. When you call math.cos or math.exp,\n", + "you don’t examine the bodies of those functions. You just\n", + "assume that they work because the people who wrote the built-in\n", + "functions were good programmers." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def circle_area(xc, yc, xp, yp): # 1, 2, 4, 6\n", + " radius = distance(xc, yc, xp, yp) # 5\n", + " result = area(radius) # 78.539\n", + " return result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The same is true when you call one of your own functions. For\n", + "example, in Section 6.4, we wrote a function called \n", + "is_divisible that determines whether one number is divisible by\n", + "another. 
Once we have convinced ourselves that this function is\n", + "correct—by examining the code and testing—we can use the function\n", + "without looking at the body again.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_divisible(x, y):\n", + " if x % y == 0:\n", + " return True\n", + " else:\n", + " return False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The same is true of recursive programs. When you get to the recursive\n", + "call, instead of following the flow of execution, you should assume\n", + "that the recursive call works (returns the correct result) and then ask\n", + "yourself, “Assuming that I can find the factorial of n−1, can I\n", + "compute the factorial of n?” It is clear that you\n", + "can, by multiplying by n." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Of course, it’s a bit strange to assume that the function works\n", + "correctly when you haven’t finished writing it, but that’s why\n", + "it’s called a leap of faith!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.7 One more example" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "After factorial, the most common example of a recursively\n", + "defined mathematical function is fibonacci, which has the\n", + "following definition (see\n", + "http://en.wikipedia.org/wiki/Fibonacci_number):\n", + "\n", + "\n", + "```\n", + "fibonacci(0) = 0 \n", + " \t \tfibonacci(1) = 1 \n", + " \t \tfibonacci(n) = fibonacci(n−1) + fibonacci(n−2)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Translated into Python, it looks like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def fibonacci(n):\n", + " if n == 0:\n", + " return 0\n", + " elif n == 1:\n", + " return 1\n", + " else:\n", + " return fibonacci(n-1) + fibonacci(n-2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you try to follow the flow of execution here, even for fairly\n", + "small values of n, your head explodes. 
But according to the\n", + "leap of faith, if you assume that the two recursive calls\n", + "work correctly, then it is clear that you get\n", + "the right result by adding them together.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(10):\n", + "    print(fibonacci(i), end=', ')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fibonacci(8.0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fibonacci(8.5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.8 Checking types" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happens if we call factorial and give it 1.5 as an argument?\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "factorial(1.5)\n", + "# raises RecursionError: maximum recursion depth exceeded" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "It looks like an infinite recursion. How can that be? The function\n", + "has a base case, when n == 0. But if n is not an integer,\n", + "we can miss the base case and recurse forever.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the first recursive call, the value of n is 0.5.\n", + "In the next, it is -0.5. From there, it gets smaller\n", + "(more negative), but it will never be 0." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have two choices. We can try to generalize the factorial\n", + "function to work with floating-point numbers, or we can make factorial check the type of its argument. The first option is\n", + "called the gamma function and it’s a\n", + "little beyond the scope of this book. 
So we’ll go for the second.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use the built-in function isinstance to verify the type\n", + "of the argument. While we’re at it, we can also make sure the\n", + "argument is positive:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + " if not isinstance(n, int):\n", + " print('Factorial is only defined for integers.')\n", + " return None\n", + " elif n < 0:\n", + " print('Factorial is not defined for negative integers.')\n", + " return None\n", + " elif n == 0:\n", + " return 1\n", + " else:\n", + " return n * factorial(n-1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first base case handles nonintegers; the\n", + "second handles negative integers. In both cases, the program prints\n", + "an error message and returns None to indicate that something\n", + "went wrong:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(factorial('fred'))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(factorial(-2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If we get past both checks, we know that n is a non-negative integer, so we can prove that the recursion terminates.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This program demonstrates a pattern sometimes called a guardian.\n", + "The first two conditionals act as guardians, protecting the code that\n", + "follows from values that might cause an error. The guardians make it\n", + "possible to prove the correctness of the code." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In Section 11.4 we will see a more flexible alternative to printing\n", + "an error message: raising an exception." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.9 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Breaking a large program into smaller functions creates natural\n", + "checkpoints for debugging. If a function is not\n", + "working, there are three possibilities to consider:\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To rule out the first possibility, you can add a print statement\n", + "at the beginning of the function and display the values of the\n", + "parameters (and maybe their types). Or you can write code\n", + "that checks the preconditions explicitly.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the parameters look good, add a print statement before each\n", + "return statement and display the return value. If\n", + "possible, check the result by hand. Consider calling the\n", + "function with values that make it easy to check the result\n", + "(as in Section 6.2)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the function seems to be working, look at the function call\n", + "to make sure the return value is being used correctly (or used\n", + "at all!).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Adding print statements at the beginning and end of a function\n", + "can help make the flow of execution more visible.\n", + "For example, here is a version of factorial with\n", + "print statements:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def factorial(n):\n", + " space = ' ' * (4 * n)\n", + " print(space, 'factorial', n)\n", + " if n == 0:\n", + " print(space, 'returning 1')\n", + " return 1\n", + " else:\n", + " recurse = factorial(n-1)\n", + " result = n * recurse\n", + " print(space, 'returning', result)\n", + " return result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "space is a string of space characters that controls the\n", + "indentation of the output. Here is the result of factorial(4) :" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "factorial(4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you are confused about the flow of execution, this kind of\n", + "output can be helpful. It takes some time to develop effective\n", + "scaffolding, but a little bit of scaffolding can save a lot of debugging." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6.10 Glossary" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 97ddc318d544d5ffc08a8a04420b566510b35bf2 Mon Sep 17 00:00:00 2001 From: softhints Date: Sat, 1 Feb 2020 12:36:21 +0200 Subject: [PATCH 51/76] Chapter_6__Fruitful_functions --- .../Chapter_6__Fruitful_functions.ipynb | 208 +++++++++++++++--- 1 file changed, 176 insertions(+), 32 deletions(-) diff --git a/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb b/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb index 500e7e4..b47322b 100644 --- a/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb +++ b/notebooks/Books/Think Python/Chapter_6__Fruitful_functions.ipynb @@ -46,7 +46,8 @@ "outputs": [], "source": [ "def print_str(s):\n", - " print(s)" + " print(s)\n", + "print_str(1)" ] }, { @@ -60,6 +61,15 @@ "print(1)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "del print" + ] + }, { "cell_type": "code", "execution_count": null, @@ -67,7 +77,35 @@ "outputs": [], "source": [ "def double_int(i):\n", - " return i * 2" + " return i * 2\n", + "double_int(2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = print_str(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = double_int(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(x, y)" ] }, { @@ 
-105,7 +143,7 @@ "outputs": [], "source": [ "import math \n", - "math.sin(radians)" + "math.sin(5)" ] }, { @@ -344,7 +382,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -354,24 +392,31 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(0, 1, 3)" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "area_y(5)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x,y,z = area_y(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -1045,6 +1090,26 @@ " return result" ] }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "720" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "factorial(6)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -1223,7 +1288,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ @@ -1250,9 +1315,17 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0, 1, 1, 2, 3, 5, 8, 13, 21, 34, " + ] + } + ], "source": [ "for i in range(0,10):\n", " print(fibonacci(i), end=', ')" @@ -1260,18 +1333,44 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "21" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "fibonacci(8.0)" 
] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "metadata": {}, - "outputs": [], + "outputs": [ + { + "ename": "RecursionError", + "evalue": "maximum recursion depth exceeded in comparison", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mRecursionError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m8.5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mfibonacci\u001b[0;34m(n)\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "... 
last 1 frames repeated, from the frame below ...\n", + "\u001b[0;32m\u001b[0m in \u001b[0;36mfibonacci\u001b[0;34m(n)\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mRecursionError\u001b[0m: maximum recursion depth exceeded in comparison" + ] + } + ], "source": [ "fibonacci(8.5)" ] @@ -1345,7 +1444,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 12, "metadata": {}, "outputs": [], "source": [ @@ -1375,18 +1474,36 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Factorial is only defined for integers.\n", + "None\n" + ] + } + ], "source": [ "print(factorial('fred'))" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 14, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Factorial is not defined for negative integers.\n", + "None\n" + ] + } + ], "source": [ "print(factorial(-2))" ] @@ -1478,7 +1595,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": {}, "outputs": [], "source": [ @@ -1506,9 +1623,36 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "metadata": {}, - "outputs": [], + "outputs": [ + { + 
"name": "stdout", + "output_type": "stream", + "text": [ + " factorial 4\n", + " factorial 3\n", + " factorial 2\n", + " factorial 1\n", + " factorial 0\n", + " returning 1\n", + " returning 1\n", + " returning 2\n", + " returning 6\n", + " returning 24\n" + ] + }, + { + "data": { + "text/plain": [ + "24" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "factorial(4)" ] From 41a111f794951d7410464b8e6b40c320317331bb Mon Sep 17 00:00:00 2001 From: softhints Date: Sat, 7 Mar 2020 11:55:03 +0200 Subject: [PATCH 52/76] Think_Python_Chapter_7__Iteration --- .../Think_Python_Chapter_7__Iteration.ipynb | 1198 +++++++++++++++++ 1 file changed, 1198 insertions(+) create mode 100644 notebooks/Books/Think Python/Think_Python_Chapter_7__Iteration.ipynb diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_7__Iteration.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_7__Iteration.ipynb new file mode 100644 index 0000000..82a8cd7 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_7__Iteration.ipynb @@ -0,0 +1,1198 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 7  Iteration" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* 7.1  Reassignment\n", + "* 7.2  Updating variables\n", + "* 7.3  The while statement\n", + "* 7.4  break\n", + "* 7.5  Square roots\n", + "* 7.6  Algorithms\n", + "* 7.7  Debugging - Demo break the problem in half\n", + "\n", + "\n", + "This chapter is about iteration, which is the ability to run\n", + "a block of statements repeatedly. We saw a kind of iteration,\n", + "using recursion, in Section 5.8.\n", + "We saw another kind, using a for loop,\n", + "in Section 4.2. In this chapter we’ll see yet another\n", + "kind, using a while statement." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.1 Reassignment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But first I want to say a little more about variable assignment." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you may have discovered, it is legal to make more than one\n", + "assignment to the same variable. A new assignment makes an existing\n", + "variable refer to a new value (and stop referring to the old value)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = 5\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "7" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = 7\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first time we display \n", + "x, its value is 5; the second time, its\n", + "value is 7." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 7.1 shows what reassignment looks\n", + "like in a state diagram. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "At this point I want to address a common source of\n", + "confusion.\n", + "**Because Python uses the equal sign (=) for assignment, it is\n", + "tempting to interpret a statement like a = b as a\n", + "mathematical\n", + "proposition of equality**; that is, the claim that a and\n", + "b are equal. But this interpretation is wrong.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, equality is a symmetric relationship and assignment is not. For\n", + "example, in mathematics, if `a=7` then `7=a`. 
But in Python, the\n", + "statement `a = 7` is legal and `7 = a` is not." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Also, in mathematics, a proposition of equality is either true or\n", + "false for all time. If `a=b` now, then a will always equal b.\n", + "In Python, an assignment statement can make two variables equal, but\n", + "they don’t have to stay that way:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "a = 5" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "b = a # a and b are now equal" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "a = 3 # are a and b equal ?" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The third line changes the value of a but does not change the\n", + "value of b, so they are no longer equal. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Reassigning variables is often useful, but you should use it\n", + "with caution. If the values of variables change frequently, it can\n", + "make the code difficult to read and debug." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: Python Constant\n", + "\n", + "https://docs.python.org/3/library/typing.html#typing.Final " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from typing import Final\n\nMAX_SIZE: Final = 9000\n", + "MAX_SIZE += 1  # Error reported by type checker\n", + "\n", + "class Connection:\n", + "    TIMEOUT: Final[int] = 10\n", + "\n", + "class FastConnector(Connection):\n", + "    TIMEOUT = 1  # Error reported by type checker" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.2 Updating variables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A common kind of reassignment is an update,\n", + "where the new value of the variable depends on the old." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "x = x + 1" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This means “get the current value of x, add one, and then\n", + "update x with the new value.”" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you try to update a variable that doesn’t exist, you get an\n", + "error, because Python evaluates the right side before it assigns\n", + "a value to x:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "ename": "NameError", + "evalue": "name 'x' is 
not defined", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdel\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mNameError\u001b[0m: name 'x' is not defined" + ] + } + ], + "source": [ + "del x\n", + "x = x + 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Before you can update a variable, you have to initialize\n", + "it, usually with a simple assignment:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "x = 0\n", + "x = x + 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Updating a variable by adding 1 is called an increment;\n", + "subtracting 1 is called a decrement.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: Why are there no ++ and --​ operators in Python?\n", + "\n", + "https://stackoverflow.com/questions/3654830/why-are-there-no-and-operators-in-python\n", + "\n", + "1) Simple increment and decrement aren't needed as much as in other languages. You don't write things like \n", + "`for(int i = 0; i < 10; ++i)` \n", + "in Python very often; instead you do things like \n", + "`for i in range(0, 10)`.\n", + "\n", + "2) Python is a lot about **clarity** and no programmer is likely to correctly guess the meaning of --a unless s/he's learned a language having that construct." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x++\n", + "++x" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x+=1\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x-=1\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.3 The while statement" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Computers are often used to automate repetitive tasks. Repeating\n", + "identical or similar tasks without making errors is something that\n", + "computers do well and people do poorly. In a computer program,\n", + "repetition is also called iteration." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have already seen two functions, countdown and\n", + "print_n, that iterate using recursion. Because iteration is so\n", + "common, Python provides language features to make it easier.\n", + "One is the for statement we saw in Section 4.2.\n", + "We’ll get back to that later." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another is the while statement. 
Here is a version of countdown that uses a while statement:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def countdown(n):\n", + "    while n > 0:\n", + "        print(n)\n", + "        n = n - 1\n", + "    print('Blastoff!')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "You can almost read the while statement as if it were English.\n", + "It means, “While n is greater than 0,\n", + "display the value of n and then decrement\n", + "n. When you get to 0, display the word Blastoff!”\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "More formally, here is the flow of execution for a while statement:\n\n1. Determine whether the condition is true or false.\n2. If false, exit the while statement and continue execution at the next statement.\n3. If the condition is true, run the body and then go back to step 1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This type of flow is called a loop because the third step\n", + "loops back around to the top. \n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The body of the loop should change the value of one or more variables\n", + "so that the condition becomes false eventually and the loop\n", + "terminates. Otherwise the loop will repeat forever, which is called\n", + "an infinite loop. An endless source of amusement for computer\n", + "scientists is the observation that the directions on shampoo,\n", + "“Lather, rinse, repeat”, are an infinite loop.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the case of countdown, we can prove that the loop\n", + "terminates: if n is zero or negative, the loop never runs.\n", + "Otherwise, n gets smaller each time through the\n", + "loop, so eventually we have to get to 0." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For some other loops, it is not so easy to tell. 
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sequence(n):\n", + "    while n != 1:\n", + "        print(n)\n", + "        if n % 2 == 0:  # n is even\n", + "            n = n // 2\n", + "        else:  # n is odd\n", + "            n = n*3 + 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The condition for this loop is n != 1, so the loop will continue\n", + "until n is 1, which makes the condition false." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each time through the loop, the program outputs the value of n\n", + "and then checks whether it is even or odd. If it is even, n is\n", + "divided by 2. If it is odd, the value of n is replaced with\n", + "n*3 + 1. For example, if the argument passed to sequence\n", + "is 3, the resulting values of n are 3, 10, 5, 16, 8, 4, 2, 1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since n sometimes increases and sometimes decreases, there is no\n", + "obvious proof that n will ever reach 1, or that the program\n", + "terminates. For some particular values of n, we can prove\n", + "termination. For example, if the starting value is a power of two,\n", + "n will be even every time through the loop\n", + "until it reaches 1. The previous example ends with such a sequence,\n", + "starting with 16.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The hard question is whether we can prove that this program terminates\n", + "for all positive values of n. So far, no one has\n", + "been able to prove it or disprove it! (See\n", + "http://en.wikipedia.org/wiki/Collatz_conjecture.)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, rewrite the function print_n from\n", + "Section 5.8 using iteration instead of recursion." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: Collatz conjecture sequence" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "def collatz_sequence(x):\n", + "    seq = [x]\n", + "    if x < 1:\n", + "        return []\n", + "    while x > 1:\n", + "        if x % 2 == 0:\n", + "            x = x / 2\n", + "        else:\n", + "            x = 3 * x + 1\n", + "        seq.append(x)\n", + "    return seq" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[5, 16, 8.0, 4.0, 2.0, 1.0]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "collatz_sequence(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "![alt text](https://wikimedia.org/api/rest_v1/media/math/render/svg/ec22031bdc2a1ab2e4effe47ae75a836e7dea459)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Resources\n", + "\n", + "* [Project Euler is a series of challenging mathematical/computer programming problems ](https://projecteuler.net/)\n", + "* [Collatz Conjecture in Color - Numberphile](https://www.youtube.com/watch?v=LqKpkdRRLZw)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.4 break" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sometimes you don’t know it’s time to end a loop until you get half\n", + "way through the body. In that case you can use the break\n", + "statement to jump out of the loop." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example, suppose you want to take input from the user until they\n", + "type done. 
You could write:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "> d\n", + "d\n", + "> wed\n", + "wed\n", + "> done\n", + "Done!\n" + ] + } + ], + "source": [ + "while True:\n", + "    line = input('> ')\n", + "    if line == 'done':\n", + "        break\n", + "    print(line)\n", + "\n", + "print('Done!')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The loop condition is True, which is always true, so the\n", + "loop runs until it hits the break statement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each time through, it prompts the user with an angle bracket.\n", + "If the user types done, the break statement exits\n", + "the loop. Otherwise the program echoes whatever the user types\n", + "and goes back to the top of the loop. Here’s a sample run:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "> not done\n", + "not done\n", + "> done\n", + "Done!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This way of writing while loops is common because you\n", + "can check the condition anywhere in the loop (not just at the\n", + "top) and you can express the stop condition affirmatively\n", + "(“stop when this happens”) rather than negatively (“keep going\n", + "until that happens”)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.5 Square roots" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Loops are often used in programs that compute\n", + "numerical results by starting with an approximate answer and\n", + "iteratively improving it.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example, one way of computing square roots is Newton’s method.\n", + "Suppose that you want to know the square root of a. 
If you start\n", + "with almost any estimate, x, you can compute a better\n", + "estimate with the following formula: $y = (x + a/x) / 2$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For example, if a is 4 and x is 3:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.1666666666666665" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a = 4\n", + "x = 3\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result is closer to the correct answer (√4 = 2). If we\n", + "repeat the process with the new estimate, it gets even closer:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0064102564102564" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "After a few more updates, the estimate is almost exact:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0000102400262145" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0000000000262146" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In general we don’t know ahead of time how many steps it takes\n", + "to get to the right 
answer, but we know when we get there\n", + "because the estimate\n", + "stops changing:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = y\n", + "y = (x + a/x) / 2\n", + "y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "When y == x, we can stop. Here is a loop that starts\n", + "with an initial estimate, x, and improves it until it\n", + "stops changing:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "while True:\n", + "    print(x)\n", + "    y = (x + a/x) / 2\n", + "    if y == x:\n", + "        break\n", + "    x = y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For most values of a this works fine, but in general it is\n", + "dangerous to test float equality.\n", + "Floating-point values are only approximately right:\n", + "most rational numbers, like 1/3, and irrational numbers, like\n", + "√2, can’t be represented exactly with a float.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Rather than checking whether x and y are exactly equal, it\n", + "is safer to use the built-in function abs to compute the\n", + "absolute value, or magnitude, of the difference between them:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "    if abs(y-x) < epsilon:\n", + "        break" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Where 
epsilon has a value like 0.0000001 that\n", + "determines how close is close enough." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.6 Algorithms" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Newton’s method is an example of an algorithm: it is a\n", + "mechanical process for solving a category of problems (in this\n", + "case, computing square roots)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To understand what an algorithm is, it might help to start with\n", + "something that is not an algorithm. When you learned to multiply\n", + "single-digit numbers, you probably memorized the multiplication table.\n", + "In effect, you memorized 100 specific solutions. That kind of\n", + "knowledge is not algorithmic." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But if you were “lazy”, you might have learned a few\n", + "tricks. For example, to find the product of n and 9, you can\n", + "write n−1 as the first digit and 10−n as the second\n", + "digit. This trick is a general solution for multiplying any\n", + "single-digit number by 9. That’s an algorithm!\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Similarly, the techniques you learned for addition with carrying,\n", + "subtraction with borrowing, and long division are all algorithms. One\n", + "of the characteristics of algorithms is that they do not require any\n", + "intelligence to carry out. They are mechanical processes where\n", + "each step follows from the last according to a simple set of rules." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Executing algorithms is boring, but designing them is interesting,\n", + "intellectually challenging, and a central part of computer science." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some of the things that people do naturally, without difficulty or\n", + "conscious thought, are the hardest to express algorithmically.\n", + "Understanding natural language is a good example. We all do it, but\n", + "so far no one has been able to explain how we do it, at least\n", + "not in the form of an algorithm." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7.7 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you start writing bigger programs, you might find yourself\n", + "spending more time debugging. More code means more chances to\n", + "make an error and more places for bugs to hide.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One way to cut your debugging time is “debugging by bisection”.\n", + "For example, if there are 100 lines in your program and you\n", + "check them one at a time, it would take 100 steps." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Instead, try to break the problem in half. Look at the middle\n", + "of the program, or near it, for an intermediate value you\n", + "can check. Add a print statement (or something else\n", + "that has a verifiable effect) and run the program." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the mid-point check is incorrect, there must be a problem in the\n", + "first half of the program. If it is correct, the problem is\n", + "in the second half." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Every time you perform a check like this, you halve the number of\n", + "lines you have to search. After six steps (which is fewer than 100),\n", + "you would be down to one or two lines of code, at least in theory." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In practice it is not always clear what\n", + "the “middle of the program” is and not always possible to\n", + "check it. It doesn’t make sense to count lines and find the\n", + "exact midpoint. Instead, think about places\n", + "in the program where there might be errors and places where it\n", + "is easy to put a check. Then choose a spot where you\n", + "think the chances are about the same that the bug is before\n", + "or after the check." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 4e7169b03a46b7cdf5bb2b91c275563761058fa5 Mon Sep 17 00:00:00 2001 From: softhints Date: Mon, 16 Mar 2020 08:01:37 +0200 Subject: [PATCH 53/76] Think_Python_Chapter_8__Strings --- ...apter_4__Case_study_interface_design.ipynb | 2 +- .../Think_Python_Chapter_8__Strings.ipynb | 1627 +++++ notebooks/Books/Think Python/ch7_debug.py | 52 + ...ch in column, every column and regex.ipynb | 2 +- notebooks/Python Extract Table from PDF.ipynb | 2994 +--------- ...of lists by a specific index,pattern.ipynb | 424 ++ ...e wiki tables with pandas and python.ipynb | 2 +- ...is_the_usage_of_*_asterisk_in_Python.ipynb | 4 +- ...en_two_dates_-_DataFrame_or_CSV_file.ipynb | 124 +- ...ount_values_in_a_column_of_type_list.ipynb | 2 +- notebooks/youtube/Youtube-PewDiePie.ipynb | 5215 ++--------------- scripts/__init__.py | 0 test.py | 2 +- 13 files changed, 2892 insertions(+), 7558 deletions(-) create mode 100644 notebooks/Books/Think Python/Think_Python_Chapter_8__Strings.ipynb create mode 100644 notebooks/Books/Think Python/ch7_debug.py create mode 
100644 notebooks/Python group and sort a list of lists by a specific index,pattern.ipynb create mode 100644 scripts/__init__.py diff --git a/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb b/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb index 853b283..217df97 100644 --- a/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb +++ b/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb @@ -610,7 +610,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.8" } }, "nbformat": 4, diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_8__Strings.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_8__Strings.ipynb new file mode 100644 index 0000000..56f6504 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_8__Strings.ipynb @@ -0,0 +1,1627 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 8  Strings\n", + "\n", + "* 8.1  A string is a sequence\n", + "* 8.2  len\n", + "* 8.3  Traversal with a for loop\n", + "* 8.4  String slices\n", + "* 8.5  Strings are immutable\n", + "* 8.6  Searching\n", + "* 8.7  Looping and counting\n", + "* 8.8  String methods\n", + "* 8.9  The in operator\n", + "* 8.10  String comparison\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "https://en.wikipedia.org/wiki/ASCII\n", + "![strings_in_python](strings_in_python.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.1 A string is a sequence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Strings are not like integers, floats, and booleans. A string\n", + "is a sequence, which means it is\n", + "an ordered collection of other values. 
In this chapter you’ll see\n", + "how to access the characters that make up a string, and you’ll\n", + "learn about some of the methods strings provide.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "\n", + "A string is a sequence of characters. \n", + "You can access the characters one at a time with the\n", + "bracket operator:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "fruit = 'banana'\n", + "letter = fruit[1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The second statement selects character number 1 from fruit and assigns it to letter. \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The expression in brackets is called an index. \n", + "The index indicates which character in the sequence you\n", + "want (hence the name)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But you might not get what you expect:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a'" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letter" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For most people, the first letter of 'banana' is b, not\n", + "a. But for computer scientists, the index is an offset from the\n", + "beginning of the string, and the offset of the first letter is zero." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'b'" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letter = fruit[0]\n", + "letter" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "So b is the 0th letter (“zero-eth”) of 'banana', a is the 1th letter (“one-eth”), and n is the 2th letter\n", + "(“two-eth”). " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an index you can use an expression that contains variables and\n", + "operators:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a'" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "i = 1\n", + "fruit[i]" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'n'" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fruit[i+1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But the value of the index has to be an integer. 
Otherwise you\n", + "get:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "string indices must be integers", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mletter\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfruit\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1.5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: string indices must be integers" + ] + } + ], + "source": [ + "letter = fruit[1.5]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.2 len" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "len is a built-in function that returns the number of characters\n", + "in a string:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit = 'banana'\n", + "len(fruit)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "To get the last letter of a string, you might be tempted to try something\n", + "like this:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "length = len(fruit)\n", + "last = fruit[length]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The reason for the IndexError is that there is no letter in ’banana’ with the index 6. Since we started counting at zero, the\n", + "six letters are numbered 0 to 5. 
To get the last character, you have\n", + "to subtract 1 from length:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "last = fruit[length-1]\n", + "last" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Or you can use negative indices, which count backward from\n", + "the end of the string. The expression fruit[-1] yields the last\n", + "letter, fruit[-2] yields the second to last, and so on.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit[-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.3 Traversal with a for loop" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A lot of computations involve processing a string one character at a\n", + "time. Often they start at the beginning, select each character in\n", + "turn, do something to it, and continue until the end. This pattern of\n", + "processing is called a traversal. One way to write a traversal\n", + "is with a while loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "index = 0\n", + "while index < len(fruit):\n", + " letter = fruit[index]\n", + " print(letter)\n", + " index = index + 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This loop traverses the string and displays each letter on a line by\n", + "itself. The loop condition is index < len(fruit), so\n", + "when index is equal to the length of the string, the\n", + "condition is false, and the body of the loop doesn’t run. The\n", + "last character accessed is the one with the index len(fruit)-1,\n", + "which is the last character in the string."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an **exercise**, write a function that takes a string as an argument\n", + "and displays the letters backward, one per line." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "index = len(fruit) -1\n", + "while index >= 0:\n", + " letter = fruit[index]\n", + " print(letter)\n", + " index -= 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another way to write a traversal is with a for loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for letter in fruit:\n", + " print(letter)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Each time through the loop, the next character in the string is assigned\n", + "to the variable letter. The loop continues until no characters are\n", + "left.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following example shows how to use concatenation (string addition)\n", + "and a for loop to generate an abecedarian series (that is, in\n", + "alphabetical order). In Robert McCloskey’s book Make\n", + "Way for Ducklings, the names of the ducklings are Jack, Kack, Lack,\n", + "Mack, Nack, Ouack, Pack, and Quack. 
This loop outputs these names in\n", + "order:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prefixes = 'JKLMNOPQ'\n", + "suffix = 'ack'\n", + "\n", + "for letter in prefixes:\n", + " print(letter + suffix)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The output is:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Jack\n", + "Kack\n", + "Lack\n", + "Mack\n", + "Nack\n", + "Oack\n", + "Pack\n", + "Qack" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Of course, that’s not quite right because “Ouack” and “Quack” are\n", + "misspelled. As an exercise, modify the program to fix this error." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.4 String slices" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A segment of a string is called a slice. Selecting a slice is\n", + "similar to selecting a character:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'Monty Python'\n", + "s[0:5]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s[6:12]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The operator [n:m] returns the part of the string from the \n", + "“n-eth” character to the “m-eth” character, including the first but\n", + "excluding the last. This behavior is counterintuitive, but it might\n", + "help to imagine the indices pointing between the\n", + "characters, as in Figure 8.1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you omit the first index (before the colon), the slice starts at\n", + "the beginning of the string. 
If you omit the second index, the slice\n", + "goes to the end of the string:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit = 'banana'\n", + "fruit[:3]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit[3:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the first index is greater than or equal to the second, the result\n", + "is an empty string, represented by two quotation marks:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fruit = 'banana'\n", + "fruit[3:3]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "An empty string contains no characters and has length 0, but other\n", + "than that, it is the same as any other string." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Continuing this example, what do you think \n", + "fruit[:] means? 
Try it and see.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: Extended Slices\n", + "\n", + "https://docs.python.org/2/whatsnew/2.3.html#extended-slices\n", + "\n", + "[begin:end:step]\n", + "\n", + "* leaving begin and end off\n", + "* specify a step of -1" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'ananab'" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fruit[::-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'bnn'" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fruit[::2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.5 Strings are immutable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is tempting to use the [] operator on the left side of an\n", + "assignment, with the intention of changing a character in a string.\n", + "For example:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "'str' object does not support item assignment", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mgreeting\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'Hello, world!'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mgreeting\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0;34m'J'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment" + ] + } + ], + "source": [ + "greeting = 'Hello, world!'\n", + "greeting[0] = 'J'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The “object” in this case is the string and the “item” is\n", + "the character you tried to assign. For now, an object is\n", + "the same thing as a value, but we will refine that definition\n", + "later (Section 10.10). \n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The reason for the error is that\n", + "strings are immutable, which means you can’t change an\n", + "existing string. The best you can do is create a new string\n", + "that is a variation on the original:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Jello, world!'" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "greeting = 'Hello, world!'\n", + "new_greeting = 'J' + greeting[1:]\n", + "new_greeting" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a5'" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'a' + str(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This example concatenates a new first letter onto\n", + "a slice of greeting. 
It has no effect on\n", + "the original string.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.6 Searching" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What does the following function do?\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def find(word, letter):\n", + " index = 0\n", + " while index < len(word):\n", + " if word[index] == letter:\n", + " return index\n", + " index = index + 1\n", + " return -1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In a sense, find is the inverse of the [] operator.\n", + "Instead of taking an index and extracting the corresponding character,\n", + "it takes a character and finds the index where that character\n", + "appears. If the character is not found, the function returns -1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is the first example we have seen of a return statement\n", + "inside a loop. If word[index] == letter, the function breaks\n", + "out of the loop and returns immediately." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the character doesn’t appear in the string, the program\n", + "exits the loop normally and returns -1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This pattern of computation—traversing a sequence and returning\n", + "when we find what we are looking for—is called a search.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an exercise, modify find so that it has a\n", + "third parameter, the index in word where it should start\n", + "looking." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.7 Looping and counting" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following program counts the number of times the letter a\n", + "appears in a string:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n" + ] + } + ], + "source": [ + "word = 'banana'\n", + "count = 0\n", + "for letter in word:\n", + " if letter == 'a':\n", + " count = count + 1\n", + "print(count)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This program demonstrates another pattern of computation called a counter. The variable count is initialized to 0 and then\n", + "incremented each time an a is found.\n", + "When the loop exits, count\n", + "contains the result—the total number of a’s." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As an exercise, encapsulate this code in a function named count, and generalize it so that it accepts the string and the\n", + "letter as arguments." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then rewrite the function so that instead of\n", + "traversing the string, it uses the three-parameter version of find from the previous section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.8 String methods" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Strings provide methods that perform a variety of useful operations.\n", + "A method is similar to a function—it takes arguments and\n", + "returns a value—but the syntax is different. 
For example, the\n", + "method upper takes a string and returns a new string with\n", + "all uppercase letters.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Instead of the function syntax upper(word), it uses\n", + "the method syntax word.upper()." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'BANANA'" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word = 'banana'\n", + "new_word = word.upper()\n", + "new_word" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'banana'" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_word.lower()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This form of dot notation specifies the name of the method, upper, and the name of the string to apply the method to, word. 
The empty parentheses indicate that this method takes no\n", + "arguments.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A method call is called an invocation; in this case, we would\n", + "say that we are invoking upper on word.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As it turns out, there is a string method named find that\n", + "is remarkably similar to the function we wrote:" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word = 'banana'\n", + "index = word.find('a')\n", + "index" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In this example, we invoke find on word and pass\n", + "the letter we are looking for as an argument." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Actually, the find method is more general than our function;\n", + "it can find substrings, not just characters:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word.find('na')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "By default, find starts at the beginning of the string, but\n", + "it can take a second argument, the index where it should start:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word.find('na', 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This is an example 
of an optional argument;\n", + "find can\n", + "also take a third argument, the index where it should stop:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "-1" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "name = 'bob'\n", + "name.find('b', 1, 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This search fails because b does not\n", + "appear in the index range from 1 to 2, not including 2. Searching up to, but not including, the second index makes\n", + "find consistent with the slice operator." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus\n", + "\n", + "Split\n", + "https://docs.python.org/2/library/string.html#string.split\n", + "\n", + "Built-in Functions\n", + "https://docs.python.org/3/library/functions.html" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Monty Python, Monty Python']" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s = 'Monty Python, Monty Python'\n", + "s.split('$')" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'nbananaobananahbananatbananaybananaPbanana bananaybananatbanananbananaobananaMbanana banana,banananbananaobananahbananatbananaybananaPbanana bananaybananatbanananbananaobananaM'" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fruit.join(reversed(s))" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a,n,a,n,a,b'" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"','.join(reversed(fruit))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.9 The in operator" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The word in is a boolean operator that takes two strings and\n", + "returns True if the first appears as a substring in the second:" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'a' in 'banana'" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'seed' in 'banana'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For example, the following function prints all the\n", + "letters from word1 that also appear in word2:" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [], + "source": [ + "def in_both(word1, word2):\n", + " for letter in word1:\n", + " if letter in word2:\n", + " print(letter)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "With well-chosen variable names,\n", + "Python sometimes reads like English. 
You could read\n", + "this loop, “for (each) letter in (the first) word, if (the) letter \n", + "(appears) in (the second) word, print (the) letter.”" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s what you get if you compare apples and oranges:" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a\n", + "e\n", + "s\n" + ] + } + ], + "source": [ + "in_both('apples', 'oranges')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.10 String comparison" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The relational operators work on strings. To see if two strings are equal:" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All right, bananas.\n" + ] + } + ], + "source": [ + "if word == 'banana':\n", + " print('All right, bananas.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Other relational operations are useful for putting words in alphabetical\n", + "order:" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All right, bananas.\n" + ] + } + ], + "source": [ + "if word < 'banana':\n", + " print('Your word, ' + word + ', comes before banana.')\n", + "elif word > 'banana':\n", + " print('Your word, ' + word + ', comes after banana.')\n", + "else:\n", + " print('All right, bananas.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Python does not handle uppercase and lowercase letters the same way\n", + "people do. 
All the uppercase letters come before all the\n", + "lowercase letters, so:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Your word, Pineapple, comes before banana." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A common way to address this problem is to convert strings to a\n", + "standard format, such as all lowercase, before performing the\n", + "comparison. Keep that in mind in case you have to defend yourself\n", + "against a man armed with a Pineapple." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.11 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you use indices to traverse the values in a sequence,\n", + "it is tricky to get the beginning and end of the traversal\n", + "right. Here is a function that is supposed to compare two\n", + "words and return True if one of the words is the reverse\n", + "of the other, but it contains two errors:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_reverse(word1, word2):\n", + " if len(word1) != len(word2):\n", + " return False\n", + " \n", + " i = 0\n", + " j = len(word2)\n", + "\n", + " while j > 0:\n", + " if word1[i] != word2[j]:\n", + " return False\n", + " i = i+1\n", + " j = j-1\n", + "\n", + " return True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first if statement checks whether the words are the\n", + "same length. If not, we can return False immediately.\n", + "Otherwise, for the rest of the function, we can assume that the words\n", + "are the same length. This is an example of the guardian pattern\n", + "in Section 6.8.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "i and j are indices: i traverses word1\n", + "forward while j traverses word2 backward. 
If we find\n", + "two letters that don’t match, we can return False immediately.\n", + "If we get through the whole loop and all the letters match, we\n", + "return True." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we test this function with the words “pots” and “stop”, we\n", + "expect the return value True, but we get an IndexError:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "is_reverse('pots', 'stop')\n", + "...\n", + " File \"reverse.py\", line 15, in is_reverse\n", + " if word1[i] != word2[j]:\n", + "IndexError: string index out of range" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "For debugging this kind of error, my first move is to\n", + "print the values of the indices immediately before the line\n", + "where the error appears." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + " while j > 0:\n", + " print(i, j) # print here\n", + " \n", + " if word1[i] != word2[j]:\n", + " return False\n", + " i = i+1\n", + " j = j-1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Now when I run the program again, I get more information:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "is_reverse('pots', 'stop')\n", + "0 4\n", + "...\n", + "IndexError: string index out of range" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first time through the loop, the value of j is 4,\n", + "which is out of range for the string 'pots'.\n", + "The index of the last character is 3, so the\n", + "initial value for j should be len(word2)-1." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If I fix that error and run the program again, I get:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "is_reverse('pots', 'stop')\n", + "0 3\n", + "1 2\n", + "2 1\n", + "True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This time we get the right answer, but it looks like the loop only ran\n", + "three times, which is suspicious. To get a better idea of what is\n", + "happening, it is useful to draw a state diagram. During the first\n", + "iteration, the frame for is_reverse is shown in\n", + "Figure 8.2. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I took some license by arranging the variables in the frame\n", + "and adding dotted lines to show that the values of i and\n", + "j indicate characters in word1 and word2." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Starting with this diagram, run the program on paper, changing the\n", + "values of i and j during each iteration. 
Find and fix the\n", + "second error in this function.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8.12 Glossary" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Books/Think Python/ch7_debug.py b/notebooks/Books/Think Python/ch7_debug.py new file mode 100644 index 0000000..e7275d4 --- /dev/null +++ b/notebooks/Books/Think Python/ch7_debug.py @@ -0,0 +1,52 @@ +def pascal(n): + row_map = {0: {n: 1}} + + for i in range(1, n): + row_list = {} + + prev_row = row_map[i - 1] + for k, v in row_map[i - 1].items(): + + if k + 1 in row_list: + row_list[k + 1] = row_list[k + 1] + else: + row_list[k + 1] = prev_row.get(k, 0) + prev_row.get(k + 2, 0) + + if k - 1 in row_list: + row_list[k - 1] = row_list[k - 1] + else: + row_list[k - 1] = prev_row.get(k, 0) + prev_row.get(k - 2, 0) + + row_map[i] = row_list + + for k, v in row_map.items(): + print(f'k: {k}, v: {v}') + + for k, v in row_map.items(): + # print(f'k: {k}, v: {v}') + count = 0 + for kk, vv in sorted(v.items()): + count = count + 1 + if count == 1: + print(' ' * kk + f'{vv:3}', end='') + else: + print(' ' + f'{vv:3}', end='') + print() + + +def fibMemo(i): + memo = {} + if i in memo: + return memo[i] + if i <= 2: + return 1 + else: + f = fibMemo(i - 1) + fibMemo(i - 2) + memo[i] = f + # print("calc", i, memo) + return f + + +x = fibMemo(4) +print(x) +pascal(x) diff --git a/notebooks/Pandas search in column, every column and regex.ipynb b/notebooks/Pandas search in column, every column and regex.ipynb index 699ac60..c64e94f 100644 --- a/notebooks/Pandas search in column, every column and 
regex.ipynb +++ b/notebooks/Pandas search in column, every column and regex.ipynb @@ -1226,7 +1226,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.8" } }, "nbformat": 4, diff --git a/notebooks/Python Extract Table from PDF.ipynb b/notebooks/Python Extract Table from PDF.ipynb index 47add32..fbdb305 100644 --- a/notebooks/Python Extract Table from PDF.ipynb +++ b/notebooks/Python Extract Table from PDF.ipynb @@ -55,247 +55,16 @@ "metadata": {}, "outputs": [ { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
BREADS & CEREALSPortion size *per 100 grams (3.5 oz)Unnamed: 3energy content
0Bagel ( 1 average )140 cals (45g)310 calsNaNMedium
1Biscuit digestives86 cals (per biscuit)480 calsNaNHigh
2Jaffa cake48 cals (per biscuit)370 calsNaNMed-High
3Bread white (thick slice)96 cals (1 slice 40g)240 calsNaNMedium
4Bread wholemeal (thick)88 cals (1 slice 40g)220 calsNaNLow-med
5Chapatis250 cals300 calsNaNMedium
6Cornflakes130 cals (35g)370 calsNaNMed-High
7Crackerbread17 cals per slice325 calsNaNLow Calorie
8Cream crackers35 cals (per cracker)440 calsNaNLow / portion
9Crumpets93 cals (per crumpet)198 calsNaNLow-Med
10Flapjacks basic fruit mix320 cals500 calsNaNHigh
11Macaroni (boiled)238 cals (250g)95 calsNaNLow calorie
12Muesli195 cals (50g)390 calsNaNMed-high
13Naan bread (normal)300 cals (small plate size)320 calsNaNMedium
14Noodles (boiled)175 cals (250g)70 calsNaNLow calorie
15Pasta ( normal boiled )330 cals (300g)110 calsNaNLow calorie
16Pasta (wholemeal boiled )315 cals (300g)105 calsNaNLow calorie
17Porridge oats (with water)193 cals (350g)55 calsNaNLow calorie
18Potatoes** (boiled)210 cals (300g)70 calsNaNLow calorie
19Potatoes** (roast)420 cals (300g)140 calsNaNMedium
\n", - "
" - ], - "text/plain": [ - " BREADS & CEREALS Portion size * \\\n", - "0 Bagel ( 1 average ) 140 cals (45g) \n", - "1 Biscuit digestives 86 cals (per biscuit) \n", - "2 Jaffa cake 48 cals (per biscuit) \n", - "3 Bread white (thick slice) 96 cals (1 slice 40g) \n", - "4 Bread wholemeal (thick) 88 cals (1 slice 40g) \n", - "5 Chapatis 250 cals \n", - "6 Cornflakes 130 cals (35g) \n", - "7 Crackerbread 17 cals per slice \n", - "8 Cream crackers 35 cals (per cracker) \n", - "9 Crumpets 93 cals (per crumpet) \n", - "10 Flapjacks basic fruit mix 320 cals \n", - "11 Macaroni (boiled) 238 cals (250g) \n", - "12 Muesli 195 cals (50g) \n", - "13 Naan bread (normal) 300 cals (small plate size) \n", - "14 Noodles (boiled) 175 cals (250g) \n", - "15 Pasta ( normal boiled ) 330 cals (300g) \n", - "16 Pasta (wholemeal boiled ) 315 cals (300g) \n", - "17 Porridge oats (with water) 193 cals (350g) \n", - "18 Potatoes** (boiled) 210 cals (300g) \n", - "19 Potatoes** (roast) 420 cals (300g) \n", - "\n", - " per 100 grams (3.5 oz) Unnamed: 3 energy content \n", - "0 310 cals NaN Medium \n", - "1 480 cals NaN High \n", - "2 370 cals NaN Med-High \n", - "3 240 cals NaN Medium \n", - "4 220 cals NaN Low-med \n", - "5 300 cals NaN Medium \n", - "6 370 cals NaN Med-High \n", - "7 325 cals NaN Low Calorie \n", - "8 440 cals NaN Low / portion \n", - "9 198 cals NaN Low-Med \n", - "10 500 cals NaN High \n", - "11 95 cals NaN Low calorie \n", - "12 390 cals NaN Med-high \n", - "13 320 cals NaN Medium \n", - "14 70 cals NaN Low calorie \n", - "15 110 cals NaN Low calorie \n", - "16 105 cals NaN Low calorie \n", - "17 55 cals NaN Low calorie \n", - "18 70 cals NaN Low calorie \n", - "19 140 cals NaN Medium " - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + 
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: 
[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], "source": [ @@ -309,226 +78,16 @@ "metadata": {}, "outputs": [ { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
BREADS & CEREALSPortion size *per 100 grams (3.5 oz)energy content
0Bagel ( 1 average )140 cals (45g)310 calsMedium
1Biscuit digestives86 cals (per biscuit)480 calsHigh
2Jaffa cake48 cals (per biscuit)370 calsMed-High
3Bread white (thick slice)96 cals (1 slice 40g)240 calsMedium
4Bread wholemeal (thick)88 cals (1 slice 40g)220 calsLow-med
5Chapatis250 cals300 calsMedium
6Cornflakes130 cals (35g)370 calsMed-High
7Crackerbread17 cals per slice325 calsLow Calorie
8Cream crackers35 cals (per cracker)440 calsLow / portion
9Crumpets93 cals (per crumpet)198 calsLow-Med
10Flapjacks basic fruit mix320 cals500 calsHigh
11Macaroni (boiled)238 cals (250g)95 calsLow calorie
12Muesli195 cals (50g)390 calsMed-high
13Naan bread (normal)300 cals (small plate size)320 calsMedium
14Noodles (boiled)175 cals (250g)70 calsLow calorie
15Pasta ( normal boiled )330 cals (300g)110 calsLow calorie
16Pasta (wholemeal boiled )315 cals (300g)105 calsLow calorie
17Porridge oats (with water)193 cals (350g)55 calsLow calorie
18Potatoes** (boiled)210 cals (300g)70 calsLow calorie
19Potatoes** (roast)420 cals (300g)140 calsMedium
\n", - "
" - ], - "text/plain": [ - " BREADS & CEREALS Portion size * \\\n", - "0 Bagel ( 1 average ) 140 cals (45g) \n", - "1 Biscuit digestives 86 cals (per biscuit) \n", - "2 Jaffa cake 48 cals (per biscuit) \n", - "3 Bread white (thick slice) 96 cals (1 slice 40g) \n", - "4 Bread wholemeal (thick) 88 cals (1 slice 40g) \n", - "5 Chapatis 250 cals \n", - "6 Cornflakes 130 cals (35g) \n", - "7 Crackerbread 17 cals per slice \n", - "8 Cream crackers 35 cals (per cracker) \n", - "9 Crumpets 93 cals (per crumpet) \n", - "10 Flapjacks basic fruit mix 320 cals \n", - "11 Macaroni (boiled) 238 cals (250g) \n", - "12 Muesli 195 cals (50g) \n", - "13 Naan bread (normal) 300 cals (small plate size) \n", - "14 Noodles (boiled) 175 cals (250g) \n", - "15 Pasta ( normal boiled ) 330 cals (300g) \n", - "16 Pasta (wholemeal boiled ) 315 cals (300g) \n", - "17 Porridge oats (with water) 193 cals (350g) \n", - "18 Potatoes** (boiled) 210 cals (300g) \n", - "19 Potatoes** (roast) 420 cals (300g) \n", - "\n", - " per 100 grams (3.5 oz) energy content \n", - "0 310 cals Medium \n", - "1 480 cals High \n", - "2 370 cals Med-High \n", - "3 240 cals Medium \n", - "4 220 cals Low-med \n", - "5 300 cals Medium \n", - "6 370 cals Med-High \n", - "7 325 cals Low Calorie \n", - "8 440 cals Low / portion \n", - "9 198 cals Low-Med \n", - "10 500 cals High \n", - "11 95 cals Low calorie \n", - "12 390 cals Med-high \n", - "13 320 cals Medium \n", - "14 70 cals Low calorie \n", - "15 110 cals Low calorie \n", - "16 105 cals Low calorie \n", - "17 55 cals Low calorie \n", - "18 70 cals Low calorie \n", - "19 140 cals Medium " - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + 
"\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdropna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'columns'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], "source": [ @@ -543,40 +102,15 @@ "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "-- ------------------------ ----------------- -------- -----------\n", - " 0 Fish fingers 50 cals per piece 220 cals Medium\n", - " 1 Gammon 320 cals 280 cals Med-High\n", - " 2 Haddock fresh 200 cals 110 cals Low calorie\n", - " 3 Halibut fresh 220 cals 125 cals Low calorie\n", - " 4 Ham 6 cals 240 cals Medium\n", - " 5 Herring fresh grilled 300 cals 200 cals Medium\n", - " 6 Kidney 200 cals 160 cals Medium\n", - " 7 Kipper 200 cals 120 cals Low calorie\n", - " 8 Liver 200 cals 150 cals Medium\n", - " 9 Liver pate 150 cals 300 cals Medium\n", - "10 Lamb (roast) 300 cals 300 cals Med-High\n", - "11 Lobster boiled 200 cals 100 cals Low calorie\n", - "12 Luncheon meat 300 cals 400 cals High\n", - "13 Mackeral 320 cals 300 cals Medium\n", - "14 Mussels 90 cals 90 cals Low-Med\n", - "15 Pheasant roast 200 cals 200 cals Medium\n", - "16 Pilchards (tinned) 140 cals 140 cals Medium\n", - "17 Prawns 180 cals 100 cals Low- Med\n", - "18 Pork 320 cals 290 cals Med-High\n", - "19 Pork pie 320 cals 450 cals High\n", - "20 Rabbit 200 cals 180 cals Medium\n", - "21 Salmon fresh 220 cals 180 cals Medium\n", - "22 Sardines tinned in oil 220 cals 220 cals Medium\n", - "23 Sardines in tomato sauce 180 cals 180 cals Medium\n", - "24 Sausage pork fried 250 cals 320 cals High\n", - "25 Sausage pork grilled 220 cals 280 cals Med-High\n", - "26 Sausage roll 290 cals 480 cals High\n", - "27 Scampi fried in oil 400 cals 340 cals High\n", - "28 Steak & kidney pie 400 cals 350 cals High\n", - "-- ------------------------ ----------------- -------- -----------\n" + "ename": 
"FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpages\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mprint\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mtabulate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" ] } ], @@ -591,618 +125,16 @@ "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "[{'extraction_method': 'stream',\n", - " 'top': 0.0,\n", - " 'left': 0.0,\n", - " 'width': 524.6400146484375,\n", - " 'height': 725.6300048828125,\n", - " 'data': [[{'top': 65.19,\n", - " 'left': 120.24,\n", - " 'width': 48.599998474121094,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Fish cake'},\n", - " {'top': 65.19,\n", - " 'left': 241.2,\n", - " 'width': 79.91999816894531,\n", - " 'height': 7.880000114440918,\n", - " 'text': '90 cals per cake'},\n", - " {'top': 65.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 65.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 87.75,\n", - " 'left': 114.6,\n", - " 'width': 60.00000762939453,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Fish fingers'},\n", - " {'top': 87.75,\n", - " 'left': 239.52,\n", - " 'width': 83.27998352050781,\n", - " 'height': 7.880000114440918,\n", - " 'text': '50 cals per piece'},\n", - " {'top': 87.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 87.75,\n", - " 'left': 472.44,\n", - " 'width': 
43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 110.19,\n", - " 'left': 120.72,\n", - " 'width': 47.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Gammon'},\n", - " {'top': 110.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 110.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '280 cals'},\n", - " {'top': 110.19,\n", - " 'left': 467.76,\n", - " 'width': 53.03997802734375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Med-High'}],\n", - " [{'top': 132.75,\n", - " 'left': 107.88,\n", - " 'width': 73.31999969482422,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Haddock fresh'},\n", - " {'top': 132.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 132.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '110 cals'},\n", - " {'top': 132.75,\n", - " 'left': 464.04,\n", - " 'width': 60.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low calorie'}],\n", - " [{'top': 155.19,\n", - " 'left': 111.6,\n", - " 'width': 66.00000762939453,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Halibut fresh'},\n", - " {'top': 155.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 155.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '125 cals'},\n", - " {'top': 155.19,\n", - " 'left': 464.04,\n", - " 'width': 60.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low calorie'}],\n", - " [{'top': 177.75,\n", - " 'left': 131.4,\n", - " 'width': 26.279998779296875,\n", - " 
'height': 7.880000114440918,\n", - " 'text': 'Ham'},\n", - " {'top': 177.75,\n", - " 'left': 265.92,\n", - " 'width': 30.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '6 cals'},\n", - " {'top': 177.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '240 cals'},\n", - " {'top': 177.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 200.19,\n", - " 'left': 93.72,\n", - " 'width': 101.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Herring fresh grilled'},\n", - " {'top': 200.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 200.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 200.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 222.75,\n", - " 'left': 125.4,\n", - " 'width': 38.279991149902344,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Kidney'},\n", - " {'top': 222.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 222.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '160 cals'},\n", - " {'top': 222.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 245.19,\n", - " 'left': 126.36,\n", - " 'width': 36.36000061035156,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Kipper'},\n", - " {'top': 245.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 
'200 cals'},\n", - " {'top': 245.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '120 cals'},\n", - " {'top': 245.19,\n", - " 'left': 464.04,\n", - " 'width': 60.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low calorie'}],\n", - " [{'top': 267.75,\n", - " 'left': 130.08,\n", - " 'width': 29.039993286132812,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Liver'},\n", - " {'top': 267.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 267.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '150 cals'},\n", - " {'top': 267.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 290.19,\n", - " 'left': 118.56,\n", - " 'width': 51.96000671386719,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Liver pate'},\n", - " {'top': 290.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '150 cals'},\n", - " {'top': 290.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 290.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 312.75,\n", - " 'left': 111.96,\n", - " 'width': 65.2800064086914,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Lamb (roast)'},\n", - " {'top': 312.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 312.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 
312.75,\n", - " 'left': 467.76,\n", - " 'width': 53.03997802734375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Med-High'}],\n", - " [{'top': 335.19,\n", - " 'left': 108.24,\n", - " 'width': 72.5999984741211,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Lobster boiled'},\n", - " {'top': 335.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 335.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '100 cals'},\n", - " {'top': 335.19,\n", - " 'left': 464.04,\n", - " 'width': 60.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low calorie'}],\n", - " [{'top': 357.75,\n", - " 'left': 105.96,\n", - " 'width': 77.2800064086914,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Luncheon meat'},\n", - " {'top': 357.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 357.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '400 cals'},\n", - " {'top': 357.75,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 380.19,\n", - " 'left': 120.36,\n", - " 'width': 48.36000061035156,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Mackeral'},\n", - " {'top': 380.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 380.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '300 cals'},\n", - " {'top': 380.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 402.75,\n", - " 'left': 123.36,\n", - " 
'width': 42.36000061035156,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Mussels'},\n", - " {'top': 402.75,\n", - " 'left': 262.92,\n", - " 'width': 36.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '90 cals'},\n", - " {'top': 402.75,\n", - " 'left': 373.08,\n", - " 'width': 36.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '90 cals'},\n", - " {'top': 402.75,\n", - " 'left': 468.84,\n", - " 'width': 51.000030517578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low-Med'}],\n", - " [{'top': 425.19,\n", - " 'left': 108.6,\n", - " 'width': 72.00000762939453,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Pheasant roast'},\n", - " {'top': 425.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 425.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 425.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 447.75,\n", - " 'left': 100.2,\n", - " 'width': 88.68000793457031,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Pilchards (tinned)'},\n", - " {'top': 447.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '140 cals'},\n", - " {'top': 447.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '140 cals'},\n", - " {'top': 447.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 470.19,\n", - " 'left': 125.4,\n", - " 'width': 38.279991149902344,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Prawns'},\n", - " {'top': 470.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 
'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - " {'top': 470.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '100 cals'},\n", - " {'top': 470.19,\n", - " 'left': 467.28,\n", - " 'width': 54.000030517578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Low- Med'}],\n", - " [{'top': 492.75,\n", - " 'left': 131.76,\n", - " 'width': 28.680007934570312,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Pork'},\n", - " {'top': 492.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 492.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '290 cals'},\n", - " {'top': 492.75,\n", - " 'left': 467.76,\n", - " 'width': 53.03997802734375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Med-High'}],\n", - " [{'top': 515.19,\n", - " 'left': 122.88,\n", - " 'width': 43.31999969482422,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Pork pie'},\n", - " {'top': 515.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 515.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '450 cals'},\n", - " {'top': 515.19,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 537.75,\n", - " 'left': 127.08,\n", - " 'width': 35.03999328613281,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Rabbit'},\n", - " {'top': 537.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '200 cals'},\n", - " {'top': 537.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - 
" {'top': 537.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 560.19,\n", - " 'left': 111.24,\n", - " 'width': 66.72000885009766,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Salmon fresh'},\n", - " {'top': 560.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 560.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - " {'top': 560.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 582.75,\n", - " 'left': 91.92,\n", - " 'width': 105.36000061035156,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sardines tinned in oil'},\n", - " {'top': 582.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 582.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 582.75,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 605.19,\n", - " 'left': 83.28,\n", - " 'width': 122.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sardines in tomato sauce'},\n", - " {'top': 605.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - " {'top': 605.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '180 cals'},\n", - " {'top': 605.19,\n", - " 'left': 472.44,\n", - " 'width': 43.67999267578125,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Medium'}],\n", - " [{'top': 
627.75,\n", - " 'left': 98.04,\n", - " 'width': 92.99999237060547,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sausage pork fried'},\n", - " {'top': 627.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '250 cals'},\n", - " {'top': 627.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '320 cals'},\n", - " {'top': 627.75,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 650.19,\n", - " 'left': 93.72,\n", - " 'width': 101.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sausage pork grilled'},\n", - " {'top': 650.19,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '220 cals'},\n", - " {'top': 650.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '280 cals'},\n", - " {'top': 650.19,\n", - " 'left': 467.76,\n", - " 'width': 53.03997802734375,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Med-High'}],\n", - " [{'top': 672.75,\n", - " 'left': 113.52,\n", - " 'width': 62.040000915527344,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Sausage roll'},\n", - " {'top': 672.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '290 cals'},\n", - " {'top': 672.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '480 cals'},\n", - " {'top': 672.75,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 695.19,\n", - " 'left': 98.28,\n", - " 'width': 92.63999938964844,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Scampi fried in oil'},\n", - " {'top': 695.19,\n", - " 'left': 259.92,\n", - " 
'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '400 cals'},\n", - " {'top': 695.19,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '340 cals'},\n", - " {'top': 695.19,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}],\n", - " [{'top': 717.75,\n", - " 'left': 96.96,\n", - " 'width': 95.2800064086914,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'Steak & kidney pie'},\n", - " {'top': 717.75,\n", - " 'left': 259.92,\n", - " 'width': 42.5999755859375,\n", - " 'height': 7.880000114440918,\n", - " 'text': '400 cals'},\n", - " {'top': 717.75,\n", - " 'left': 370.08,\n", - " 'width': 42.600006103515625,\n", - " 'height': 7.880000114440918,\n", - " 'text': '350 cals'},\n", - " {'top': 717.75,\n", - " 'left': 480.84,\n", - " 'width': 27.0,\n", - " 'height': 7.880000114440918,\n", - " 'text': 'High'}]]}]" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpages\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moutput_format\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"json\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m 
\u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], "source": [ @@ -1216,387 +148,16 @@ "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "[ 0 1 \\\n", - " 0 BREADS & CEREALS Portion size * \n", - " 1 Bagel ( 1 average ) 140 cals (45g) \n", - " 2 Biscuit digestives 86 cals (per biscuit) \n", - " 3 Jaffa cake 48 cals (per biscuit) \n", - " 4 Bread white (thick slice) 96 cals (1 slice 40g) \n", - " 5 Bread wholemeal (thick) 88 cals (1 slice 40g) \n", - " 6 Chapatis 250 cals \n", - " 7 
Cornflakes 130 cals (35g) \n", - " 8 Crackerbread 17 cals per slice \n", - " 9 Cream crackers 35 cals (per cracker) \n", - " 10 Crumpets 93 cals (per crumpet) \n", - " 11 Flapjacks basic fruit mix 320 cals \n", - " 12 Macaroni (boiled) 238 cals (250g) \n", - " 13 Muesli 195 cals (50g) \n", - " 14 Naan bread (normal) 300 cals (small plate size) \n", - " 15 Noodles (boiled) 175 cals (250g) \n", - " 16 Pasta ( normal boiled ) 330 cals (300g) \n", - " 17 Pasta (wholemeal boiled ) 315 cals (300g) \n", - " 18 Porridge oats (with water) 193 cals (350g) \n", - " 19 Potatoes** (boiled) 210 cals (300g) \n", - " 20 Potatoes** (roast) 420 cals (300g) \n", - " \n", - " 2 3 4 \n", - " 0 per 100 grams (3.5 oz) NaN energy content \n", - " 1 310 cals NaN Medium \n", - " 2 480 cals NaN High \n", - " 3 370 cals NaN Med-High \n", - " 4 240 cals NaN Medium \n", - " 5 220 cals NaN Low-med \n", - " 6 300 cals NaN Medium \n", - " 7 370 cals NaN Med-High \n", - " 8 325 cals NaN Low Calorie \n", - " 9 440 cals NaN Low / portion \n", - " 10 198 cals NaN Low-Med \n", - " 11 500 cals NaN High \n", - " 12 95 cals NaN Low calorie \n", - " 13 390 cals NaN Med-high \n", - " 14 320 cals NaN Medium \n", - " 15 70 cals NaN Low calorie \n", - " 16 110 cals NaN Low calorie \n", - " 17 105 cals NaN Low calorie \n", - " 18 55 cals NaN Low calorie \n", - " 19 70 cals NaN Low calorie \n", - " 20 140 cals NaN Medium ,\n", - " 0 1 2 3\n", - " 0 Rice (white boiled) 420 cals (300g) 140 cals Low calorie\n", - " 1 Rice (egg-fried) 500 cals 200 cals High in portion\n", - " 2 Rice ( Brown ) 405 cals (300g) 135 cals Low calorie\n", - " 3 Rice cakes 28 Cals = 1 slice 373 Cals Medium\n", - " 4 Ryvita Multi grain 37 Cals per slice 331 Cals Medium\n", - " 5 Ryvita + seed & Oats 180 Cals 4 slices 362 Cals Medium\n", - " 6 Spaghetti (boiled) 303 cals (300g) 101 cals Low calorie,\n", - " 0 1 2 3 \\\n", - " 0 Meats & Fish Portion size * per 100 grams (3.5 oz) NaN \n", - " 1 Anchovies tinned 300 cals 300 cals NaN \n", - " 2 
Bacon average fried 250 cals (2 rashers) 500 cals NaN \n", - " 3 Bacon average grilled 150 cals 380 cals NaN \n", - " 4 Beef (roast) 300 cals 280 cals NaN \n", - " 5 Beef burgers frozen 320 cals 280 cals NaN \n", - " 6 Chicken 220 cals 200 cals NaN \n", - " 7 Cockles 50 cals 50 cals NaN \n", - " 8 Cod fresh 150 cals 100 cals NaN \n", - " 9 Cod chip shop food 400 cals 200 cals NaN \n", - " 10 Crab fresh 200 cals 110 cals NaN \n", - " 11 Duck roast 400 cals 430 cals NaN \n", - " \n", - " 4 \n", - " 0 energy content \n", - " 1 Medium \n", - " 2 High \n", - " 3 Med-High \n", - " 4 Medium \n", - " 5 Med-High \n", - " 6 Medium \n", - " 7 Low \n", - " 8 Low calorie \n", - " 9 Med-High \n", - " 10 low calorie \n", - " 11 High ,\n", - " 0 1 2 3\n", - " 0 Fish cake 90 cals per cake 200 cals Medium\n", - " 1 Fish fingers 50 cals per piece 220 cals Medium\n", - " 2 Gammon 320 cals 280 cals Med-High\n", - " 3 Haddock fresh 200 cals 110 cals Low calorie\n", - " 4 Halibut fresh 220 cals 125 cals Low calorie\n", - " 5 Ham 6 cals 240 cals Medium\n", - " 6 Herring fresh grilled 300 cals 200 cals Medium\n", - " 7 Kidney 200 cals 160 cals Medium\n", - " 8 Kipper 200 cals 120 cals Low calorie\n", - " 9 Liver 200 cals 150 cals Medium\n", - " 10 Liver pate 150 cals 300 cals Medium\n", - " 11 Lamb (roast) 300 cals 300 cals Med-High\n", - " 12 Lobster boiled 200 cals 100 cals Low calorie\n", - " 13 Luncheon meat 300 cals 400 cals High\n", - " 14 Mackeral 320 cals 300 cals Medium\n", - " 15 Mussels 90 cals 90 cals Low-Med\n", - " 16 Pheasant roast 200 cals 200 cals Medium\n", - " 17 Pilchards (tinned) 140 cals 140 cals Medium\n", - " 18 Prawns 180 cals 100 cals Low- Med\n", - " 19 Pork 320 cals 290 cals Med-High\n", - " 20 Pork pie 320 cals 450 cals High\n", - " 21 Rabbit 200 cals 180 cals Medium\n", - " 22 Salmon fresh 220 cals 180 cals Medium\n", - " 23 Sardines tinned in oil 220 cals 220 cals Medium\n", - " 24 Sardines in tomato sauce 180 cals 180 cals Medium\n", - " 25 Sausage pork 
fried 250 cals 320 cals High\n", - " 26 Sausage pork grilled 220 cals 280 cals Med-High\n", - " 27 Sausage roll 290 cals 480 cals High\n", - " 28 Scampi fried in oil 400 cals 340 cals High\n", - " 29 Steak & kidney pie 400 cals 350 cals High,\n", - " 0 1 2 3\n", - " 0 Taramasalata 130 cals 490 cals High\n", - " 1 Trout fresh 200 cals 120 cals Low calorie\n", - " 2 Tuna tinned water 100 cals 100 cals Low calorie\n", - " 3 Tuna tinned oil 180 cals 180 cals Medium\n", - " 4 Turkey 200 cals 160 cals Medium\n", - " 5 Veal 300 cals 240 cals Medium,\n", - " 0 1 2 3 \\\n", - " 0 Fruits & Vegetables Portion size * per 100 grams (3.5 oz) NaN \n", - " 1 Apple 44 calories 44 calories NaN \n", - " 2 Banana 107 cals 65 calories NaN \n", - " 3 Beans baked beans 170 cals 80 calories NaN \n", - " 4 Beans dried (boiled) 180 cals 130 calories NaN \n", - " 5 Blackberries 25 cals 25 calories NaN \n", - " 6 Blackcurrant 30 cals 30 calories NaN \n", - " 7 Broccoli 27 cals 32 cals NaN \n", - " 8 Cabbage (boiled) 15 calories 20 calories NaN \n", - " 9 Carrot (boiled) 16 calories 25 calories NaN \n", - " 10 Cauliflower (boiled) 20 calories 30 calories NaN \n", - " 11 Celery (boiled) 5 calories 10 calories NaN \n", - " 12 Cherry 35 calories 50 calories NaN \n", - " 13 Courgette 8 cals 20 cals NaN \n", - " 14 Cucumber 3 calories 10 calories NaN \n", - " 15 Dates 100 calories 235 calories NaN \n", - " 16 Grapes 55 calories 62 calories NaN \n", - " 17 Grapefruit 32 calories 32 calories NaN \n", - " 18 Kiwi 40 calories 50 calories NaN \n", - " 19 Leek (boiled) 10 calories 20 calories NaN \n", - " \n", - " 4 \n", - " 0 energy content \n", - " 1 Low calorie \n", - " 2 Low calorie \n", - " 3 Low calorie \n", - " 4 Low calorie \n", - " 5 Low calorie \n", - " 6 Low calorie \n", - " 7 Very low \n", - " 8 Low calorie \n", - " 9 Low calorie \n", - " 10 Low calorie \n", - " 11 Low calorie \n", - " 12 Low calorie \n", - " 13 Very low cal \n", - " 14 Low calorie \n", - " 15 Med-High \n", - " 16 Low calorie 
\n", - " 17 Low calorie \n", - " 18 Low calorie \n", - " 19 Low calorie ,\n", - " 0 1 2 3\n", - " 0 Lentils (boiled) 150 calories 100 calories Medium\n", - " 1 Lettuce 4 calories 15 calories Very Low\n", - " 2 Melon 14 calories 28 calories Medium\n", - " 3 Mushrooms raw one NaN NaN NaN\n", - " 4 average 3 cals 15 cals Very low cal\n", - " 5 Mushrooms (boiled) 12 calories 12 calories Low calorie\n", - " 6 Mushrooms (fried) 100 calories 145 calories High\n", - " 7 Olives 50 calories 80 calories Low calorie\n", - " 8 Onion (boiled) 14 calories 18 calories Low calorie\n", - " 9 One red Onion 49 cals 33 cals Low calorie\n", - " 10 Onions spring 3 cals 25 cals Very low cal\n", - " 11 Onion (fried) 86 calories 155 calories High\n", - " 12 Orange 40 calories 30 calories Low calorie\n", - " 13 Peas 210 calories 148 calories Medium\n", - " 14 Peas dried & boiled 200 calories 120 calories Low calorie\n", - " 15 Peach 35 calories 30 calories Low calorie\n", - " 16 Pear 45 calories 38 calories Low calorie\n", - " 17 Pepper yellow 6 cals 16 cals Very low\n", - " 18 Pineapple 40 calories 40 calories Low calorie\n", - " 19 Plum 30 calories 39 calories Low calorie\n", - " 20 Spinach 8 calories 8 calories Low calorie\n", - " 21 Strawberries (1 average) 10 calories 30 calories Low calorie\n", - " 22 Sweetcorn 95 calories 130 calories Medium\n", - " 23 Sweetcorn on the cob 70 calories 70 calories Low calorie\n", - " 24 Tomato 30 calories 20 calories Low calorie\n", - " 25 Tomato cherry 6 cals ( 3 toms) 17 Cals Very low cal\n", - " 26 Tomato puree 70 calories 70 calories Low-Medium\n", - " 27 Watercress 5 calories 20 calories Low calorie,\n", - " 0 1 \\\n", - " 0 Milk & Dairy produce Portion size * \n", - " 1 Cheese average 110 cals (25g) \n", - " 2 Cheddar types average reduced NaN \n", - " 3 fat 130 \n", - " 4 Cheese spreads average 90 cals \n", - " 5 Cottage cheese low fat 40 calories \n", - " 6 Cottage cheese 49 cals \n", - " 7 Cream cheese 200 cals \n", - " 8 Cream fresh half 128 
cals \n", - " 9 Cream fresh single 160 cals \n", - " 10 Cream fresh double 340 cals \n", - " 11 Cream fresh clotted 480 cals \n", - " 12 Custard 210 cals \n", - " 13 Eggs ( 1 average size) 90 cals \n", - " 14 Eggs fried 120 cals \n", - " 15 Fromage frais 125 cals \n", - " 16 Ice cream 200 cals \n", - " 17 Milk whole 175 cals (250ml/half pint) \n", - " 18 Milk semi-skimmed 125 cals (250ml/half pint) \n", - " 19 Milk skimmed 95 cals (250ml/half pint) \n", - " 20 Milk Soya 90 cals \n", - " 21 Mousse flavored 120 cals \n", - " 22 Omelette with cheese 300 cals \n", - " 23 Trifle with cream 290 cals \n", - " 24 Yogurt natural 90 cals \n", - " 25 Yogurt reduced fat 70 cals \n", - " \n", - " 2 3 4 \n", - " 0 per 100 grams (3.5 oz) NaN energy content \n", - " 1 440 cals NaN High \n", - " 2 NaN NaN NaN \n", - " 3 260 calories NaN Medium \n", - " 4 270 NaN Medium \n", - " 5 80 cals NaN low - med \n", - " 6 98 cals NaN Low calorie \n", - " 7 428 cals NaN High \n", - " 8 160 cals NaN Med-High \n", - " 9 200 cals NaN Med-High \n", - " 10 430 cals NaN High \n", - " 11 600 cals NaN High \n", - " 12 100 cals NaN Medium \n", - " 13 150 cals NaN Medium \n", - " 14 180 cals NaN Med-High \n", - " 15 125 cals NaN Low calorie \n", - " 16 180 cals NaN Medium \n", - " 17 70 cals NaN Med-High \n", - " 18 50 cals NaN Medium \n", - " 19 38 cals NaN Low calorie \n", - " 20 36 cals NaN Low calorie \n", - " 21 140 cals NaN Medium \n", - " 22 266 cals NaN Medium \n", - " 23 190 cals NaN Medium \n", - " 24 60 cals NaN Low calorie \n", - " 25 45 cals NaN Low calorie ,\n", - " 0 1 \\\n", - " 0 Fats & Sugars Portion size * \n", - " 1 PURE FAT 9 cals (1 gram) \n", - " 2 Bombay mix 250 cals \n", - " 3 Butter 112 cals \n", - " 4 Chewing gum 8 cals per piece \n", - " 5 Chocolate 200 cals \n", - " 6 Cod liver oil 135 cals (1 tbspoon) \n", - " 7 Corn snack 125 cals \n", - " 8 Crisps (chips US) average 100 cals \n", - " 9 Honey 42 cals \n", - " 10 Jam 38 cals \n", - " 11 Lard 225 cals \n", - " 12 Low fat 
spread 50 cals \n", - " 13 Margarine 50 cals \n", - " 14 Mars bar 240 cals \n", - " 15 Mint sweets 10 cals per piece \n", - " 16 Oils -corn, sunflower, olive 135 cals (1 Tbspoon) \n", - " 17 Popcorn average 150 cals \n", - " 18 Sugar white table sugar 20 cals (1 tspoon) \n", - " 19 Sweets (boiled) 100 cals \n", - " 20 Syrup 15 cals \n", - " 21 Toffee 100 cals \n", - " \n", - " 2 3 4 \n", - " 0 per 100 grams (3.5 oz) NaN energy content \n", - " 1 900 cals NaN High \n", - " 2 500 cals NaN High \n", - " 3 750 cals NaN High \n", - " 4 - NaN Low calorie \n", - " 5 500 cals NaN High \n", - " 6 900 cals NaN High \n", - " 7 500 cals NaN High \n", - " 8 500 cals NaN High \n", - " 9 280 cals NaN Medium \n", - " 10 250 cals NaN Medium \n", - " 11 890 cals NaN High \n", - " 12 400 cals NaN High \n", - " 13 750 cals NaN High \n", - " 14 480 cals NaN Med-High \n", - " 15 - NaN High \n", - " 16 900 cals NaN High \n", - " 17 460 cals NaN High \n", - " 18 400 cals NaN Medium \n", - " 19 300 cals NaN Med-High \n", - " 20 300 cals NaN Medium \n", - " 21 400 cals NaN High ,\n", - " 0 1 2 \\\n", - " 0 Fruit Calories per piece Carbs (grams) \n", - " 1 Apple (1 average) 44 calories 10.5 \n", - " 2 Apple cooking 35 calories 9 \n", - " 3 Apricot 30 calories 6.7 \n", - " 4 Avocado 150 calories 2 \n", - " 5 Banana 107 calories 26 \n", - " 6 Blackberries each 1 calorie 0.2 \n", - " 7 Blackcurrant each 1.1 calorie 0.25 \n", - " 8 Blueberries (new) 100g 49 Cals ( 100g ) 15 g \n", - " 9 Cherry each 2.4 calories 0.6 \n", - " 10 Clementine 24 cals 5 \n", - " 11 Currants 5 calories 1.4 \n", - " 12 Damson 28 calories 7.2 \n", - " 13 One average date 5g 5 cals 1.2 \n", - " 14 Dates with inverted sugar 100g 250 calories 63 \n", - " 15 Figs 10 calories 2.4 \n", - " 16 Gooseberries 2.6 calories 0.65 \n", - " 17 Grapes 100g Seedless 50 cals 15 \n", - " 18 one average Grape 6g 3 calories 0.9 \n", - " 19 Grapefruit whole 100 calories 23 \n", - " 20 Guava 24 calories 4.4 \n", - " 21 Kiwi 34 calories 8 \n", 
- " 22 Lemon 20 calories 3.4 \n", - " 23 Lychees 3 calories 0.7 \n", - " 24 Mango 40 calories 9.5 \n", - " 25 Melon Honeydew (130g) 36 calories 9 \n", - " 26 Melon Canteloupe (130g) 25 cals 6 \n", - " 27 Nectarines 42 calories 9 \n", - " 28 Olives 6.8 calories trace \n", - " \n", - " 3 \n", - " 0 Water Content \n", - " 1 85 % \n", - " 2 88 % \n", - " 3 85 % \n", - " 4 60 % \n", - " 5 75 % \n", - " 6 85 % \n", - " 7 77 % \n", - " 8 81 % \n", - " 9 83 % \n", - " 10 66 % \n", - " 11 16 % \n", - " 12 70 % \n", - " 13 14 % \n", - " 14 12 % \n", - " 15 24 % \n", - " 16 80 % \n", - " 17 82 % \n", - " 18 82 % \n", - " 19 65 % \n", - " 20 85 % \n", - " 21 75 % \n", - " 22 85 % \n", - " 23 80 % \n", - " 24 80 % \n", - " 25 90 % \n", - " 26 93 % \n", - " 27 80 % \n", - " 28 63 % ,\n", - " 0 1 2 3\n", - " 0 Orange average 35 calories 8.5 73 %\n", - " 1 Orange large 350g 100 Cals 22g 75 %\n", - " 2 Papaya Diced (small handful) 67 Cals (20g) 17g -\n", - " 3 Passion Fruit 30 calories 3 50 %\n", - " 4 Paw Paw 28 calories 6 70 %\n", - " 5 Peach 35 calories 7 80 %\n", - " 6 Pear 45 calories 12 77 %\n", - " 7 Pineapple 50 calories 12 85 %\n", - " 8 Plum 25 calories 6 79 %\n", - " 9 Prunes 9 calories 2.2 37 %\n", - " 10 Raisins 5 calories 1.4 13 %\n", - " 11 Raspberries each 1.1 calories 0.2 87 %\n", - " 12 Rhubarb 8 calories 0.8 95 %\n", - " 13 Satsuma one average 112g 29 cals 6.5 88 %\n", - " 14 Satsumas 100g 35 calories 8.5 88 %\n", - " 15 Strawberries (1 average) 2.7 calories 0.6 90 %\n", - " 16 Sultanas 5 calories 1.4 16 %\n", - " 17 Tangerine 26 calories 6 60 %\n", - " 18 Tomatoes (1 average size) 9 cals 2.2 93 %\n", - " 19 Tomatoes Cherry (1 average size) 2 calories 0.5 90 %]" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + 
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./tmp/pdf/Food Calories List.pdf\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpages\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'all'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmultiple_tables\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], "source": [ @@ -1607,7 +168,9 @@ { "cell_type": "code", "execution_count": 7, - "metadata": {}, + "metadata": { + "scrolled": true + }, "outputs": [ { "data": { @@ -1893,1219 +456,40 @@ "metadata": {}, "outputs": [ { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
0123
0Fruits & VegetablesPortion size *oz)energy content
1Apple44 calories44 caloriesLow calorie
2Banana107 cals65 caloriesLow calorie
3Beans baked beans170 cals80 caloriesLow calorie
4Beans dried (boiled)180 cals130 caloriesLow calorie
5Blackberries25 cals25 caloriesLow calorie
6Blackcurrant30 cals30 caloriesLow calorie
7Broccoli27 cals32 calsVery low
8Cabbage (boiled)15 calories20 caloriesLow calorie
9Carrot (boiled)16 calories25 caloriesLow calorie
10Cauliflower (boiled)20 calories30 caloriesLow calorie
11Celery (boiled)5 calories10 caloriesLow calorie
12Cherry35 calories50 caloriesLow calorie
13Courgette8 cals20 calsVery low cal
14Cucumber3 calories10 caloriesLow calorie
15Dates100 calories235 caloriesMed-High
16Grapes55 calories62 caloriesLow calorie
17Grapefruit32 calories32 caloriesLow calorie
18Kiwi40 calories50 caloriesLow calorie
19Leek (boiled)10 calories20 caloriesLow calorie
\n", - "
" - ], - "text/plain": [ - " 0 1 2 3\n", - "0 Fruits & Vegetables Portion size * oz) energy content\n", - "1 Apple 44 calories 44 calories Low calorie\n", - "2 Banana 107 cals 65 calories Low calorie\n", - "3 Beans baked beans 170 cals 80 calories Low calorie\n", - "4 Beans dried (boiled) 180 cals 130 calories Low calorie\n", - "5 Blackberries 25 cals 25 calories Low calorie\n", - "6 Blackcurrant 30 cals 30 calories Low calorie\n", - "7 Broccoli 27 cals 32 cals Very low\n", - "8 Cabbage (boiled) 15 calories 20 calories Low calorie\n", - "9 Carrot (boiled) 16 calories 25 calories Low calorie\n", - "10 Cauliflower (boiled) 20 calories 30 calories Low calorie\n", - "11 Celery (boiled) 5 calories 10 calories Low calorie\n", - "12 Cherry 35 calories 50 calories Low calorie\n", - "13 Courgette 8 cals 20 cals Very low cal\n", - "14 Cucumber 3 calories 10 calories Low calorie\n", - "15 Dates 100 calories 235 calories Med-High\n", - "16 Grapes 55 calories 62 calories Low calorie\n", - "17 Grapefruit 32 calories 32 calories Low calorie\n", - "18 Kiwi 40 calories 50 calories Low calorie\n", - "19 Leek (boiled) 10 calories 20 calories Low calorie" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", encoding = 'ISO-8859-1',\n", - " stream=True, area = [269.875, 12.75, 790.5, 961], pages = 4, guess = False, pandas_options={'header':None})\n", - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
McKinsey Global Institute
0Disruptive technologies: Advances that will tr...
1Exhibit E2
2Speed, scope, and economic value at stake of 1...
3Illustrative rates of technology improvement I...
4and diffusion resources that could be impacted...
5Mobile $5 million vs. $40024.3 billion $1.7 tr...
6Internet Price of the fastest supercomputer in...
7an iPhone 4 today, equal in performance (MFLOP...
86x 1 billion Interaction and transaction worker
9Growth in sales of smartphones and tablets sin...
10launch of iPhone in 2007 40% of global workfor...
11Automation 100x 230+ million $9+ trillion
12of knowledge Increase in computing power from ...
13work (chess champion in 1997) to Watson (Jeopa...
14winner in 2011) Smartphone users, with potenti...
15400+ million automated digital assistance apps
16Increase in number of users of intelligent dig...
17assistants like Siri and Google Now in past 5 ...
18The Internet 300% 1 trillion $36 trillion
19of Things Increase in connected machine-to-mac...
20over past 5 years across industries such as ma...
2180–90% health care, and mining and mining)
22Price decline in MEMS (microelectromechanical ...
23systems) sensors in past 5 years Global machin...
24connections across sectors like transportation,
25security, health care, and utilities
26Cloud 18 months 2 billion $1.7 trillion
27technology Time to double server performance p...
283x like Gmail, Yahoo, and Hotmail $3 trillion
29Monthly cost of owning a server vs. renting in...
......
52storage Price decline for a lithium-ion batter...
53electric vehicle since 2009 1.2 billion gasoli...
54People without access to electricity $100 billion
55Estimated value of electricity for
56households currently without access
573D printing 90% 320 million $11 trillion
58Lower price for a home 3D printer vs. 4 years ...
594x workforce $85 billion
60Increase in additive manufacturing revenue in ...
6110 years Annual number of toys manufactured gl...
62Advanced $1,000 vs. $50 7.6 million tons $1.2 ...
63materials Difference in price of 1 gram of nan...
6410 years 45,000 metric tons sales
65115x Annual global carbon fiber consumption $4...
66Strength-to-weight ratio of carbon nanotubes v...
67Advanced 3x 22 billion $800 billion
68oil and gas Increase in efficiency of US gas w...
69exploration 2x produced globally gas
70and recovery Increase in efficiency of US oil ...
71Barrels of crude oil produced globally Revenue...
72Renewable 85% 21,000 TWh $3.5 trillion
73energy Lower price for a solar photovoltaic ce...
742000 13 billion tons $80 billion
7519x Annual CO2 emissions from electricity Valu...
76Growth in solar photovoltaic and wind generati...
77capacity since 2000 and planes
781 Not comprehensive; indicative groups, produc...
792 For CDC-7600, considered the world’s faste...
803 Baxter is a general-purpose basic manufactur...
81SOURCE: McKinsey Global Institute analysis
\n", - "

82 rows × 1 columns

\n", - "
" - ], - "text/plain": [ - " McKinsey Global Institute\n", - "0 Disruptive technologies: Advances that will tr...\n", - "1 Exhibit E2\n", - "2 Speed, scope, and economic value at stake of 1...\n", - "3 Illustrative rates of technology improvement I...\n", - "4 and diffusion resources that could be impacted...\n", - "5 Mobile $5 million vs. $40024.3 billion $1.7 tr...\n", - "6 Internet Price of the fastest supercomputer in...\n", - "7 an iPhone 4 today, equal in performance (MFLOP...\n", - "8 6x 1 billion Interaction and transaction worker\n", - "9 Growth in sales of smartphones and tablets sin...\n", - "10 launch of iPhone in 2007 40% of global workfor...\n", - "11 Automation 100x 230+ million $9+ trillion\n", - "12 of knowledge Increase in computing power from ...\n", - "13 work (chess champion in 1997) to Watson (Jeopa...\n", - "14 winner in 2011) Smartphone users, with potenti...\n", - "15 400+ million automated digital assistance apps\n", - "16 Increase in number of users of intelligent dig...\n", - "17 assistants like Siri and Google Now in past 5 ...\n", - "18 The Internet 300% 1 trillion $36 trillion\n", - "19 of Things Increase in connected machine-to-mac...\n", - "20 over past 5 years across industries such as ma...\n", - "21 80–90% health care, and mining and mining)\n", - "22 Price decline in MEMS (microelectromechanical ...\n", - "23 systems) sensors in past 5 years Global machin...\n", - "24 connections across sectors like transportation,\n", - "25 security, health care, and utilities\n", - "26 Cloud 18 months 2 billion $1.7 trillion\n", - "27 technology Time to double server performance p...\n", - "28 3x like Gmail, Yahoo, and Hotmail $3 trillion\n", - "29 Monthly cost of owning a server vs. renting in...\n", - ".. 
...\n", - "52 storage Price decline for a lithium-ion batter...\n", - "53 electric vehicle since 2009 1.2 billion gasoli...\n", - "54 People without access to electricity $100 billion\n", - "55 Estimated value of electricity for\n", - "56 households currently without access\n", - "57 3D printing 90% 320 million $11 trillion\n", - "58 Lower price for a home 3D printer vs. 4 years ...\n", - "59 4x workforce $85 billion\n", - "60 Increase in additive manufacturing revenue in ...\n", - "61 10 years Annual number of toys manufactured gl...\n", - "62 Advanced $1,000 vs. $50 7.6 million tons $1.2 ...\n", - "63 materials Difference in price of 1 gram of nan...\n", - "64 10 years 45,000 metric tons sales\n", - "65 115x Annual global carbon fiber consumption $4...\n", - "66 Strength-to-weight ratio of carbon nanotubes v...\n", - "67 Advanced 3x 22 billion $800 billion\n", - "68 oil and gas Increase in efficiency of US gas w...\n", - "69 exploration 2x produced globally gas\n", - "70 and recovery Increase in efficiency of US oil ...\n", - "71 Barrels of crude oil produced globally Revenue...\n", - "72 Renewable 85% 21,000 TWh $3.5 trillion\n", - "73 energy Lower price for a solar photovoltaic ce...\n", - "74 2000 13 billion tons $80 billion\n", - "75 19x Annual CO2 emissions from electricity Valu...\n", - "76 Growth in solar photovoltaic and wind generati...\n", - "77 capacity since 2000 and planes\n", - "78 1 Not comprehensive; indicative groups, produc...\n", - "79 2 For CDC-7600, considered the world’s faste...\n", - "80 3 Baxter is a general-purpose basic manufactur...\n", - "81 SOURCE: McKinsey Global Institute analysis\n", - "\n", - "[82 rows x 1 columns]" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = read_pdf(\"./tmp/pdf/output.pdf\", encoding = 'ISO-8859-1',\n", - " stream=True, guess = False)\n", - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - 
{ - "data": { - "text/html": [ - "
 | Unnamed: 0 | Unnamed: 1 | over past 5 years across industries such as manufacturing, | industries (manufacturing, health care,
0 | NaN | NaN | 80–90% health care, and mining | and mining)
1 | NaN | NaN | Price decline in MEMS (microelectromechanical ... | NaN
2 | NaN | NaN | systems) sensors in past 5 years Global machin... | NaN
3 | NaN | NaN | connections across sectors like transportation, | NaN
4 | NaN | NaN | security, health care, and utilities | NaN
5 | NaN | Cloud | 18 months 2 billion | $1.7 trillion
6 | NaN | technology | Time to double server performance per dollar G... | GDP related to the Internet
7 | NaN | NaN | 3x like Gmail, Yahoo, and Hotmail | $3 trillion
8 | NaN | NaN | Monthly cost of owning a server vs. renting in... | Enterprise IT spend
9 | NaN | NaN | the cloud North American institutions hosting ... | NaN
10 | NaN | NaN | to host critical applications on the cloud | NaN
11 | NaN | Advanced | 75–85% 320 million | $6 trillion
12 | NaN | robotics | Lower price for Baxter3 than a typical industr... | Manufacturing worker employment
13 | NaN | NaN | 170% workforce | costs, 19% of global employment costs
14 | NaN | NaN | Growth in sales of industrial robots, 2009–1... | $2–3 trillion
15 | NaN | NaN | Annual major surgeries | Cost of major surgeries
16 | NaN | Autonomous | 7 1 billion | $4 trillion
17 | NaN | and near- | Miles driven by top-performing driverless car ... | Automobile industry revenue
18 | NaN | autonomous | DARPA Grand Challenge along a 150-mile route 4... | $155 billion
19 | NaN | vehicles | 1,540 Civilian, military, and general aviation... | Revenue from sales of civilian, military,
20 | NaN | NaN | Miles cumulatively driven by cars competing in... | and general aviation aircraft
21 | NaN | NaN | Grand Challenge | NaN
22 | NaN | NaN | 300,000+ | NaN
23 | NaN | NaN | Miles driven by Google’s autonomous cars wit... | NaN
24 | NaN | NaN | 1 accident (which was human-caused) | NaN
25 | NaN | Next- | 10 months 26 million | $6.5 trillion
26 | NaN | generation | Time to double sequencing speed per dollar Ann... | Global health-care costs
27 | NaN | genomics | 100x disease, or type 2 diabetes | $1.1 trillion
28 | NaN | NaN | Increase in acreage of genetically modified cr... | Global value of wheat, rice, maize, soy,
29 | NaN | NaN | 1996–2012 People employed in agriculture | and barley
30 | NaN | Energy | 40% 1 billion | $2.5 trillion
31 | NaN | storage | Price decline for a lithium-ion battery pack i... | Revenue from global consumption of
32 | NaN | NaN | electric vehicle since 2009 1.2 billion | gasoline and diesel
33 | NaN | NaN | People without access to electricity | $100 billion
34 | NaN | NaN | NaN | Estimated value of electricity for
35 | NaN | NaN | NaN | households currently without access
36 | NaN | 3D printing | 90% 320 million | $11 trillion
37 | NaN | NaN | Lower price for a home 3D printer vs. 4 years ... | Global manufacturing GDP
38 | NaN | NaN | 4x workforce | $85 billion
39 | NaN | NaN | Increase in additive manufacturing revenue in ... | Revenue from global toy sales
40 | NaN | NaN | 10 years Annual number of toys manufactured gl... | NaN
41 | NaN | Advanced | $1,000 vs. $50 7.6 million tons | $1.2 trillion
42 | NaN | materials | Difference in price of 1 gram of nanotubes ove... | Revenue from global semiconductor
43 | NaN | NaN | 10 years 45,000 metric tons | sales
44 | NaN | NaN | 115x Annual global carbon fiber consumption | $4 billion
45 | NaN | NaN | Strength-to-weight ratio of carbon nanotubes v... | Revenue from global carbon fiber sales
46 | NaN | Advanced | 3x 22 billion | $800 billion
47 | NaN | oil and gas | Increase in efficiency of US gas wells, 2007â€... | Revenue from global sales of natural
48 | NaN | exploration | 2x produced globally | gas
49 | NaN | and recovery | Increase in efficiency of US oil wells, 2007â€... | $3.4 trillion
50 | NaN | NaN | Barrels of crude oil produced globally | Revenue from global sales of crude oil
51 | NaN | Renewable | 85% 21,000 TWh | $3.5 trillion
52 | NaN | energy | Lower price for a solar photovoltaic cell per ... | Value of global electricity consumption
53 | NaN | NaN | 2000 13 billion tons | $80 billion
54 | NaN | NaN | 19x Annual CO2 emissions from electricity | Value of global carbon market
55 | NaN | NaN | Growth in solar photovoltaic and wind generati... | transactions
56 | NaN | NaN | capacity since 2000 and planes | NaN
57 | 1.0 | Not comprehensive; indicative groups, products... | NaN | NaN
58 | 2.0 | For CDC-7600, considered the world’s fastest... | NaN | NaN
59 | 3.0 | Baxter is a general-purpose basic manufacturin... | NaN | NaN
\n", - "
" - ], - "text/plain": [ - " Unnamed: 0 Unnamed: 1 \\\n", - "0 NaN NaN \n", - "1 NaN NaN \n", - "2 NaN NaN \n", - "3 NaN NaN \n", - "4 NaN NaN \n", - "5 NaN Cloud \n", - "6 NaN technology \n", - "7 NaN NaN \n", - "8 NaN NaN \n", - "9 NaN NaN \n", - "10 NaN NaN \n", - "11 NaN Advanced \n", - "12 NaN robotics \n", - "13 NaN NaN \n", - "14 NaN NaN \n", - "15 NaN NaN \n", - "16 NaN Autonomous \n", - "17 NaN and near- \n", - "18 NaN autonomous \n", - "19 NaN vehicles \n", - "20 NaN NaN \n", - "21 NaN NaN \n", - "22 NaN NaN \n", - "23 NaN NaN \n", - "24 NaN NaN \n", - "25 NaN Next- \n", - "26 NaN generation \n", - "27 NaN genomics \n", - "28 NaN NaN \n", - "29 NaN NaN \n", - "30 NaN Energy \n", - "31 NaN storage \n", - "32 NaN NaN \n", - "33 NaN NaN \n", - "34 NaN NaN \n", - "35 NaN NaN \n", - "36 NaN 3D printing \n", - "37 NaN NaN \n", - "38 NaN NaN \n", - "39 NaN NaN \n", - "40 NaN NaN \n", - "41 NaN Advanced \n", - "42 NaN materials \n", - "43 NaN NaN \n", - "44 NaN NaN \n", - "45 NaN NaN \n", - "46 NaN Advanced \n", - "47 NaN oil and gas \n", - "48 NaN exploration \n", - "49 NaN and recovery \n", - "50 NaN NaN \n", - "51 NaN Renewable \n", - "52 NaN energy \n", - "53 NaN NaN \n", - "54 NaN NaN \n", - "55 NaN NaN \n", - "56 NaN NaN \n", - "57 1.0 Not comprehensive; indicative groups, products... \n", - "58 2.0 For CDC-7600, considered the world’s fastest... \n", - "59 3.0 Baxter is a general-purpose basic manufacturin... \n", - "\n", - " over past 5 years across industries such as manufacturing, \\\n", - "0 80–90% health care, and mining \n", - "1 Price decline in MEMS (microelectromechanical ... \n", - "2 systems) sensors in past 5 years Global machin... \n", - "3 connections across sectors like transportation, \n", - "4 security, health care, and utilities \n", - "5 18 months 2 billion \n", - "6 Time to double server performance per dollar G... \n", - "7 3x like Gmail, Yahoo, and Hotmail \n", - "8 Monthly cost of owning a server vs. renting in... 
\n", - "9 the cloud North American institutions hosting ... \n", - "10 to host critical applications on the cloud \n", - "11 75–85% 320 million \n", - "12 Lower price for Baxter3 than a typical industr... \n", - "13 170% workforce \n", - "14 Growth in sales of industrial robots, 2009–1... \n", - "15 Annual major surgeries \n", - "16 7 1 billion \n", - "17 Miles driven by top-performing driverless car ... \n", - "18 DARPA Grand Challenge along a 150-mile route 4... \n", - "19 1,540 Civilian, military, and general aviation... \n", - "20 Miles cumulatively driven by cars competing in... \n", - "21 Grand Challenge \n", - "22 300,000+ \n", - "23 Miles driven by Google’s autonomous cars wit... \n", - "24 1 accident (which was human-caused) \n", - "25 10 months 26 million \n", - "26 Time to double sequencing speed per dollar Ann... \n", - "27 100x disease, or type 2 diabetes \n", - "28 Increase in acreage of genetically modified cr... \n", - "29 1996–2012 People employed in agriculture \n", - "30 40% 1 billion \n", - "31 Price decline for a lithium-ion battery pack i... \n", - "32 electric vehicle since 2009 1.2 billion \n", - "33 People without access to electricity \n", - "34 NaN \n", - "35 NaN \n", - "36 90% 320 million \n", - "37 Lower price for a home 3D printer vs. 4 years ... \n", - "38 4x workforce \n", - "39 Increase in additive manufacturing revenue in ... \n", - "40 10 years Annual number of toys manufactured gl... \n", - "41 $1,000 vs. $50 7.6 million tons \n", - "42 Difference in price of 1 gram of nanotubes ove... \n", - "43 10 years 45,000 metric tons \n", - "44 115x Annual global carbon fiber consumption \n", - "45 Strength-to-weight ratio of carbon nanotubes v... \n", - "46 3x 22 billion \n", - "47 Increase in efficiency of US gas wells, 2007â€... \n", - "48 2x produced globally \n", - "49 Increase in efficiency of US oil wells, 2007â€... 
\n", - "50 Barrels of crude oil produced globally \n", - "51 85% 21,000 TWh \n", - "52 Lower price for a solar photovoltaic cell per ... \n", - "53 2000 13 billion tons \n", - "54 19x Annual CO2 emissions from electricity \n", - "55 Growth in solar photovoltaic and wind generati... \n", - "56 capacity since 2000 and planes \n", - "57 NaN \n", - "58 NaN \n", - "59 NaN \n", - "\n", - " industries (manufacturing, health care, \n", - "0 and mining) \n", - "1 NaN \n", - "2 NaN \n", - "3 NaN \n", - "4 NaN \n", - "5 $1.7 trillion \n", - "6 GDP related to the Internet \n", - "7 $3 trillion \n", - "8 Enterprise IT spend \n", - "9 NaN \n", - "10 NaN \n", - "11 $6 trillion \n", - "12 Manufacturing worker employment \n", - "13 costs, 19% of global employment costs \n", - "14 $2–3 trillion \n", - "15 Cost of major surgeries \n", - "16 $4 trillion \n", - "17 Automobile industry revenue \n", - "18 $155 billion \n", - "19 Revenue from sales of civilian, military, \n", - "20 and general aviation aircraft \n", - "21 NaN \n", - "22 NaN \n", - "23 NaN \n", - "24 NaN \n", - "25 $6.5 trillion \n", - "26 Global health-care costs \n", - "27 $1.1 trillion \n", - "28 Global value of wheat, rice, maize, soy, \n", - "29 and barley \n", - "30 $2.5 trillion \n", - "31 Revenue from global consumption of \n", - "32 gasoline and diesel \n", - "33 $100 billion \n", - "34 Estimated value of electricity for \n", - "35 households currently without access \n", - "36 $11 trillion \n", - "37 Global manufacturing GDP \n", - "38 $85 billion \n", - "39 Revenue from global toy sales \n", - "40 NaN \n", - "41 $1.2 trillion \n", - "42 Revenue from global semiconductor \n", - "43 sales \n", - "44 $4 billion \n", - "45 Revenue from global carbon fiber sales \n", - "46 $800 billion \n", - "47 Revenue from global sales of natural \n", - "48 gas \n", - "49 $3.4 trillion \n", - "50 Revenue from global sales of crude oil \n", - "51 $3.5 trillion \n", - "52 Value of global electricity consumption \n", - "53 $80 
billion \n", - "54 Value of global carbon market \n", - "55 transactions \n", - "56 NaN \n", - "57 NaN \n", - "58 NaN \n", - "59 NaN " - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", encoding = 'ISO-8859-1',\n\u001b[0;32m----> 2\u001b[0;31m stream=True, area = [269.875, 12.75, 790.5, 961], pages = 4, guess = False, pandas_options={'header':None})\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/tabula/wrapper.py\u001b[0m in \u001b[0;36mread_pdf\u001b[0;34m(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mFileNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrerror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merrno\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mENOENT\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'" + ] } ], + "source": [ + "df = read_pdf(\"./tmp/pdf/Food Calories List.pdf\", encoding = 'ISO-8859-1',\n", + " stream=True, area = [269.875, 12.75, 790.5, 961], pages = 4, guess = False, pandas_options={'header':None})\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = read_pdf(\"./tmp/pdf/output.pdf\", encoding = 'ISO-8859-1',\n", + " stream=True, guess = False)\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "df = read_pdf(\"./tmp/pdf/output.pdf\", encoding = 'ISO-8859-1',\n", " stream=True, area=[269.875, 12.75, 790.5, 961], guess = False)\n", @@ -3131,66 +515,9 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
 | 0 | 1 | 2 | 3
1 | Bagel ( 1 average ) | 140 cals (45g) | 310 cals | Medium
2 | Biscuit digestives | 86 cals (per biscuit) | 480 cals | High
\n", - "
" - ], - "text/plain": [ - " 0 1 2 3\n", - "1 Bagel ( 1 average ) 140 cals (45g) 310 cals Medium\n", - "2 Biscuit digestives 86 cals (per biscuit) 480 cals High" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import camelot\n", "tables = camelot.read_pdf(\"./tmp/pdf//Food Calories List.pdf\")\n", @@ -3199,47 +526,9 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "-- -------------\n", - " 0 Mobile\n", - " Internet\n", - " 1 Automation\n", - " of knowledge\n", - " work\n", - " 2 The Internet\n", - " of Things\n", - " 3 Cloud\n", - " technology\n", - " 4 Advanced\n", - " robotics\n", - " 5 Autonomous\n", - " and near-\n", - " autonomous\n", - " vehicles\n", - " 6 Next-\n", - " generation\n", - " genomics\n", - " 7 Energy\n", - " storage\n", - " 8 3D printing\n", - " 9 Advanced\n", - " materials\n", - "10 Advanced oil\n", - " and gas\n", - " exploration\n", - " and recovery\n", - "11 Renewable\n", - " energy\n", - "-- -------------\n" - ] - } - ], + "outputs": [], "source": [ "tables1 = camelot.read_pdf(\"./tmp/pdf/MGI_Disruptive_technologies_Full_report_May2013.pdf\", pages='32', area=[269.875, 120.75, 790.5, 561])\n", "print (tabulate(tables1[0].df))" @@ -3247,57 +536,9 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "30\n", - "NOK\n", - "31\n", - "NOK\n", - "32\n", - "-- -------------\n", - " 0 Mobile\n", - " Internet\n", - " 1 Automation\n", - " of knowledge\n", - " work\n", - " 2 The Internet\n", - " of Things\n", - " 3 Cloud\n", - " technology\n", - " 4 Advanced\n", - " robotics\n", - " 5 Autonomous\n", - " and near-\n", - " autonomous\n", - " vehicles\n", - " 6 Next-\n", - " generation\n", - " genomics\n", - " 7 Energy\n", - " 
storage\n", - " 8 3D printing\n", - " 9 Advanced\n", - " materials\n", - "10 Advanced oil\n", - " and gas\n", - " exploration\n", - " and recovery\n", - "11 Renewable\n", - " energy\n", - "-- -------------\n", - "NOK\n", - "33\n", - "NOK\n", - "34\n", - "NOK\n" - ] - } - ], + "outputs": [], "source": [ "for i in range(30,35):\n", " print (i)\n", @@ -3324,21 +565,18 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 9, "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "b' Fish cake\\n 90 cals per cake\\n 200 cals\\n Medium\\n Fish fingers\\n 50 cals per piece\\n 220 cals\\n Medium\\n Gammon\\n 320 cals\\n 280 cals\\n Med\\n-High\\n Haddock fresh\\n 200 cals\\n 110 cals\\n Low calorie\\n Halibut fresh\\n 220 cals\\n 125 cals\\n Low calorie\\n Ha\\nm 6 cals\\n 240 cals\\n Medium\\n Herring fresh grilled\\n 300 cals\\n 200 cals\\n Medium\\n Kidney\\n 200 cals\\n 160 cals\\n Medium\\n Kipper\\n 200 cals\\n 120 cals\\n Low calorie\\n Liver\\n 200 cals\\n 150 cals\\n Medium\\n Liver\\n pate\\n 150 cals\\n 300 cals\\n Medium\\n Lamb (roast)\\n 300 cals\\n 300 cals\\n Med\\n-High\\n Lobster boiled\\n 200 cals\\n 100 cals\\n Low calorie\\n Luncheon meat\\n 300 cals\\n 400 cals\\n High\\n Mackeral\\n 320 cals\\n 300 cal\\ns Medium\\n Mussels\\n 90 cals\\n 90 cals\\n Low\\n-Med\\n Pheasant roast\\n 200 cals\\n 200 cals\\n Medium\\n Pilchards (tinned)\\n 140 cals\\n 140 cals\\n Medium\\n Prawns\\n 180 cals\\n 100 cals\\n Low\\n- Med\\n Pork \\n 320 cals\\n 290 cals\\n Med\\n-High\\n Pork pie\\n 320 cals\\n 450 cals\\n High\\n Rabbit\\n 200 cals\\n 180 cals\\n Medium\\n Salmon fresh\\n 220 cals\\n 180 cals\\n Medium\\n Sardines tinned in oil\\n 220 cals\\n 220 cals\\n Medium\\n Sardines in tomato sauce\\n 180 cals\\n 180 cals\\n Medium\\n Sausage pork fried\\n 250 cals\\n 320 cals\\n High\\n Sausage pork grilled\\n 220 cals\\n 280 cals\\n Med\\n-High\\n Sausage roll\\n 290 cals\\n 480 cals\\n High\\n Scampi fried 
in oil\\n 400 cals\\n 340 cals\\n High\\n Steak & kidney pie\\n 400 cals\\n 350 cals\\n High\\n '\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]\n" + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: './tmp/pdf/Food Calories List.pdf'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mPyPDF2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mpdf_file\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'./tmp/pdf/Food Calories List.pdf'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'rb'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mread_pdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mPyPDF2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mPdfFileReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpdf_file\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mnumber_of_pages\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetNumPages\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mpage\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mread_pdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetPage\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or 
directory: './tmp/pdf/Food Calories List.pdf'" ] } ], @@ -3354,18 +592,18 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 10, "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "[' Fish cake' ' 90 cals per cake' ' 200 cals' ' Medium']\n", - "[' Fish fingers' ' 50 cals per piece' ' 220 cals' ' Medium']\n", - "[' Gammon' ' 320 cals' ' 280 cals' ' Med']\n", - "['-High' ' Haddock fresh' ' 200 cals' ' 110 cals']\n", - "[' Low calorie' ' Halibut fresh' ' 220 cals' ' 125 cals']\n" + "ename": "NameError", + "evalue": "name 'page_content' is not defined", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mtable_list\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpage_content\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'\\n'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0ml\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnumpy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray_split\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m/\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m 
\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mNameError\u001b[0m: name 'page_content' is not defined" ] } ], @@ -3378,6 +616,20 @@ " print(l[i])" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "code", "execution_count": null, @@ -3402,7 +654,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.8" } }, "nbformat": 4, diff --git a/notebooks/Python group and sort a list of lists by a specific index,pattern.ipynb b/notebooks/Python group and sort a list of lists by a specific index,pattern.ipynb new file mode 100644 index 0000000..396395f --- /dev/null +++ b/notebooks/Python group and sort a list of lists by a specific index,pattern.ipynb @@ -0,0 +1,424 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "movies = [\n", + "1, \"Avatar\" ,'good',\n", + "2, \"Titanic\" ,'not bad',\n", + "3, \"Star Wars: The Force Awakens\" ,'good',\n", + "4, \"Jurassic World\" ,'good',\n", + "5, \"The Avengers\" ,'not bad',\n", + "6, \"Furious 7\" ,'not bad',\n", + "7, \"Avengers: Age of Ultron\" ,'good',\n", + "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n", + "9, \"Frozen\" ,'good',\n", + "\n", + "\n", + "\"The Birth of a Nation\" ,1915,\n", + "\"The Birth of a Nation\" ,1940,\n", + "\"Gone with the Wind\" ,1940,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"The Sound of Music\" ,1966]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def 
sortGroupList(list_unsorted, category, category2, short=True):\n", + " listx = []\n", + " listy = []\n", + " last_section = 0\n", + " for i in range(0, len(list_unsorted), 3):\n", + " if list_unsorted[i + 2] == category:\n", + " listy.append(list_unsorted[i])\n", + " listy.append(list_unsorted[i + 1])\n", + " if not short:\n", + " listy.append(list_unsorted[i + 2])\n", + " last_section = i+2\n", + " elif list_unsorted[i + 2] == category2:\n", + " listx.append(list_unsorted[i])\n", + " listx.append(list_unsorted[i + 1])\n", + " if not short:\n", + " listx.append(list_unsorted[i + 2])\n", + " last_section = i + 2\n", + " header_category = [' - ' + category + ' - ']\n", + " header_category2 = [' - ' + category2 + ' - ']\n", + " header_category3 = [' - ' + ' - ']\n", + " return header_category + listy + header_category2 + listx + header_category3 + list_unsorted[last_section:]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sortGroupList(movies, 'good', 'not bad')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "movies = [\n", + "1, \"Avatar\" ,2009,\n", + "2, \"Titanic\" ,1997,\n", + "3, \"Star Wars: The Force Awakens\" ,2015,\n", + "4, \"Jurassic World\" ,2015,\n", + "5, \"The Avengers\" ,2012,\n", + "6, \"Furious 7\" ,2015,\n", + "7, \"Avengers: Age of Ultron\" ,2015,\n", + "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,2011,\n", + "9, \"Frozen\" ,2013,\n", + "\n", + "\n", + "\"The Birth of a Nation\" ,1915,\n", + "\"The Birth of a Nation\" ,1940,\n", + "\"Gone with the Wind\" ,1940,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"The Sound of Music\" ,1966]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(len(movies))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "years = [str(x) for x 
in range(1997, 2015)]\n", + "years" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sortGroupList(list_unsorted):\n", + " listx = []\n", + " listy = []\n", + " for i in range(0, len(list_unsorted), 3):\n", + " if list_unsorted[i + 2] in years:\n", + " listy.append(list_unsorted[i])\n", + " listy.append(list_unsorted[i + 1])\n", + " listy.append(list_unsorted[i + 2])\n", + " else:\n", + " listx.append(list_unsorted[i])\n", + " listx.append(list_unsorted[i + 1])\n", + " listx.append(list_unsorted[i + 2])\n", + " for i in listy:\n", + " print(i)\n", + " for i in listx:\n", + " print(i)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sortGroupList(movies)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "movies = [\n", + "1, \"Avatar\" ,'good',\n", + "2, \"Titanic\" ,'not bad',\n", + "3, \"Star Wars: The Force Awakens\" ,'good',\n", + "4, \"Jurassic World\" ,'good',\n", + "5, \"The Avengers\" ,'not bad',\n", + "6, \"Furious 7\" ,'not bad',\n", + "7, \"Avengers: Age of Ultron\" ,'good',\n", + "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n", + "9, \"Frozen\" ,'good',\n", + "\n", + "\n", + "\"The Birth of a Nation\" ,1915,\n", + "\"The Birth of a Nation\" ,1940,\n", + "\"Gone with the Wind\" ,1940,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"The Sound of Music\" ,1966]\n", + "df = pd.DataFrame(movies)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "types = []\n", + "raw_list = []\n", + "for e in movies:\n", + " types.append(type(e))\n", + " if isinstance(e, int):\n", + " raw_list.append(1)\n", + " else:\n", + " raw_list.append(0)\n", 
+ "df1 = pd.DataFrame({'elem':movies, 'types':types}) " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "raw_list = [1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1]\n", + "movies = [\n", + "1, \"Avatar\" ,'good',\n", + "2, \"Titanic\" ,'not bad',\n", + "3, \"Star Wars: The Force Awakens\" ,'good',\n", + "4, \"Jurassic World\" ,'good',\n", + "5, \"The Avengers\" ,'not bad',\n", + "6, \"Furious 7\" ,'not bad',\n", + "7, \"Avengers: Age of Ultron\" ,'good',\n", + "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n", + "9, \"Frozen\" ,'good',\n", + "\n", + "\n", + "\"The Birth of a Nation\" ,1915,\n", + "\"The Birth of a Nation\" ,1940,\n", + "\"Gone with the Wind\" ,1940,\n", + "\"Gone with the Wind\" ,1963,\n", + "\"The Sound of Music\" ,1966]" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[0, 1]\n", + "[0, 1]\n", + "[0, 1]\n", + "[0, 1]\n", + "[0, 1]\n" + ] + }, + { + "data": { + "text/plain": [ + "[[1, 'Avatar', 'good'],\n", + " [2, 'Titanic', 'not bad'],\n", + " [3, 'Star Wars: The Force Awakens', 'good'],\n", + " [4, 'Jurassic World', 'good'],\n", + " [5, 'The Avengers', 'not bad'],\n", + " [6, 'Furious 7', 'not bad'],\n", + " [7, 'Avengers: Age of Ultron', 'good'],\n", + " [8, 'Harry Potter and the Deathly Hallows – Part 2', 'not bad'],\n", + " [9, 'Frozen', 'good']]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "patern1 = [1, 0, 0]\n", + "patern2 = [1, 0]\n", + 
"\n", + "len1 = len(patern1)\n", + "len2 = len(patern2)\n", + "\n", + "output1 = []\n", + "output2 = []\n", + "\n", + "while(raw_list):\n", + " if raw_list[:len1] == patern1: \n", + " output1.append(movies[:len1])\n", + " raw_list = raw_list[len1:]\n", + " movies = movies[len1:]\n", + " else:\n", + " print(raw_list[:len2])\n", + " output2.append(movies[:len2])\n", + " raw_list = raw_list[len2:]\n", + " movies = movies[len2:]\n", + " \n", + "output1" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['The Birth of a Nation', 1915],\n", + " ['The Birth of a Nation', 1940],\n", + " ['Gone with the Wind', 1940],\n", + " ['Gone with the Wind', 1963],\n", + " ['The Sound of Music', 1966]]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "output2" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "new_list = sorted(output1, key=lambda x: x[2])" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[1, 'Avatar', 'good'],\n", + " [3, 'Star Wars: The Force Awakens', 'good'],\n", + " [4, 'Jurassic World', 'good'],\n", + " [7, 'Avengers: Age of Ultron', 'good'],\n", + " [9, 'Frozen', 'good'],\n", + " [2, 'Titanic', 'not bad'],\n", + " [5, 'The Avengers', 'not bad'],\n", + " [6, 'Furious 7', 'not bad'],\n", + " [8, 'Harry Potter and the Deathly Hallows – Part 2', 'not bad']]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_list" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Python make groups in a list" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Simple grouping" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + 
"source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/notebooks/Scrape wiki tables with pandas and python.ipynb b/notebooks/Scrape wiki tables with pandas and python.ipynb index 3264206..af93cad 100644 --- a/notebooks/Scrape wiki tables with pandas and python.ipynb +++ b/notebooks/Scrape wiki tables with pandas and python.ipynb @@ -2661,7 +2661,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.8" } }, "nbformat": 4, diff --git a/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb b/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb index 6d0cb36..e7c1c50 100644 --- a/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb +++ b/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb @@ -67,10 +67,10 @@ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", - "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m 2 *** 2\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" - ] + ], + "output_type": "error" } ], "source": [ diff --git a/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb b/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb index c2b755d..d0d81d7 100644 --- a/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb +++ b/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb @@ -28,7 +28,7 @@ "* Convert string to datetime 
in DataFrame\n", "* Select rows between two dates\n", " * 1. Select rows based on dates with loc\n", - " * 2. Select rows based on dates without loc\n", + " * 2. Series method between\n", " * 3. Select rows between two times\n", " * 4. Select rows based on dates without loc\n", " * 5. Use mask to mark the records\n", @@ -385,75 +385,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### 2. Select rows based on dates without loc" + "#### 2. Series method between" ] }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
loading_datetime | pages | title | datetime_col
1 | 2019-10-29 19:56:03 | english | <GET https://en.wikipedia.org/wiki/Main_Page>... | 2019-10-31 11:16:43+00:00
2 | 2019-10-29 19:56:03 | italiano | <GET https://it.wikipedia.org/wiki/Pagina_pri... | 2019-10-30 21:15:23+00:00
\n", - "
" - ], - "text/plain": [ - " loading_datetime pages \\\n", - "1 2019-10-29 19:56:03 english \n", - "2 2019-10-29 19:56:03 italiano \n", - "\n", - " title datetime_col \n", - "1 ... 2019-10-31 11:16:43+00:00 \n", - "2 \n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
loading_datetime | pages | title | datetime_col
\n", - "" - ], - "text/plain": [ - "Empty DataFrame\n", - "Columns: [loading_datetime, pages, title, datetime_col]\n", - "Index: []" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df[(df['datetime_col'] > '2018-12-02') & (df['datetime_col'] <= '2018-12-03 23:26:10+00:00')]" ] @@ -605,7 +501,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 12, "metadata": {}, "outputs": [ { @@ -655,7 +551,7 @@ "1 ... 2019-10-31 11:16:43+00:00 " ] }, - "execution_count": 16, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } diff --git a/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb b/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb index 91e9752..b15791d 100644 --- a/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb +++ b/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb @@ -2032,7 +2032,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.9" } }, "nbformat": 4, diff --git a/notebooks/youtube/Youtube-PewDiePie.ipynb b/notebooks/youtube/Youtube-PewDiePie.ipynb index 1b7bec0..8e457ed 100644 --- a/notebooks/youtube/Youtube-PewDiePie.ipynb +++ b/notebooks/youtube/Youtube-PewDiePie.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ @@ -13,26 +13,26 @@ }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "(359, 8)\n" + "(143, 8)\n" ] } ], "source": [ "df = pd.read_csv(\n", - " \"~/Projects/MYP/Datasets/Youtube/PewDiePie20190210.csv\", sep=\"@\")\n", + " \"~/Projects/MYP/Datasets/Youtube/me20190528.csv\", sep=\"@\")\n", "print(df.shape)" ] }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 3, 
"metadata": {}, "outputs": [ { @@ -69,94 +69,94 @@ " \n", " \n", " 0\n", - " YOU HAD ONE JOB! - with editor Brad1\n", - " 5,293,108.0\n", - " 385,429.0\n", - " 4,083.0\n", + " PyCharm/IntelliJ fast and auto change of the color theme\n", + " 41.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,855.0\n", - " https://www.youtube.com/watch?v=B67OBHNCopk\n", - " SATIRE, reddit, you had one job, onejob\n", + " 2.0\n", + " https://www.youtube.com/embed/SsX9Fl958W0\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", " \n", " \n", " 1\n", - " Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE\n", - " 5,358,466.0\n", - " 378,535.0\n", - " 3,951.0\n", + " How to add weather desklet to Linux Mint 19\n", + " 291.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 38,075.0\n", - " https://www.youtube.com/watch?v=kLM_9gBZIqY\n", - " SATIRE\n", + " 0.0\n", + " https://www.youtube.com/embed/-FPY_e0BdJs\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", " \n", " \n", " 2\n", - " We broke another WORLD RECORD!\n", - " 8,558,673.0\n", - " 595,622.0\n", - " 7,901.0\n", + " How to easy integrate Google Calendar to Desktop for Linux Mint\n", + " 226.0\n", + " 1.0\n", + " 0.0\n", " 0.0\n", - " 53,664.0\n", - " https://www.youtube.com/watch?v=d1tAfXKc7-c\n", - " SATIRE\n", + " 0.0\n", + " https://www.youtube.com/embed/2evIujisdD0\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", " \n", " \n", " 3\n", - " FLOSSING in VR with Green Man. 
~ UNSEEN FOOTAGE ~\n", - " 3,609,152.0\n", - " 218,530.0\n", - " 3,126.0\n", + " Pandas use a list of values to select rows from a column\n", + " 45.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 17,595.0\n", - " https://www.youtube.com/watch?v=bMLdNrB5hAo\n", - " SATIRE\n", + " 10.0\n", + " https://www.youtube.com/embed/jlSbo5wmTPQ\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", " \n", " \n", " 4\n", - " Don't Laugh Challenge, NEW SEASON!!!!!\n", - " 5,888,465.0\n", - " 569,900.0\n", - " 7,824.0\n", + " Pandas count and percentage by value for a column\n", + " 63.0\n", + " 3.0\n", " 0.0\n", - " 29,373.0\n", - " https://www.youtube.com/watch?v=Zgm_iM3f_ME\n", - " SATIRE\n", + " 0.0\n", + " 0.0\n", + " https://www.youtube.com/embed/P5pxJkv71BU\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", " \n", " \n", "\n", "" ], "text/plain": [ - " title Views \\\n", - "0 YOU HAD ONE JOB! - with editor Brad1 5,293,108.0 \n", - "1 Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE 5,358,466.0 \n", - "2 We broke another WORLD RECORD! 8,558,673.0 \n", - "3 FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ 3,609,152.0 \n", - "4 Don't Laugh Challenge, NEW SEASON!!!!! 
5,888,465.0 \n", + " title Views \\\n", + "0 PyCharm/IntelliJ fast and auto change of the color theme 41.0 \n", + "1 How to add weather desklet to Linux Mint 19 291.0 \n", + "2 How to easy integrate Google Calendar to Desktop for Linux Mint 226.0 \n", + "3 Pandas use a list of values to select rows from a column 45.0 \n", + "4 Pandas count and percentage by value for a column 63.0 \n", "\n", - " Like Dislike Favorite Comment \\\n", - "0 385,429.0 4,083.0 0.0 29,855.0 \n", - "1 378,535.0 3,951.0 0.0 38,075.0 \n", - "2 595,622.0 7,901.0 0.0 53,664.0 \n", - "3 218,530.0 3,126.0 0.0 17,595.0 \n", - "4 569,900.0 7,824.0 0.0 29,373.0 \n", + " Like Dislike Favorite Comment \\\n", + "0 0.0 0.0 0.0 2.0 \n", + "1 0.0 0.0 0.0 0.0 \n", + "2 1.0 0.0 0.0 0.0 \n", + "3 3.0 0.0 0.0 10.0 \n", + "4 3.0 0.0 0.0 0.0 \n", "\n", - " videoID \\\n", - "0 https://www.youtube.com/watch?v=B67OBHNCopk \n", - "1 https://www.youtube.com/watch?v=kLM_9gBZIqY \n", - "2 https://www.youtube.com/watch?v=d1tAfXKc7-c \n", - "3 https://www.youtube.com/watch?v=bMLdNrB5hAo \n", - "4 https://www.youtube.com/watch?v=Zgm_iM3f_ME \n", + " videoID \\\n", + "0 https://www.youtube.com/embed/SsX9Fl958W0 \n", + "1 https://www.youtube.com/embed/-FPY_e0BdJs \n", + "2 https://www.youtube.com/embed/2evIujisdD0 \n", + "3 https://www.youtube.com/embed/jlSbo5wmTPQ \n", + "4 https://www.youtube.com/embed/P5pxJkv71BU \n", "\n", - " tags \n", - "0 SATIRE, reddit, you had one job, onejob \n", - "1 SATIRE \n", - "2 SATIRE \n", - "3 SATIRE \n", - "4 SATIRE " + " tags \n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg \n", + "1 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg \n", + "2 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg \n", + "3 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg \n", + "4 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg " ] }, - "execution_count": 45, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } @@ -168,16 +168,16 @@ }, { "cell_type": "code", - 
"execution_count": 46, + "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "(359, 8)" + "(143, 8)" ] }, - "execution_count": 46, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -188,7 +188,7 @@ }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 5, "metadata": { "scrolled": true }, @@ -199,7 +199,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -224,173 +224,43 @@ " \n", " \n", " 0\n", - " 1\n", - " 2\n", - " 3\n", - " 4\n", - " 5\n", - " 6\n", - " 7\n", - " 8\n", - " 9\n", - " ...\n", - " 38\n", - " 39\n", - " 40\n", - " 41\n", - " 42\n", - " 43\n", - " 44\n", - " 45\n", - " 46\n", - " 47\n", " \n", " \n", " \n", " \n", " 0\n", " True\n", - " True\n", - " True\n", - " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 1\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 2\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 3\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " 
False\n", " \n", " \n", " 4\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", "\n", - "

5 rows × 48 columns

\n", "" ], "text/plain": [ - " 0 1 2 3 4 5 6 7 8 9 ... \\\n", - "0 True True True True False False False False False False ... \n", - "1 True False False False False False False False False False ... \n", - "2 True False False False False False False False False False ... \n", - "3 True False False False False False False False False False ... \n", - "4 True False False False False False False False False False ... \n", - "\n", - " 38 39 40 41 42 43 44 45 46 47 \n", - "0 False False False False False False False False False False \n", - "1 False False False False False False False False False False \n", - "2 False False False False False False False False False False \n", - "3 False False False False False False False False False False \n", - "4 False False False False False False False False False False \n", - "\n", - "[5 rows x 48 columns]" + " 0\n", + "0 True\n", + "1 True\n", + "2 True\n", + "3 True\n", + "4 True" ] }, - "execution_count": 48, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -402,16 +272,16 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "RangeIndex(start=0, stop=48, step=1)" + "RangeIndex(start=0, stop=1, step=1)" ] }, - "execution_count": 53, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -423,7 +293,7 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -448,173 +318,43 @@ " \n", " \n", " 0\n", - " 1\n", - " 2\n", - " 3\n", - " 4\n", - " 5\n", - " 6\n", - " 7\n", - " 8\n", - " 9\n", - " ...\n", - " 38\n", - " 39\n", - " 40\n", - " 41\n", - " 42\n", - " 43\n", - " 44\n", - " 45\n", - " 46\n", - " 47\n", " \n", " \n", " \n", " \n", " 0\n", " True\n", - " True\n", - " True\n", - " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " 
False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 1\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 2\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 3\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", " 4\n", " True\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " ...\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", - " False\n", " \n", " \n", "\n", - "

5 rows × 48 columns

\n", "" ], "text/plain": [ - " 0 1 2 3 4 5 6 7 8 9 ... \\\n", - "0 True True True True False False False False False False ... \n", - "1 True False False False False False False False False False ... \n", - "2 True False False False False False False False False False ... \n", - "3 True False False False False False False False False False ... \n", - "4 True False False False False False False False False False ... \n", - "\n", - " 38 39 40 41 42 43 44 45 46 47 \n", - "0 False False False False False False False False False False \n", - "1 False False False False False False False False False False \n", - "2 False False False False False False False False False False \n", - "3 False False False False False False False False False False \n", - "4 False False False False False False False False False False \n", - "\n", - "[5 rows x 48 columns]" + " 0\n", + "0 True\n", + "1 True\n", + "2 True\n", + "3 True\n", + "4 True" ] }, - "execution_count": 65, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -625,7 +365,7 @@ }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 9, "metadata": {}, "outputs": [ { @@ -650,173 +390,43 @@ " \n", " \n", " 0\n", - " 1\n", - " 2\n", - " 3\n", - " 4\n", - " 5\n", - " 6\n", - " 7\n", - " 8\n", - " 9\n", - " ...\n", - " 38\n", - " 39\n", - " 40\n", - " 41\n", - " 42\n", - " 43\n", - " 44\n", - " 45\n", - " 46\n", - " 47\n", " \n", " \n", " \n", " \n", " 0\n", - " SATIRE\n", - " reddit\n", - " you had one job\n", - " onejob\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", " \n", " \n", " 1\n", - " SATIRE\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " 
None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", " \n", " \n", " 2\n", - " SATIRE\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", " \n", " \n", " 3\n", - " SATIRE\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", " \n", " \n", " 4\n", - " SATIRE\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " ...\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", - " None\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", " \n", " \n", "\n", - "

5 rows × 48 columns

\n", "" ], "text/plain": [ - " 0 1 2 3 4 5 6 7 8 \\\n", - "0 SATIRE reddit you had one job onejob None None None None None \n", - "1 SATIRE None None None None None None None None \n", - "2 SATIRE None None None None None None None None \n", - "3 SATIRE None None None None None None None None \n", - "4 SATIRE None None None None None None None None \n", - "\n", - " 9 ... 38 39 40 41 42 43 44 45 46 47 \n", - "0 None ... None None None None None None None None None None \n", - "1 None ... None None None None None None None None None None \n", - "2 None ... None None None None None None None None None None \n", - "3 None ... None None None None None None None None None None \n", - "4 None ... None None None None None None None None None None \n", - "\n", - "[5 rows x 48 columns]" + " 0\n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", + "1 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", + "2 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", + "3 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", + "4 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg" ] }, - "execution_count": 66, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -827,7 +437,7 @@ }, { "cell_type": "code", - "execution_count": 73, + "execution_count": 10, "metadata": {}, "outputs": [ { @@ -835,3956 +445,422 @@ "output_type": "stream", "text": [ "ssssssssssssssssssssssssssssssssss0ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 reddit \n", - "2 you had one job\n", - "3 onejob \n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", "Name: 0, dtype: object\n", "ssssssssssssssssssssssssssssssssss1ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", "Name: 1, dtype: object\n", "ssssssssssssssssssssssssssssssssss2ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", "Name: 2, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss3ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", "Name: 3, dtype: object\n", "ssssssssssssssssssssssssssssssssss4ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", "Name: 4, dtype: object\n", "ssssssssssssssssssssssssssssssssss5ssssssssssssssssssssssssssssssssss\n", - "0 player \n", - "1 unknown \n", - "2 PUBG \n", - "3 player unknowns \n", - "4 player unknown's\n", - "5 battleground \n", - "6 battle \n", - "7 ground \n", + "0 https://i.ytimg.com/vi/Ni2SjEuz__g/hqdefault.jpg\n", "Name: 5, dtype: object\n", "ssssssssssssssssssssssssssssssssss6ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/EXxJ-We2ygw/hqdefault.jpg\n", "Name: 6, dtype: object\n", "ssssssssssssssssssssssssssssssssss7ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme review\n", - "2 elon musk \n", + "0 https://i.ytimg.com/vi/tfU8pDNYlDA/hqdefault.jpg\n", "Name: 7, dtype: object\n", "ssssssssssssssssssssssssssssssssss8ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 The Battle Wizard . ENDING EXPLAINED\n", - "2 the battle wizard \n", - "3 battle wizard \n", - "4 battle wizard 1977 \n", - "5 battle wizard movie \n", - "6 movie review \n", - "7 movie \n", - "8 film review \n", - "9 pewdiepie \n", - "10 pewds \n", - "11 pewdie \n", - "12 pdp \n", - "13 wizard \n", + "0 https://i.ytimg.com/vi/nW5ltiwV-6Y/hqdefault.jpg\n", "Name: 8, dtype: object\n", "ssssssssssssssssssssssssssssssssss9ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 dr phil\n", - "2 react \n", + "0 https://i.ytimg.com/vi/Z1vISDOhC0k/hqdefault.jpg\n", "Name: 9, dtype: object\n", "ssssssssssssssssssssssssssssssssss10ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 Thats right... 
I'm a GAMER\n", - "2 gamer \n", - "3 gaming \n", - "4 youtube gaming \n", - "5 memes \n", - "6 pewdiepie \n", - "7 pewds \n", - "8 pewdie \n", - "9 pdp \n", + "0 https://i.ytimg.com/vi/lx7KFd6BPcg/hqdefault.jpg\n", "Name: 10, dtype: object\n", "ssssssssssssssssssssssssssssssssss11ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/3g6KG_8zq0E/hqdefault.jpg\n", "Name: 11, dtype: object\n", "ssssssssssssssssssssssssssssssssss12ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tiktok \n", - "2 tik tok \n", - "3 tik tok funny \n", - "4 tik tok compilation\n", - "5 funny tik toks \n", - "6 funny tiktok \n", - "7 funny tiktok memes \n", - "8 tiktok songs \n", - "9 tiktok cringe \n", - "10 cringe \n", - "11 cringe compilation \n", - "12 tiktok memes \n", - "13 tik tok memes \n", - "14 pewdiepie tiktok \n", - "15 pewdiepie \n", - "16 pewds \n", - "17 pewdie \n", - "18 pdp \n", - "19 #ad \n", - "20 4K video \n", + "0 https://i.ytimg.com/vi/-NVFQ_q3eRM/hqdefault.jpg\n", "Name: 12, dtype: object\n", "ssssssssssssssssssssssssssssssssss13ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/CA6lyOmfRbM/hqdefault.jpg\n", "Name: 13, dtype: object\n", "ssssssssssssssssssssssssssssssssss14ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 MY NEW SHOW / You Laugh You Lose\n", - "2 you laugh you lose \n", - "3 ylyl \n", - "4 you laugh you lose challenge \n", - "5 try not to laugh \n", - "6 try not to laugh challenge \n", - "7 pewdiepie \n", - "8 pewdiepie ylyl \n", - "9 ylyl pewds \n", - "10 pewdie \n", - "11 pdp \n", - "12 pewds \n", + "0 https://i.ytimg.com/vi/PIAzK1rvqIY/hqdefault.jpg\n", "Name: 14, dtype: object\n", "ssssssssssssssssssssssssssssssssss15ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 pew \n", - "2 news \n", + "0 https://i.ytimg.com/vi/nrF_Rgh88no/hqdefault.jpg\n", "Name: 15, dtype: object\n", "ssssssssssssssssssssssssssssssssss16ssssssssssssssssssssssssssssssssss\n", - "0 
SATIRE \n", - "1 Sasuke Memes are NOT OK\n", - "2 sasuke \n", - "3 sasuke naruto \n", - "4 naruto \n", - "5 pewdiepie \n", - "6 meme review \n", - "7 memes \n", - "8 meme \n", - "9 pewds \n", - "10 pewdie \n", - "11 pdp \n", - "12 wave check \n", - "13 waves \n", - "14 wave hair \n", - "15 waves hair \n", + "0 https://i.ytimg.com/vi/4ixLp8aFomw/hqdefault.jpg\n", "Name: 16, dtype: object\n", "ssssssssssssssssssssssssssssssssss17ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/UvCO5gKQqtE/hqdefault.jpg\n", "Name: 17, dtype: object\n", "ssssssssssssssssssssssssssssssssss18ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 dr phil \n", - "2 dr phil spoiled teen \n", - "3 dr phil pewdiepie \n", - "4 Dr Phil VS Spoiled teen *destroyed by facts and logic*\n", - "5 dr phil spoiled \n", - "6 dr phil full episodes \n", - "7 pewds \n", - "8 pewdie \n", - "9 pewdiepie \n", - "10 pdp \n", - "11 dr phil 2019 \n", + "0 https://i.ytimg.com/vi/j80mqdfy8Fw/hqdefault.jpg\n", "Name: 18, dtype: object\n", "ssssssssssssssssssssssssssssssssss19ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/bKBpDywKje8/hqdefault.jpg\n", "Name: 19, dtype: object\n", "ssssssssssssssssssssssssssssssssss20ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 20, dtype: object\n", + "Series([], Name: 20, dtype: object)\n", "ssssssssssssssssssssssssssssssssss21ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/t_DI7NbjcFs/hqdefault.jpg\n", "Name: 21, dtype: object\n", "ssssssssssssssssssssssssssssssssss22ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 Im most handsome 2018 \n", - "2 gamer girls \n", - "3 reaction \n", - "4 react \n", - "5 gamer girls react \n", - "6 most handsome man \n", - "7 pewdiepie \n", - "8 pewds \n", - "9 pewdie \n", - "10 pdp \n", - "11 lwiay \n", - "12 pewdiepie lwiay \n", - "13 pokimane \n", - "14 lords mobile \n", - "15 ads \n", - "16 ad \n", - "17 
lords mobile ad \n", - "18 mobile ads \n", - "19 handsome man \n", - "20 most handsome man winner\n", - "21 handsome \n", - "22 gamer girls reaction \n", - "23 gamer \n", - "24 girls \n", - "25 gaming \n", - "26 entertainment \n", + "0 https://i.ytimg.com/vi/Ol3Dwucax9U/hqdefault.jpg\n", "Name: 22, dtype: object\n", "ssssssssssssssssssssssssssssssssss23ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 comedy \n", - "3 you laugh you lose\n", - "4 compilation \n", - "5 try not to laugh \n", - "6 challenge \n", + "0 https://i.ytimg.com/vi/NbvHU_KoD74/hqdefault.jpg\n", "Name: 23, dtype: object\n", "ssssssssssssssssssssssssssssssssss24ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/zVQJQxpedm8/hqdefault.jpg\n", "Name: 24, dtype: object\n", "ssssssssssssssssssssssssssssssssss25ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 jesus \n", - "2 socalchrist \n", - "3 fake gamers \n", - "4 fake gamer girl \n", - "5 gamer girls \n", - "6 twitch girls \n", - "7 ricegum jakepaul \n", - "8 jake paul \n", - "9 jake paul amazon \n", - "10 amazon \n", - "11 amazon gift card \n", - "12 fake amazon giftcard\n", - "13 fake amazon \n", - "14 pewdiepie \n", - "15 pewds \n", - "16 pewdie \n", - "17 pdp \n", - "18 pew news \n", - "19 #ad \n", - "20 news \n", - "21 current affairs \n", - "22 ricegum jake paul \n", + "0 https://i.ytimg.com/vi/lCcE-0bykRU/hqdefault.jpg\n", "Name: 25, dtype: object\n", "ssssssssssssssssssssssssssssssssss26ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 vr chat\n", - "2 game \n", - "3 gaming \n", + "0 https://i.ytimg.com/vi/seLcRCulwl4/hqdefault.jpg\n", "Name: 26, dtype: object\n", "ssssssssssssssssssssssssssssssssss27ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 genius \n", - "2 review \n", - "3 lele pons \n", - "4 gabbi hannsomething \n", - "5 jacob whatever his name is\n", - "6 other people \n", + "0 https://i.ytimg.com/vi/ZfemCpfJNfU/hqdefault.jpg\n", "Name: 
27, dtype: object\n", "ssssssssssssssssssssssssssssssssss28ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 reddit \n", - "2 reddit review \n", - "3 pewdiepie \n", - "4 pewds \n", - "5 pewdie \n", - "6 pdp \n", - "7 tseries \n", - "8 t series \n", - "9 pewdiepie vs tseries \n", - "10 pewdiepie vs t series \n", - "11 oopsie \n", - "12 /r/ \n", - "13 /r \n", - "14 reddit try not to laugh \n", - "15 reddit cringe \n", - "16 reddit stories \n", - "17 reddit cringe compilation\n", - "18 vox \n", - "19 vox media \n", - "20 pewdiepie vox media \n", - "21 pewdiepie vox \n", - "22 Unintentional Opsies \n", - "23 opsies \n", + "0 https://i.ytimg.com/vi/TgO-AkopLo4/hqdefault.jpg\n", "Name: 28, dtype: object\n", "ssssssssssssssssssssssssssssssssss29ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pewdiepie vs thanos\n", - "2 Pewdiepie vs Thanos\n", - "3 WHO would WIN? \n", - "4 pewdiepie \n", - "5 pewds \n", - "6 pewdie \n", - "7 pdp \n", - "8 thanos \n", - "9 thanos meme \n", - "10 thanos memes \n", - "11 tseries \n", + "0 https://i.ytimg.com/vi/HMB4zrP_-HY/hqdefault.jpg\n", "Name: 29, dtype: object\n", "ssssssssssssssssssssssssssssssssss30ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", - "3 awards\n", + "0 https://i.ytimg.com/vi/JBm8iptLnuA/hqdefault.jpg\n", "Name: 30, dtype: object\n", "ssssssssssssssssssssssssssssssssss31ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 We broke a world record!\n", - "2 world \n", - "3 record \n", - "4 world record \n", - "5 pewdiepie \n", - "6 pewds \n", - "7 pewdie \n", - "8 pdp \n", - "9 world record pewdipie \n", - "10 tseries \n", - "11 t series \n", - "12 youtube rewind \n", - "13 youtube rewind 2018 \n", + "0 https://i.ytimg.com/vi/Ynp0xyBgwt0/hqdefault.jpg\n", "Name: 31, dtype: object\n", "ssssssssssssssssssssssssssssssssss32ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/ftGiBv3LL_A/hqdefault.jpg\n", "Name: 32, dtype: 
object\n", "ssssssssssssssssssssssssssssssssss33ssssssssssssssssssssssssssssssssss\n", - "0 rewind 2018 \n", - "1 youtube rewind 2018\n", + "0 https://i.ytimg.com/vi/5pbRivDYzko/hqdefault.jpg\n", "Name: 33, dtype: object\n", "ssssssssssssssssssssssssssssssssss34ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 dr phil \n", - "2 Dr Phil ANNIHILATES spoiled Teen!!\n", - "3 dr phil spoiled daughter \n", - "4 dr phil full episodes \n", - "5 dr phil im white \n", - "6 dr phil annihilates \n", - "7 spoiled teen \n", - "8 dr phil spoiled \n", - "9 dr phil pewdiepie \n", - "10 dr phil 2018 \n", - "11 dr phil funny \n", - "12 dr phil meme review \n", - "13 dr phil treasure \n", - "14 dr phil video \n", - "15 dr phil tv show \n", - "16 pewdiepie \n", - "17 pewds \n", - "18 pewdie \n", - "19 pdp \n", + "0 https://i.ytimg.com/vi/3jlXIX5Ctyo/hqdefault.jpg\n", "Name: 34, dtype: object\n", "ssssssssssssssssssssssssssssssssss35ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 25 Dec 2018\n", + "0 https://i.ytimg.com/vi/mG9OnH9R5yM/hqdefault.jpg\n", "Name: 35, dtype: object\n", "ssssssssssssssssssssssssssssssssss36ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 lwiay\n", + "0 https://i.ytimg.com/vi/SnMXqyLqZwM/hqdefault.jpg\n", "Name: 36, dtype: object\n", "ssssssssssssssssssssssssssssssssss37ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 wsj \n", - "2 hack \n", + "0 https://i.ytimg.com/vi/30ndwJm1I5c/hqdefault.jpg\n", "Name: 37, dtype: object\n", "ssssssssssssssssssssssssssssssssss38ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 You SLAV You Lose \n", - "3 you laugh \n", - "4 you lose \n", - "5 try not to laugh \n", - "6 you laugh you lose \n", - "7 you laugh you lose pewdiepie\n", - "8 try not to laugh challenge \n", - "9 pewdiepie \n", - "10 pewds \n", - "11 pewdie \n", - "12 pdp \n", + "0 https://i.ytimg.com/vi/IoeYrz-fP2o/hqdefault.jpg\n", "Name: 38, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss39ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 youtube rewind\n", - "2 rewind \n", - "3 2018 \n", - "4 roast \n", - "5 lwiay \n", - "6 ylyl \n", - "7 meme \n", - "8 review \n", - "Name: 39, dtype: object\n", + "Series([], Name: 39, dtype: object)\n", "ssssssssssssssssssssssssssssssssss40ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pewdiepie \n", - "2 pewds \n", - "3 pewdie \n", - "4 pdp \n", - "5 PewDiePie's biggest OOPSIE.\n", - "6 pew news \n", - "7 game awards 2018 \n", - "8 game awards 2018 cringe \n", + "0 https://i.ytimg.com/vi/hJMH_1o8eU0/hqdefault.jpg\n", "Name: 40, dtype: object\n", "ssssssssssssssssssssssssssssssssss41ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/OXA_ZD1gR6A/hqdefault.jpg\n", "Name: 41, dtype: object\n", "ssssssssssssssssssssssssssssssssss42ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tiktok \n", - "2 tiktok memes \n", - "3 tiktok songs \n", - "4 tiktok cringe \n", - "5 tiktok tutorial \n", - "6 tiktok hit or miss \n", - "7 tiktok music \n", - "8 tiktok fortnite \n", - "9 tiktok cringe compilation \n", - "10 tiktok epic \n", - "11 best tiktok \n", - "12 best tiktok videos \n", - "13 tiktok funny \n", - "14 tiktok funny videos \n", - "15 tiktok haha \n", - "16 tiktok epic memes \n", - "17 tiktok compilation \n", - "18 tiktok compilation 2018 \n", - "19 tiktok 2018 \n", - "20 Tik Tok Very Funny Haha Epic Compilation Montage BEST TIK TOK 2018 LOL\n", - "21 tiktok montage \n", - "22 pewdiepie tiktok \n", - "23 pewdiepie vs t series \n", - "24 pewdiepie \n", - "25 pewdie \n", - "26 pdp \n", + "0 https://i.ytimg.com/vi/duOHHDqI40c/hqdefault.jpg\n", "Name: 42, dtype: object\n", "ssssssssssssssssssssssssssssssssss43ssssssssssssssssssssssssssssssssss\n", - "0 player \n", - "1 unknown \n", - "2 PUBG \n", - "3 player unknowns \n", - "4 player unknown's\n", - "5 battleground \n", - "6 battle \n", - "7 ground \n", + "0 
https://i.ytimg.com/vi/vbHFIALhSWE/hqdefault.jpg\n", "Name: 43, dtype: object\n", "ssssssssssssssssssssssssssssssssss44ssssssssssssssssssssssssssssssssss\n", - "0 TRY TO LAUGH NOT CHALLENGE \n", - "1 TRY NOT TO LAUGH \n", - "2 try not to laugh challenge \n", - "3 try not to laugh challenge impossible\n", - "4 try not to laugh challenge clean \n", - "5 try not to laugh \n", - "6 try not to laugh tiktok \n", - "7 tltl \n", - "8 pewdiepie \n", - "9 pewds \n", - "10 pewdie \n", - "11 ylyl \n", - "12 you laugh you lose \n", - "13 episode 1 season 1 \n", - "14 ep 1 \n", - "15 pdp \n", - "16 pewdiepie ylyl \n", - "17 video \n", - "18 youtube video \n", - "19 youtube channel \n", - "20 t series \n", - "21 tseries vs pewdiepie \n", - "22 tiktok \n", - "23 fortnite \n", - "24 fortnite funny moments \n", + "0 https://i.ytimg.com/vi/ZWytZoEVpGU/hqdefault.jpg\n", "Name: 44, dtype: object\n", "ssssssssssssssssssssssssssssssssss45ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 youtube\n", - "2 rewind \n", - "3 meme \n", - "4 yea \n", - "5 review \n", + "0 https://i.ytimg.com/vi/uoAV7651Op0/hqdefault.jpg\n", "Name: 45, dtype: object\n", "ssssssssssssssssssssssssssssssssss46ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme review\n", + "0 https://i.ytimg.com/vi/702lkQbZx50/hqdefault.jpg\n", "Name: 46, dtype: object\n", "ssssssssssssssssssssssssssssssssss47ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pewdiepie\n", - "2 fortnite \n", - "3 lwiay \n", - "4 ylyl \n", - "5 meme \n", - "6 review \n", - "7 season 7 \n", - "8 new \n", - "9 skins \n", + "0 https://i.ytimg.com/vi/7sgDvC4k6Xg/hqdefault.jpg\n", "Name: 47, dtype: object\n", "ssssssssssssssssssssssssssssssssss48ssssssssssssssssssssssssssssssssss\n", - "0 player \n", - "1 unknown \n", - "2 PUBG \n", - "3 player unknowns \n", - "4 player unknown's\n", - "5 battleground \n", - "6 battle \n", - "7 ground \n", + "0 https://i.ytimg.com/vi/cCoGsFVPVh0/hqdefault.jpg\n", "Name: 48, dtype: 
object\n", "ssssssssssssssssssssssssssssssssss49ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 2x \n", - "2 slow mo\n", - "3 50% \n", - "4 speed \n", - "Name: 49, dtype: object\n", + "Series([], Name: 49, dtype: object)\n", "ssssssssssssssssssssssssssssssssss50ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", + "0 https://i.ytimg.com/vi/Odog86JslbA/hqdefault.jpg\n", "Name: 50, dtype: object\n", "ssssssssssssssssssssssssssssssssss51ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tekashi69 \n", - "2 tekashi 6ix9ine \n", - "3 tekashi69 songs \n", - "4 6ix9ine \n", - "5 6ix9ine 2018 \n", - "6 Ninja \n", - "7 ninja fortnite \n", - "8 ninja fortnite gameplay \n", - "9 fortnite \n", - "10 fortnite funny moments \n", - "11 icy five ninja \n", - "12 alinity \n", - "13 alinity pewdiepie \n", - "14 alinity pewdiepie copystrike \n", - "15 pew news \n", - "16 pewdiepie \n", - "17 pewds \n", - "18 pdp \n", - "19 pewdie \n", - "20 youtube video \n", - "21 youtube channel \n", - "22 youtube \n", - "23 Tekashi69 BAN \n", - "24 Ninja caught selling underwear\n", - "25 Alinity facing 32 year prison.\n", - "26 smosh \n", - "27 news \n", - "28 news live \n", - "29 world news \n", + "0 https://i.ytimg.com/vi/SZO8jF9Z6vw/hqdefault.jpg\n", "Name: 51, dtype: object\n", "ssssssssssssssssssssssssssssssssss52ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 beat \n", - "2 saber \n", - "3 vr \n", - "4 gameplay\n", + "0 https://i.ytimg.com/vi/dAKyi8aFq3Y/hqdefault.jpg\n", "Name: 52, dtype: object\n", "ssssssssssssssssssssssssssssssssss53ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 The last hope for my channel...\n", - "2 pewdiepie \n", - "3 pewds \n", - "4 pdp \n", - "5 pewdie \n", - "6 last hope \n", - "7 youtube \n", - "8 youtube channel \n", + "0 https://i.ytimg.com/vi/GskbfPKP35E/hqdefault.jpg\n", "Name: 53, dtype: object\n", "ssssssssssssssssssssssssssssssssss54ssssssssssssssssssssssssssssssssss\n", 
- "0 SATIRE \n", - "1 meme \n", - "2 review\n", + "0 https://i.ytimg.com/vi/sVxLiftJGbU/hqdefault.jpg\n", "Name: 54, dtype: object\n", "ssssssssssssssssssssssssssssssssss55ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 oblivion \n", - "2 skyrimn \n", - "3 skyrim \n", - "4 gameplay \n", - "5 funny \n", - "6 moments \n", - "7 compilation\n", - "8 meme \n", - "9 memes \n", + "0 https://i.ytimg.com/vi/0k0fvqikaoE/hqdefault.jpg\n", "Name: 55, dtype: object\n", "ssssssssssssssssssssssssssssssssss56ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 This video is blocked in your country.\n", - "2 video \n", - "3 youtube video \n", - "4 pewdiepie \n", - "5 youtube pewdiepie \n", - "6 this video is blocked \n", - "7 blocked \n", - "8 pewds \n", - "9 pewdie \n", - "10 pdp \n", - "11 article 13 \n", - "12 article 11 \n", - "13 youtube support \n", - "14 india \n", - "15 iisuperwomanii \n", - "16 taking a break \n", + "0 https://i.ytimg.com/vi/x8OCVDCDrDA/hqdefault.jpg\n", "Name: 56, dtype: object\n", "ssssssssssssssssssssssssssssssssss57ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 We made history!!! *again*\n", - "2 We made history! 
\n", - "3 we \n", - "4 made \n", - "5 history \n", - "6 pewdiepie \n", - "7 pewds \n", - "8 pdp \n", - "9 pewdie \n", - "10 lwaiy \n", - "11 tseries \n", - "12 t-series \n", - "13 lwiay pewdiepie \n", - "14 marzia \n", - "15 markiplier \n", - "16 try not to laugh \n", - "17 we made history again \n", + "0 https://i.ytimg.com/vi/yl3kavXxvHo/hqdefault.jpg\n", "Name: 57, dtype: object\n", "ssssssssssssssssssssssssssssssssss58ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose\n", - "2 challenge \n", + "0 https://i.ytimg.com/vi/Ihbu0aZwkE8/hqdefault.jpg\n", "Name: 58, dtype: object\n", "ssssssssssssssssssssssssssssssssss59ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 A message to Obama\n", - "2 OBAMA \n", - "3 memes \n", - "4 meme \n", - "5 dank memes \n", - "6 memes 2018 \n", - "7 pewdiepie \n", - "8 pewds \n", - "9 pdp \n", - "10 pewdie \n", + "0 https://i.ytimg.com/vi/13viBxojGvA/hqdefault.jpg\n", "Name: 59, dtype: object\n", "ssssssssssssssssssssssssssssssssss60ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 TIKTOK \n", - "2 tik tok \n", - "3 tik tok cringe \n", - "4 tiktok pewdiepie \n", - "5 pewdiepie \n", - "6 pewds \n", - "7 pewdie \n", - "8 pdp \n", - "9 tiktok has gone too far \n", - "10 OK \n", - "11 TIK TOK HAS GONE TOO FAR NOW...\n", - "12 tiktok compilation \n", - "13 tiktok memes \n", - "14 meme \n", - "15 memes \n", - "16 pewdiepie memes \n", - "17 pewdiepie meme \n", - "18 pewdiepie tik tok \n", - "19 tiktok ad \n", - "20 tiktok funny \n", - "21 cringe challenge \n", - "22 cringe \n", - "23 cringe tiktok \n", - "24 funny tiktok videos \n", - "25 musically \n", - "26 musical.ly \n", - "27 tiktok trolls \n", + "0 https://i.ytimg.com/vi/DmSephyJNtQ/hqdefault.jpg\n", "Name: 60, dtype: object\n", "ssssssssssssssssssssssssssssssssss61ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 We made history!\n", - "2 we \n", - "3 made \n", - "4 history \n", - "5 pewdiepie \n", - "6 pewds \n", - "7 pdp 
\n", - "8 pewdie \n", - "9 lwaiy \n", - "10 tseries \n", - "11 t-series \n", - "12 lwiay pewdiepie \n", - "13 marzia \n", - "14 markiplier \n", - "15 try not to laugh\n", + "0 https://i.ytimg.com/vi/30pPGx0J6FU/hqdefault.jpg\n", "Name: 61, dtype: object\n", "ssssssssssssssssssssssssssssssssss62ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 you laugh you lose\n", - "3 challenge \n", - "Name: 62, dtype: object\n", + "Series([], Name: 62, dtype: object)\n", "ssssssssssssssssssssssssssssssssss63ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 deltarune \n", - "2 delta \n", - "3 rune \n", - "4 undertale \n", - "5 undertale 2 \n", - "6 squel \n", - "7 sequel \n", - "8 prequel \n", - "9 commentary \n", - "10 gameplay \n", - "11 walkthrough \n", - "12 pacifist \n", - "13 delta rune part 1 \n", - "14 chapter 1 \n", - "15 deltarune part 1 \n", - "16 soundtrack \n", - "17 undertale delta \n", - "18 undertale delta rune\n", - "19 delta rune undertale\n", - "20 part 1 \n", - "21 chapter 1 part 1 \n", + "0 https://i.ytimg.com/vi/eIRhXharV7k/hqdefault.jpg\n", "Name: 63, dtype: object\n", "ssssssssssssssssssssssssssssssssss64ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review \n", - "3 ben \n", - "4 shapiro \n", - "5 bonus meme\n", - "6 gnome \n", - "7 obama \n", - "8 elon musk \n", - "9 pikachu \n", - "10 tik tok \n", - "11 tracer \n", + "0 https://i.ytimg.com/vi/2waSmpD1zQg/hqdefault.jpg\n", "Name: 64, dtype: object\n", "ssssssssssssssssssssssssssssssssss65ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 I'm white \n", - "2 im white \n", - "3 im white dr phil \n", - "4 dr phil \n", - "5 im \n", - "6 white \n", - "7 dr phil black white girl \n", - "8 dr phil black girl acts white \n", - "9 dr phil black girl \n", - "10 dr phil full episodes \n", - "11 dr \n", - "12 phil \n", - "13 mom says her daughter \n", - "14 dr phil pewdiepie \n", - "15 dr phil #3 \n", - "16 dr phil 3 \n", - "17 react \n", - 
"18 pewds \n", - "19 pewdie \n", - "20 pewdiepie \n", - "21 pdp \n", - "22 dr phil destroys \n", - "23 dr phil memes \n", - "24 dr phil meme \n", - "25 dr phil october 2018 \n", - "26 meme \n", - "27 memes \n", - "28 im black \n", - "29 i'm black \n", - "30 im white dr phil full episode \n", - "31 im white dr phil full episodes\n", - "Name: 65, dtype: object\n", + "Series([], Name: 65, dtype: object)\n", "ssssssssssssssssssssssssssssssssss66ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/P4LonC3puS4/hqdefault.jpg\n", "Name: 66, dtype: object\n", "ssssssssssssssssssssssssssssssssss67ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 i need your help...\n", - "2 lwiay \n", - "3 help \n", - "4 pewdiepie \n", - "5 pewds \n", - "6 pdp \n", - "7 pewdie \n", + "0 https://i.ytimg.com/vi/oJdubyyJNIQ/hqdefault.jpg\n", "Name: 67, dtype: object\n", "ssssssssssssssssssssssssssssssssss68ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/UcvCdFfI3bs/hqdefault.jpg\n", "Name: 68, dtype: object\n", "ssssssssssssssssssssssssssssssssss69ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 apology video\n", - "2 my response \n", - "3 pewdiepie \n", - "4 logan paul \n", - "5 laura lee \n", - "6 tmartin \n", + "0 https://i.ytimg.com/vi/_fNZLrz97kg/hqdefault.jpg\n", "Name: 69, dtype: object\n", "ssssssssssssssssssssssssssssssssss70ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 fashion\n", - "2 meme \n", - "3 review \n", + "0 https://i.ytimg.com/vi/1tCbvYv_ibw/hqdefault.jpg\n", "Name: 70, dtype: object\n", "ssssssssssssssssssssssssssssssssss71ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 birds \n", - "2 birds aren't real \n", - "3 birds aren't real youtube \n", - "4 npc meme \n", - "5 npc memes \n", - "6 pewdiepie \n", - "7 pewds \n", - "8 pewdie \n", - "9 memes \n", - "10 meme \n", - "11 meme review \n", - "12 BIRDS. AREN'T. REAL. 
\n", - "13 review \n", - "14 meme compilation \n", - "15 meme compilation 2018 \n", - "16 everyone we have an announcement to make\n", + "0 https://i.ytimg.com/vi/EZ-im7m8630/hqdefault.jpg\n", "Name: 71, dtype: object\n", "ssssssssssssssssssssssssssssssssss72ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 npc meme \n", - "2 meme \n", - "3 funny \n", - "4 compilation\n", - "5 shane \n", - "6 logan \n", - "7 logan paul \n", - "8 show \n", - "9 youtube \n", - "10 red \n", - "11 youtube red\n", + "0 https://i.ytimg.com/vi/03ahRfkfwME/hqdefault.jpg\n", "Name: 72, dtype: object\n", "ssssssssssssssssssssssssssssssssss73ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 LWIAY\n", + "0 https://i.ytimg.com/vi/h27uLjDOK-M/hqdefault.jpg\n", "Name: 73, dtype: object\n", "ssssssssssssssssssssssssssssssssss74ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme review\n", - "2 spooktober \n", - "3 halloween \n", - "4 bone \n", - "5 skeleton \n", - "6 doot doot \n", - "7 sans \n", + "0 https://i.ytimg.com/vi/8OoLg39nNlo/hqdefault.jpg\n", "Name: 74, dtype: object\n", "ssssssssssssssssssssssssssssssssss75ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 you laugh you lose\n", - "3 challenge \n", - "4 moth \n", - "5 edition \n", - "6 meme \n", + "0 https://i.ytimg.com/vi/DJd0JYaVkqA/hqdefault.jpg\n", "Name: 75, dtype: object\n", "ssssssssssssssssssssssssssssssssss76ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 lwiay \n", - "2 reddit\n", + "0 https://i.ytimg.com/vi/hUXGQwTSfMs/hqdefault.jpg\n", "Name: 76, dtype: object\n", "ssssssssssssssssssssssssssssssssss77ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tseries \n", - "2 t series \n", - "3 diss \n", - "4 track \n", - "5 pewdiepie \n", - "6 song \n", - "7 rap \n", - "8 mixtape \n", - "9 disstrack \n", - "10 diss track \n", - "11 bitch lasagna\n", + "0 https://i.ytimg.com/vi/-zcJ4uB7XUo/hqdefault.jpg\n", "Name: 77, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss78ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 𝓜𝓸𝓽𝓱 𝓜𝓮𝓶𝓮𝓼 \n", - "2 moth memes \n", - "3 moth meme \n", - "4 moth meme compilation \n", - "5 moth lamp \n", - "6 moth lamp meme compilation\n", - "7 pewdiepie meme review \n", - "8 pewdiepie \n", - "9 pewds \n", - "10 pdp \n", - "11 pewdie \n", - "12 meme review \n", - "13 memes \n", - "14 meme \n", - "15 moth \n", - "16 lamp \n", + "0 https://i.ytimg.com/vi/tQ_9a6UhUQs/hqdefault.jpg\n", "Name: 78, dtype: object\n", "ssssssssssssssssssssssssssssssssss79ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 lwiay \n", - "2 pewdiepie \n", - "3 pewds \n", - "4 pewdie \n", - "5 pewdiepie vs t series \n", - "6 ANNOUNCING ME NEW WEBSITE\n", - "7 website \n", - "8 new website \n", - "9 t series \n", + "0 https://i.ytimg.com/vi/ztwsGeT5lR0/hqdefault.jpg\n", "Name: 79, dtype: object\n", "ssssssssssssssssssssssssssssssssss80ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose\n", - "2 ylyl \n", - "3 try not to \n", - "4 laugh \n", - "5 challenge \n", + "0 https://i.ytimg.com/vi/nOlH-P8-5PI/hqdefault.jpg\n", "Name: 80, dtype: object\n", "ssssssssssssssssssssssssssssssssss81ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 bowsette \n", - "2 meme review\n", + "0 https://i.ytimg.com/vi/BdppFIT_lIs/hqdefault.jpg\n", "Name: 81, dtype: object\n", "ssssssssssssssssssssssssssssssssss82ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pew news \n", - "2 serena williams \n", - "3 t series \n", - "4 youtube \n", - "5 alternative influence\n", + "0 https://i.ytimg.com/vi/7nYkJctgSSA/hqdefault.jpg\n", "Name: 82, dtype: object\n", "ssssssssssssssssssssssssssssssssss83ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 lego \n", - "2 star \n", - "3 wars \n", + "0 https://i.ytimg.com/vi/hZHfdOKFlAw/hqdefault.jpg\n", "Name: 83, dtype: object\n", "ssssssssssssssssssssssssssssssssss84ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE 
\n", - "1 ylyl \n", - "2 you laugh you lose \n", - "3 YOU LAUGH YOU LOSE \n", - "4 TRY NOT TO LAUGH SUPER HARD EDITION\n", - "5 try not to laugh \n", - "6 try not to laugh challenge \n", - "7 pewdiepie \n", - "8 pewds \n", - "9 pewdie \n", - "10 pdp \n", + "0 https://i.ytimg.com/vi/gYTJrTXaGwA/hqdefault.jpg\n", "Name: 84, dtype: object\n", "ssssssssssssssssssssssssssssssssss85ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", + "0 https://i.ytimg.com/vi/cFTB5EJUxzw/hqdefault.jpg\n", "Name: 85, dtype: object\n", "ssssssssssssssssssssssssssssssssss86ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 gucci \n", - "2 fashion\n", - "3 meme \n", + "0 https://i.ytimg.com/vi/T8EfomTlcfA/hqdefault.jpg\n", "Name: 86, dtype: object\n", "ssssssssssssssssssssssssssssssssss87ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", - "3 THANOS\n", - "4 CAR \n", + "0 https://i.ytimg.com/vi/ww8dRu4_1EY/hqdefault.jpg\n", "Name: 87, dtype: object\n", "ssssssssssssssssssssssssssssssssss88ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 Try Not To Laugh At Other Youtubers Try Not To Laugh Challenge\n", - "2 try not to laugh \n", - "3 try not to laugh challenge \n", - "4 try not to laugh challenge clean \n", - "5 try not to laugh challenge impossible \n", - "6 try not to laugh markiplier \n", - "7 try not to laugh jacksepticeye \n", - "8 try not to laugh pewdiepie edition \n", - "9 try not to laugh memes \n", - "10 memes \n", - "11 meme \n", - "12 funny memes \n", - "13 funny memes try not to laugh \n", - "14 ylyl \n", - "15 you laugh you lose \n", - "16 pewdiepie ylyl \n", - "17 pewdiepie \n", - "18 pewds \n", - "19 pdp \n", - "20 pewdie \n", - "21 tntl \n", - "22 laugh \n", - "23 try not to \n", - "24 markiplier \n", - "25 jacksepticeye \n", + "0 https://i.ytimg.com/vi/Bb896qn7S54/hqdefault.jpg\n", "Name: 88, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss89ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tumblr \n", - "2 tumblr in action\n", - "3 reddit \n", - "4 reddit review \n", + "0 https://i.ytimg.com/vi/WgnmQk_2yF4/hqdefault.jpg\n", "Name: 89, dtype: object\n", "ssssssssssssssssssssssssssssssssss90ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/mtp0Mu-yj_o/hqdefault.jpg\n", "Name: 90, dtype: object\n", "ssssssssssssssssssssssssssssssssss91ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 YES PAPA \n", - "2 YES PAPA MEME \n", - "3 johny johny yes papa \n", - "4 johnny johnny \n", - "5 johny meme \n", - "6 baby johnny eating sugar\n", - "7 no papa no papa \n", - "8 no papa sugar \n", - "9 meme review \n", - "10 pewdiepie meme review \n", - "11 pewdiepie \n", - "12 pewds \n", - "13 pdp \n", - "14 pewdie \n", - "15 YES PAPA MEME EXPOSED \n", + "0 https://i.ytimg.com/vi/mkKDI6y2kyE/hqdefault.jpg\n", "Name: 91, dtype: object\n", "ssssssssssssssssssssssssssssssssss92ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 lwiay\n", - "Name: 92, dtype: object\n", + "Series([], Name: 92, dtype: object)\n", "ssssssssssssssssssssssssssssssssss93ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 try not to laugh\n", + "0 https://i.ytimg.com/vi/JToPoYip-C4/hqdefault.jpg\n", "Name: 93, dtype: object\n", "ssssssssssssssssssssssssssssssssss94ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 episode 1 \n", - "2 gameplay \n", - "3 wlaking \n", - "4 walking dead\n", - "5 final \n", - "6 season \n", - "7 last \n", + "0 https://i.ytimg.com/vi/AgRHEGB8Urs/hqdefault.jpg\n", "Name: 94, dtype: object\n", "ssssssssssssssssssssssssssssssssss95ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pew news\n", - "2 ksi \n", - "3 ninja \n", - "4 female \n", - "5 streamer\n", + "0 https://i.ytimg.com/vi/SRCToEkq7to/hqdefault.jpg\n", "Name: 95, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss96ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 ylyl \n", - "2 laugh\n", - "3 lose \n", + "0 https://i.ytimg.com/vi/A6EIl677ntQ/hqdefault.jpg\n", "Name: 96, dtype: object\n", "ssssssssssssssssssssssssssssssssss97ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pubg \n", - "2 player unknown\n", - "3 squads \n", + "0 https://i.ytimg.com/vi/4HD5rCNYxng/hqdefault.jpg\n", "Name: 97, dtype: object\n", "ssssssssssssssssssssssssssssssssss98ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 98, dtype: object\n", + "Series([], Name: 98, dtype: object)\n", "ssssssssssssssssssssssssssssssssss99ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 What drinking her juice ACTUALLY gives you \n", - "2 jilly juice \n", - "3 dr phil \n", - "4 dr phil 2018 \n", - "5 dr phil jilly juice \n", - "6 dr phil jilly juice reaction \n", - "7 dr phil pewdiepie \n", - "8 pewdiepie dr phil \n", - "9 pewdiepie dr phil eminem \n", - "10 15 YEAR OLD CRIES OVER NOT GETTING $231 \n", - "11 dr phil 1 \n", - "12 dr phil 15 year old \n", - "13 LOGAN PAULS SISTER WANTS TO DO YOUTUBE - Dr Phil #2\n", - "14 YOUTUBER GOES ON DR PHIL. 
\n", - "15 dr phil playlist \n", - "16 pewdiepie \n", - "17 pewds \n", - "18 pdp \n", - "19 pewdie \n", - "20 juice \n", - "21 comedy \n", - "22 reaction \n", - "23 entertainment \n", - "24 jilly \n", + "0 https://i.ytimg.com/vi/hnc3bGtYQsQ/hqdefault.jpg\n", "Name: 99, dtype: object\n", "ssssssssssssssssssssssssssssssssss100ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh \n", - "2 you lose \n", - "3 try not to laugh challenge\n", - "4 challenge \n", - "5 try not to \n", + "0 https://i.ytimg.com/vi/cva2sxX5PgM/hqdefault.jpg\n", "Name: 100, dtype: object\n", "ssssssssssssssssssssssssssssssssss101ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 THAT TOTALLY HAPPENED.\n", - "2 that \n", - "3 totally \n", - "4 happened \n", - "5 /r thathappened \n", - "6 thathappened \n", - "7 redit \n", - "8 reddit \n", - "9 thathappened redit \n", - "10 pewdiepie \n", - "11 reddit review \n", - "12 reddit reaction \n", - "13 reddit cringe \n", - "14 cringe \n", - "15 reddit pewdiepie \n", - "16 pewds \n", - "17 pewdie \n", - "18 pdp \n", - "19 /r \n", - "20 meme \n", - "21 memes \n", - "22 meme review \n", + "0 https://i.ytimg.com/vi/cDOlBRzHRI0/hqdefault.jpg\n", "Name: 101, dtype: object\n", "ssssssssssssssssssssssssssssssssss102ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/Mxdze0Wo91U/hqdefault.jpg\n", "Name: 102, dtype: object\n", "ssssssssssssssssssssssssssssssssss103ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/YH_rnTjnWfg/hqdefault.jpg\n", "Name: 103, dtype: object\n", "ssssssssssssssssssssssssssssssssss104ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 TRY NOT TO LAUGH / EPISODE 1 / NEW SERIES\n", - "2 ylyl \n", - "3 you laugh \n", - "4 you lose \n", - "5 you laugh you lose \n", - "6 you laugh you lose challenge \n", - "7 pewdiepie ylyl \n", - "8 pewdiepie ylyl 1 \n", - "9 try not to laugh \n", - "10 try not to laugh challenge \n", - "11 try not to laugh 
challenge episode 1 \n", - "12 new series \n", - "13 pewdiepie series \n", - "14 pewds \n", - "15 pewdie \n", - "16 pdp \n", - "17 try not to laugh clean \n", - "18 skrattar du \n", - "19 skrattar du förlorar du \n", - "20 TNTL \n", - "21 tntl clean \n", - "Name: 104, dtype: object\n", + "Series([], Name: 104, dtype: object)\n", "ssssssssssssssssssssssssssssssssss105ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 detroit \n", - "2 detroit become human \n", - "3 detroit become human gameplay\n", - "4 gameplay detroit become human\n", - "5 gameplay \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 full \n", - "9 commentary \n", + "0 https://i.ytimg.com/vi/WFRBxz6AeZI/hqdefault.jpg\n", "Name: 105, dtype: object\n", "ssssssssssssssssssssssssssssssssss106ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 detroit \n", - "2 detroit become human \n", - "3 detroit become human gameplay\n", - "4 gameplay detroit become human\n", - "5 gameplay \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 full \n", - "9 commentary \n", + "0 https://i.ytimg.com/vi/7yuPVq9DtV0/hqdefault.jpg\n", "Name: 106, dtype: object\n", "ssssssssssssssssssssssssssssssssss107ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 detroit become human \n", - "2 detroit \n", - "3 detroit become human gameplay\n", - "4 gameplay detroit become human\n", - "5 gameplay \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 full \n", - "9 commentary \n", + "0 https://i.ytimg.com/vi/vYP6GdsEmg0/hqdefault.jpg\n", "Name: 107, dtype: object\n", "ssssssssssssssssssssssssssssssssss108ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 detroit \n", - "2 detroit become human \n", - "3 detroit become human gameplay\n", - "4 gameplay detroit become human\n", - "5 gameplay \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 full \n", - "9 commentary \n", + "0 https://i.ytimg.com/vi/7k4GbHQNmQo/hqdefault.jpg\n", "Name: 108, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss109ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 My fans have turned against me...\n", - "2 pewdiepie \n", - "3 pewds \n", - "4 pdp \n", - "5 pewdie \n", - "6 pewdiepie fans \n", - "7 lwiay \n", - "8 pewdiepie lwaiy \n", + "0 https://i.ytimg.com/vi/o_CSmob64uU/hqdefault.jpg\n", "Name: 109, dtype: object\n", "ssssssssssssssssssssssssssssssssss110ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/o8Je7hPgsdU/hqdefault.jpg\n", "Name: 110, dtype: object\n", "ssssssssssssssssssssssssssssssssss111ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose\n", + "0 https://i.ytimg.com/vi/iDFjTrl7J8w/hqdefault.jpg\n", "Name: 111, dtype: object\n", "ssssssssssssssssssssssssssssssssss112ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 fouseytube \n", - "2 drake \n", - "3 drake july 2018 \n", - "4 pewdiepie \n", - "5 pewds \n", - "6 pewdie \n", - "7 pdp \n", - "8 fouseytube drake \n", - "9 dj khaled \n", - "10 djkhaled drake \n", - "11 dj khaled fouseytube\n", - "12 drake concert live \n", - "13 drake concert \n", - "14 concert \n", - "15 new drake \n", - "16 lil \n", - "17 lil rapper \n", - "18 rapper \n", - "19 lil rappers \n", - "20 6ix9ine \n", - "21 tekashi69 \n", - "22 6ix9ine pewdiepie \n", - "23 tekashi \n", - "24 drake pewdiepie \n", - "Name: 112, dtype: object\n", + "Series([], Name: 112, dtype: object)\n", "ssssssssssssssssssssssssssssssssss113ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 the dobre \n", - "2 dobre \n", - "3 dobre brothers \n", - "4 dobre twins \n", - "5 dobre brothers song \n", - "6 dobre brothers pranks\n", - "7 prank \n", - "8 pranks \n", - "9 slime \n", + "0 https://i.ytimg.com/vi/q2CBNLsQbCM/hqdefault.jpg\n", "Name: 113, dtype: object\n", "ssssssssssssssssssssssssssssssssss114ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 Tekashi 6ix9ine \n", - "2 Tekashi \n", - "3 6ix9ine \n", - "4 Tekashi 6ix9ine saved 
by polite cat\n", - "5 six nine \n", - "6 six \n", - "7 nine \n", - "8 69 \n", - "9 tekashi69 \n", - "10 pewdiepie \n", - "11 pewds \n", - "12 pdp \n", - "13 pewdie \n", - "14 meme review \n", - "15 6ix9ine pewdiepie \n", - "16 six nine pewdiepie \n", - "17 tekashi69 pewdiepie \n", - "18 polite cat \n", - "19 cat meme \n", - "20 cat memes \n", - "21 cats \n", - "22 cat \n", - "23 memes \n", - "24 meme compilation \n", + "0 https://i.ytimg.com/vi/jEYQqLtK_Xw/hqdefault.jpg\n", "Name: 114, dtype: object\n", "ssssssssssssssssssssssssssssssssss115ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 funny memes\n", - "2 meme \n", - "3 memes \n", - "4 curb \n", - "5 compilation\n", + "0 https://i.ytimg.com/vi/k66FoY5ndfI/hqdefault.jpg\n", "Name: 115, dtype: object\n", "ssssssssssssssssssssssssssssssssss116ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pew news \n", - "2 youtube \n", - "3 vox media \n", - "4 elon musk \n", - "5 thai \n", - "6 hank green \n", - "7 jessica price\n", - "8 guild wars 2 \n", - "9 media \n", + "0 https://i.ytimg.com/vi/WbW0rHCX2UU/hqdefault.jpg\n", "Name: 116, dtype: object\n", "ssssssssssssssssssssssssssssssssss117ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 one hand clapping\n", + "0 https://i.ytimg.com/vi/2YoUqR9fuA4/hqdefault.jpg\n", "Name: 117, dtype: object\n", "ssssssssssssssssssssssssssssssssss118ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 IS THIS LIVE...? 
\n", - "2 YOU \n", - "3 CRINGE \n", - "4 LOSE \n", - "5 you cringe \n", - "6 cringe \n", - "7 you cringe you lose \n", - "8 you cringe you lose pewdiepie\n", - "9 cringe comp \n", - "10 cringe compilation \n", - "11 cringe compilation 2018 \n", - "12 cringe compilations \n", - "13 media cringe \n", - "14 news \n", - "15 news cringe \n", - "16 news cringe reaction \n", - "17 news cringe moments \n", - "18 pewdiepie \n", - "19 pewds \n", - "20 pewdie \n", - "21 pdp \n", - "22 cringe moments \n", - "23 cringe moments on tv \n", - "24 pewdiepie cringe \n", + "0 https://i.ytimg.com/vi/Sr0fZ298eM8/hqdefault.jpg\n", "Name: 118, dtype: object\n", "ssssssssssssssssssssssssssssssssss119ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 all woman\n", - "2 are \n", - "3 queen \n", + "0 https://i.ytimg.com/vi/_umr17a_AdQ/hqdefault.jpg\n", "Name: 119, dtype: object\n", "ssssssssssssssssssssssssssssssssss120ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review \n", - "3 slaps hand on car\n", - "4 car salesman meme\n", + "0 https://i.ytimg.com/vi/XQjyjn3MdxM/hqdefault.jpg\n", "Name: 120, dtype: object\n", "ssssssssssssssssssssssssssssssssss121ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 WHAT THE MEDIA DOESNT TELL YOU ABOUT PEWDIEPIE\n", - "2 pewdiepie \n", - "3 pewds \n", - "4 pewdie \n", - "5 pdp \n", - "6 media \n", - "7 pewdiepie media \n", - "8 pewdiepie wsj \n", - "9 pewdiepie scandal \n", - "Name: 121, dtype: object\n", + "Series([], Name: 121, dtype: object)\n", "ssssssssssssssssssssssssssssssssss122ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 twitch \n", - "2 twitch victims \n", - "3 twitch fails \n", - "4 twitch fails 2018 \n", - "5 twitch girls comp \n", - "6 twitch girls 2018 \n", - "7 twitch gone wrong \n", - "8 twitch compilation\n", - "9 pewdie \n", - "10 pewdiepie \n", - "11 pewds \n", - "12 pdp \n", + "0 https://i.ytimg.com/vi/m3Xf1ra2Ekg/hqdefault.jpg\n", "Name: 122, dtype: object\n", 
"ssssssssssssssssssssssssssssssssss123ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose \n", - "2 ylyl \n", - "3 skrattar du förlorar du\n", + "0 https://i.ytimg.com/vi/DYsCJEfQh1U/hqdefault.jpg\n", "Name: 123, dtype: object\n", "ssssssssssssssssssssssssssssssssss124ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 news \n", - "2 tana' \n", - "3 tana \n", - "4 mongeau'\n", - "5 tanacon \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ + "0 https://i.ytimg.com/vi/PK-GvWWQ03g/hqdefault.jpg\n", "Name: 124, dtype: object\n", "ssssssssssssssssssssssssssssssssss125ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 vlog \n", - "2 summer\n", - "3 idk \n", + "0 https://i.ytimg.com/vi/vHab6BNrHU8/hqdefault.jpg\n", "Name: 125, dtype: object\n", "ssssssssssssssssssssssssssssssssss126ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 eu \n", - "2 ban \n", - "3 memes\n", - "4 not \n", - "5 cool \n", - "6 guys \n", + "0 https://i.ytimg.com/vi/JKfFCVPjo_g/hqdefault.jpg\n", "Name: 126, dtype: object\n", "ssssssssssssssssssssssssssssssssss127ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 joke \n", - "2 over \n", - "3 head \n", + "0 https://i.ytimg.com/vi/__d5Q6IF1Sg/hqdefault.jpg\n", "Name: 127, dtype: object\n", "ssssssssssssssssssssssssssssssssss128ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 tanacon \n", - "2 tana mongeau w \n", - "3 tana mongeau \n", - "4 gaming disorder \n", - "5 gaming \n", - "6 disorder \n", - "7 gaming disorder 2018 \n", - "8 gaming disorder video\n", - "9 pewdiepie \n", - "10 pewds \n", - "11 pdp \n", - "12 pewdie \n", - "13 pew news \n", + "0 https://i.ytimg.com/vi/oLBqixxgd6Y/hqdefault.jpg\n", "Name: 128, dtype: object\n", "ssssssssssssssssssssssssssssssssss129ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 ylyl \n", - "2 skratta \n", - "3 skrattar \n", - "4 you laugh \n", - "5 you lose \n", - "6 you laugh you lose\n", - "7 try 
not to laugh \n", - "8 challenge \n", - "9 YOU LAUGH YOU SAD \n", + "0 https://i.ytimg.com/vi/X2bUUkWC7dE/hqdefault.jpg\n", "Name: 129, dtype: object\n", "ssssssssssssssssssssssssssssssssss130ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 LWIAY\n", - "Name: 130, dtype: object\n", + "Series([], Name: 130, dtype: object)\n", "ssssssssssssssssssssssssssssssssss131ssssssssssssssssssssssssssssssssss\n", - "0 meme \n", - "1 review \n", - "2 youtubes\n", - "3 favorite\n", - "4 show \n", + "0 https://i.ytimg.com/vi/szPjXJeIGP8/hqdefault.jpg\n", "Name: 131, dtype: object\n", "ssssssssssssssssssssssssssssssssss132ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", + "0 https://i.ytimg.com/vi/eEHBjP06WSI/hqdefault.jpg\n", "Name: 132, dtype: object\n", "ssssssssssssssssssssssssssssssssss133ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 MEMES COULD GET BANNED (NEEDS HELP ASAP SEND TO ALL YOUR FRIENDS AND FAMILY)\n", - "2 memes \n", - "3 meme \n", - "4 memes banned \n", - "5 memes ban \n", - "6 ban \n", - "7 pewds \n", - "8 pewdie \n", - "9 pewdiepie \n", - "10 pew news \n", - "11 news \n", - "12 pew \n", + "0 https://i.ytimg.com/vi/epgHrLszj-Q/hqdefault.jpg\n", "Name: 133, dtype: object\n", "ssssssssssssssssssssssssssssssssss134ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 lil \n", - "2 tay \n", - "3 lil tay \n", - "4 pewdiepie\n", - "5 pewdie \n", - "6 pdp \n", - "7 pewds \n", + "0 https://i.ytimg.com/vi/t3ppxtEU6No/hqdefault.jpg\n", "Name: 134, dtype: object\n", "ssssssssssssssssssssssssssssssssss135ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 flex seal \n", - "2 flex spray \n", - "3 flex commercial\n", - "4 tape commercial\n", - "5 commercial \n", + "0 https://i.ytimg.com/vi/yd62ObxkV44/hqdefault.jpg\n", "Name: 135, dtype: object\n", "ssssssssssssssssssssssssssssssssss136ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 dr phil \n", - "2 logan paul\n", - "3 psycho \n", - "4 youtuber \n", + "0 
https://i.ytimg.com/vi/AkiC0_09Zss/hqdefault.jpg\n", "Name: 136, dtype: object\n", "ssssssssssssssssssssssssssssssssss137ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 fridays \n", - "2 with \n", - "3 pewdiepie \n", - "4 fridays with pewdiepie\n", - "5 lwiay \n", + "0 https://i.ytimg.com/vi/Xz5XIHrT4LQ/hqdefault.jpg\n", "Name: 137, dtype: object\n", "ssssssssssssssssssssssssssssssssss138ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 YLYL \n", - "2 SKRATTA DU \n", - "3 TRY NOT TO LAUGH\n", - "4 CHALLENGE \n", + "0 https://i.ytimg.com/vi/_lsDECLUt3k/hqdefault.jpg\n", "Name: 138, dtype: object\n", "ssssssssssssssssssssssssssssssssss139ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pewdiepie \n", - "2 react \n", - "3 world \n", - "4 dr phil \n", - "5 spoiled \n", - "6 brat \n", - "7 beverly hills\n", - "8 girl \n", - "9 15 \n", + "0 https://i.ytimg.com/vi/iBsg75W2Vig/hqdefault.jpg\n", "Name: 139, dtype: object\n", "ssssssssssssssssssssssssssssssssss140ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 pew news \n", - "2 news \n", - "3 pewdiepie\n", + "0 https://i.ytimg.com/vi/sUtkJUJuq2U/hqdefault.jpg\n", "Name: 140, dtype: object\n", "ssssssssssssssssssssssssssssssssss141ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "1 lwiay\n", + "0 https://i.ytimg.com/vi/YzhLEjUD8hk/hqdefault.jpg\n", "Name: 141, dtype: object\n", "ssssssssssssssssssssssssssssssssss142ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 baldis basics \n", - "2 baldi's basics in education and learning \n", - "3 baldi's basics in education and learning secrets \n", - "4 baldis basics gameplay \n", - "5 baldis basics game \n", - "6 baldi's basics \n", - "7 baldis classroom \n", - "8 baldis education \n", - "9 baldis education and learning \n", - "10 baldis \n", - "11 basics \n", - "12 BALDIS BASICS IS THE SPOOKIEST GAME IN THE HISTORY OF THE WORLD AND UNIVERSE\n", - "13 baldis basics scary \n", - "14 baldis basics speedrun \n", - "15 
pewds \n", - "16 pewdiepie \n", - "17 pewdie \n", - "18 pdp \n", - "19 baldi pewdiepie \n", - "Name: 142, dtype: object\n", - "ssssssssssssssssssssssssssssssssss143ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 media \n", - "2 vice \n", - "3 news \n", - "4 article \n", - "5 pewdiepie\n", - "Name: 143, dtype: object\n", - "ssssssssssssssssssssssssssssssssss144ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 meme \n", - "2 review\n", - "Name: 144, dtype: object\n", - "ssssssssssssssssssssssssssssssssss145ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 humble \n", - "2 humble brag \n", - "3 bragging \n", - "4 youtuber \n", - "5 humble youtubers\n", - "6 youtubers humble\n", - "7 rich youtubers \n", - "8 rich youtube \n", - "Name: 145, dtype: object\n", - "ssssssssssssssssssssssssssssssssss146ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 lwiay \n", - "2 pewds \n", - "3 pewdie \n", - "4 pewdiepie\n", - "5 pdp \n", - "Name: 146, dtype: object\n", - "ssssssssssssssssssssssssssssssssss147ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 book \n", - "2 review \n", - "3 literature\n", - "4 club \n", - "Name: 147, dtype: object\n", - "ssssssssssssssssssssssssssssssssss148ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 fortnite\n", - "2 cringe \n", - "3 ali a \n", - "4 ninja \n", - "5 summit \n", - "Name: 148, dtype: object\n", - "ssssssssssssssssssssssssssssssssss149ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 sleep \n", - "2 challenge\n", - "3 horror \n", - "4 video \n", - "5 game \n", - "6 play \n", - "Name: 149, dtype: object\n", - "ssssssssssssssssssssssssssssssssss150ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 TESTING OUT EYETRACKING \n", - "2 eyetracker \n", - "3 eye tracking \n", - "4 eye tracker \n", - "5 tobii \n", - "6 tobii eye tracker \n", - "7 tobii eye tracking \n", - "8 tobii review \n", - "9 tobii eye tracker review\n", - "10 pewdiepie \n", - "11 
pewds \n", - "12 pewdie \n", - "13 pdp \n", - "14 tracker \n", - "15 eye \n", - "16 tracking \n", - "Name: 150, dtype: object\n", - "ssssssssssssssssssssssssssssssssss151ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE \n", - "1 you laugh you lose\n", - "2 ylyl \n", - "3 india \n", - "4 indian \n", - "5 meme \n", - "6 comedy \n", - "Name: 151, dtype: object\n", - "ssssssssssssssssssssssssssssssssss152ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 152, dtype: object\n", - "ssssssssssssssssssssssssssssssssss153ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 meme review\n", - "2 savage \n", - "3 patrick \n", - "4 fortnite \n", - "5 pubg \n", - "6 meme \n", - "7 memes \n", - "8 spongebob \n", - "Name: 153, dtype: object\n", - "ssssssssssssssssssssssssssssssssss154ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 trapland\n", - "Name: 154, dtype: object\n", - "ssssssssssssssssssssssssssssssssss155ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 155, dtype: object\n", - "ssssssssssssssssssssssssssssssssss156ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 156, dtype: object\n", - "ssssssssssssssssssssssssssssssssss157ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 157, dtype: object\n", - "ssssssssssssssssssssssssssssssssss158ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pew \n", - "2 news \n", - "Name: 158, dtype: object\n", - "ssssssssssssssssssssssssssssssssss159ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 trap adventure 2\n", - "Name: 159, dtype: object\n", - "ssssssssssssssssssssssssssssssssss160ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 lwiay \n", - "Name: 160, dtype: object\n", - "ssssssssssssssssssssssssssssssssss161ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 hmmm \n", - "Name: 161, dtype: object\n", - "ssssssssssssssssssssssssssssssssss162ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - 
"Name: 162, dtype: object\n", - "ssssssssssssssssssssssssssssssssss163ssssssssssssssssssssssssssssssssss\n", - "0 party in backyard\n", - "1 hej monika \n", - "2 monika \n", - "3 monica \n", - "4 song \n", - "5 pewdiepie \n", - "6 sing \n", - "7 singing \n", - "Name: 163, dtype: object\n", - "ssssssssssssssssssssssssssssssssss164ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 trap adventure 2 \n", - "2 rage \n", - "3 quit \n", - "4 game \n", - "5 videogame \n", - "6 trap \n", - "7 adventure \n", - "8 free download \n", - "9 link \n", - "10 trap adventure download \n", - "11 trap adventure 2 download \n", - "12 trap adventure 2 free download\n", - "Name: 164, dtype: object\n", - "ssssssssssssssssssssssssssssssssss165ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 165, dtype: object\n", - "ssssssssssssssssssssssssssssssssss166ssssssssssssssssssssssssssssssssss\n", - "0 vr chat\n", - "Name: 166, dtype: object\n", - "ssssssssssssssssssssssssssssssssss167ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 jacksfilms\n", - "2 lwiay \n", - "3 yiay \n", - "Name: 167, dtype: object\n", - "ssssssssssssssssssssssssssssssssss168ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepi\n", - "1 indian \n", - "2 meme \n", - "Name: 168, dtype: object\n", - "ssssssssssssssssssssssssssssssssss169ssssssssssssssssssssssssssssssssss\n", - "0 ylyl\n", - "Name: 169, dtype: object\n", - "ssssssssssssssssssssssssssssssssss170ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 mad lad \n", - "Name: 170, dtype: object\n", - "ssssssssssssssssssssssssssssssssss171ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "Name: 171, dtype: object\n", - "ssssssssssssssssssssssssssssssssss172ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 you laugh you lose\n", - "2 ylyl \n", - "3 laugh \n", - "4 lose \n", - "Name: 172, dtype: object\n", - "ssssssssssssssssssssssssssssssssss173ssssssssssssssssssssssssssssssssss\n", - "0 
pewdiepie \n", - "1 nice guy \n", - "2 nice guys\n", - "3 reddit \n", - "Name: 173, dtype: object\n", - "ssssssssssssssssssssssssssssssssss174ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 vine \n", - "2 instagram\n", - "Name: 174, dtype: object\n", - "ssssssssssssssssssssssssssssssssss175ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 vr \n", - "2 vr chat \n", - "3 la noire\n", - "4 vr cases\n", - "Name: 175, dtype: object\n", - "ssssssssssssssssssssssssssssssssss176ssssssssssssssssssssssssssssssssss\n", - "0 im14thisisdeep \n", - "1 im 14 this is deep\n", - "2 this is deep \n", - "3 this is so deep \n", - "Name: 176, dtype: object\n", - "ssssssssssssssssssssssssssssssssss177ssssssssssssssssssssssssssssssssss\n", - "0 rick \n", - "1 and morty \n", - "2 rick and morty\n", - "Name: 177, dtype: object\n", - "ssssssssssssssssssssssssssssssssss178ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 the impossible quiz\n", - "Name: 178, dtype: object\n", - "ssssssssssssssssssssssssssssssssss179ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 179, dtype: object\n", - "ssssssssssssssssssssssssssssssssss180ssssssssssssssssssssssssssssssssss\n", - "0 YLYL\n", - "Name: 180, dtype: object\n", - "ssssssssssssssssssssssssssssssssss181ssssssssssssssssssssssssssssssssss\n", - "0 To the moon \n", - "1 sequel \n", - "2 finding paradise\n", - "3 paradice \n", - "4 walkthrough \n", - "5 playthrough \n", - "6 lets play \n", - "7 pewdiepie \n", - "8 part 1 \n", - "Name: 181, dtype: object\n", - "ssssssssssssssssssssssssssssssssss182ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 zootopia \n", - "2 doki doki literature club\n", - "3 doki doki \n", - "4 meme review \n", - "5 meme \n", - "6 death stranding \n", - "Name: 182, dtype: object\n", - "ssssssssssssssssssssssssssssssssss183ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 183, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss184ssssssssssssssssssssssssssssssssss\n", - "0 ylyl\n", - "Name: 184, dtype: object\n", - "ssssssssssssssssssssssssssssssssss185ssssssssssssssssssssssssssssssssss\n", - "0 doki doki \n", - "1 literature \n", - "2 club \n", - "3 litterature\n", - "4 part 1 \n", - "Name: 185, dtype: object\n", - "ssssssssssssssssssssssssssssssssss186ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 186, dtype: object\n", - "ssssssssssssssssssssssssssssssssss187ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 187, dtype: object\n", - "ssssssssssssssssssssssssssssssssss188ssssssssssssssssssssssssssssssssss\n", - "0 getting over it \n", - "1 walkthrough \n", - "2 playthrough \n", - "3 get over it \n", - "4 hiking \n", - "5 hammer \n", - "6 climb \n", - "7 climb game \n", - "8 clop \n", - "9 qwop \n", - "10 funny game \n", - "11 getting over it part 1\n", - "12 tutorial \n", - "13 full \n", - "Name: 188, dtype: object\n", - "ssssssssssssssssssssssssssssssssss189ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 189, dtype: object\n", - "ssssssssssssssssssssssssssssssssss190ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 190, dtype: object\n", - "ssssssssssssssssssssssssssssssssss191ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 191, dtype: object\n", - "ssssssssssssssssssssssssssssssssss192ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 192, dtype: object\n", - "ssssssssssssssssssssssssssssssssss193ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 jacksepticeye\n", - "2 whiskey \n", - "3 irish \n", - "4 review \n", - "Name: 193, dtype: object\n", - "ssssssssssssssssssssssssssssssssss194ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 194, dtype: object\n", - "ssssssssssssssssssssssssssssssssss195ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 south park \n", - "2 the fractured \n", - "3 but whole \n", - "4 south park 
game\n", - "5 sequel \n", - "6 new \n", - "7 gameplay \n", - "8 walkthrough \n", - "9 part 1 \n", - "10 full game \n", - "Name: 195, dtype: object\n", - "ssssssssssssssssssssssssssssssssss196ssssssssssssssssssssssssssssssssss\n", - "0 lwiay \n", - "1 reddit\n", - "Name: 196, dtype: object\n", - "ssssssssssssssssssssssssssssssssss197ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "Name: 197, dtype: object\n", - "ssssssssssssssssssssssssssssssssss198ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 198, dtype: object\n", - "ssssssssssssssssssssssssssssssssss199ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 pewd \n", - "4 pewdiepie cooking \n", - "5 cooking \n", - "6 how to \n", - "7 how to cook \n", - "8 how to cook meatballs \n", - "9 meatballs \n", - "10 meat balls \n", - "11 how to cook meatballs in a pan\n", - "12 how to cook meatballs in sauce\n", - "13 meatballs recipe \n", - "14 meatballs recipe tasty \n", - "15 tasty \n", - "16 recipe \n", - "17 best recipe \n", - "18 how to make \n", - "19 how to make meatballs \n", - "20 cook \n", - "21 homemade \n", - "Name: 199, dtype: object\n", - "ssssssssssssssssssssssssssssssssss200ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 200, dtype: object\n", - "ssssssssssssssssssssssssssssssssss201ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 201, dtype: object\n", - "ssssssssssssssssssssssssssssssssss202ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 202, dtype: object\n", - "ssssssssssssssssssssssssssssssssss203ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 anime \n", - "2 myanimelist \n", - "3 favourite anime\n", - "Name: 203, dtype: object\n", - "ssssssssssssssssssssssssssssssssss204ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 you \n", - "2 laugh \n", - "3 lose \n", - "4 challenge\n", - "Name: 204, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss205ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 205, dtype: object\n", - "ssssssssssssssssssssssssssssssssss206ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 whiskey \n", - "2 japanese\n", - "3 review \n", - "Name: 206, dtype: object\n", - "ssssssssssssssssssssssssssssssssss207ssssssssssssssssssssssssssssssssss\n", - "0 ylyl \n", - "1 you laugh you lose\n", - "2 try not to laugh \n", - "3 challenge \n", - "Name: 207, dtype: object\n", - "ssssssssssssssssssssssssssssssssss208ssssssssssssssssssssssssssssssssss\n", - "0 hardest\n", - "1 game \n", - "2 ever \n", - "Name: 208, dtype: object\n", - "ssssssssssssssssssssssssssssssssss209ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 how to \n", - "2 get started\n", - "3 youtube \n", - "4 youtuber \n", - "Name: 209, dtype: object\n", - "ssssssssssssssssssssssssssssssssss210ssssssssssssssssssssssssssssssssss\n", - "0 drawing \n", - "1 youtuber \n", - "2 youtubers\n", - "Name: 210, dtype: object\n", - "ssssssssssssssssssssssssssssssssss211ssssssssssssssssssssssssssssssssss\n", - "0 Pewdiepie\n", - "1 would \n", - "2 you \n", - "3 rather \n", - "Name: 211, dtype: object\n", - "ssssssssssssssssssssssssssssssssss212ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 stream \n", - "2 twitch \n", - "3 fail \n", - "4 fails \n", - "Name: 212, dtype: object\n", - "ssssssssssssssssssssssssssssssssss213ssssssssssssssssssssssssssssssssss\n", - "0 you \n", - "1 laugh \n", - "2 you lose\n", - "Name: 213, dtype: object\n", - "ssssssssssssssssssssssssssssssssss214ssssssssssssssssssssssssssssssssss\n", - "0 Pewdiepie\n", - "1 Jake \n", - "2 Logan \n", - "3 Paul \n", - "4 Team 10 \n", - "5 Dab \n", - "Name: 214, dtype: object\n", - "ssssssssssssssssssssssssssssssssss215ssssssssssssssssssssssssssssssssss\n", - "0 wormax.io\n", - "1 wormax \n", - "2 snake \n", - "3 game \n", - "4 online \n", - "5 free \n", - "Name: 215, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss216ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 respect \n", - "2 women \n", - "3 piers morgan \n", - "4 good morning britain\n", - "Name: 216, dtype: object\n", - "ssssssssssssssssssssssssssssssssss217ssssssssssssssssssssssssssssssssss\n", - "0 women \n", - "1 bbc \n", - "2 bbc 3\n", - "Name: 217, dtype: object\n", - "ssssssssssssssssssssssssssssssssss218ssssssssssssssssssssssssssssssssss\n", - "0 fridays \n", - "1 with \n", - "2 pewdiepie\n", - "Name: 218, dtype: object\n", - "ssssssssssssssssssssssssssssssssss219ssssssssssssssssssssssssssssssssss\n", - "0 5 weird \n", - "1 stuff \n", - "2 5 weird stuff online\n", - "Name: 219, dtype: object\n", - "ssssssssssssssssssssssssssssssssss220ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 katy perry\n", - "Name: 220, dtype: object\n", - "ssssssssssssssssssssssssssssssssss221ssssssssssssssssssssssssssssssssss\n", - "0 reacting \n", - "1 fridays \n", - "2 with pewdiepie \n", - "3 fridays with pewdiepie\n", - "4 react \n", - "5 fan submission \n", - "6 fan \n", - "7 fans \n", - "Name: 221, dtype: object\n", - "ssssssssssssssssssssssssssssssssss222ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 YOU LAUGH YOU'RE OUT \n", - "2 you laugh \n", - "3 you laugh you \n", - "4 you laugh you're \n", - "5 you laugh lose \n", - "6 you laugh you lose pewdiepie\n", - "7 laugh \n", - "8 lose \n", - "9 laugh lose \n", - "Name: 222, dtype: object\n", - "ssssssssssssssssssssssssssssssssss223ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 oblivion \n", - "2 elder scrolls\n", - "Name: 223, dtype: object\n", - "ssssssssssssssssssssssssssssssssss224ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 REACTING TO CRINGEY SPEED RUNS\n", - "2 cringe compilation \n", - "3 cringe compilation 2017 \n", - "4 speed runs \n", - "5 speed run \n", - "6 cringe \n", - "7 cringe reaction \n", - "8 reaction \n", - "9 cringe react \n", - "10 reacting 
to cringe \n", - "11 cringy reaction \n", - "12 cringey \n", - "13 reacting to cringey videos \n", - "14 cringey speed runs \n", - "15 speed \n", - "16 run \n", - "17 pewdiepie reaction \n", - "18 react \n", - "Name: 224, dtype: object\n", - "ssssssssssssssssssssssssssssssssss225ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 the rich life of pewdiepie \n", - "2 before he was famous \n", - "3 before he was famous pewdiepie \n", - "4 pewdiepie rich \n", - "5 pewdiepie net worth \n", - "6 how much money does pewdiepie make\n", - "7 how much money \n", - "8 youtube money \n", - "9 money \n", - "10 net worth \n", - "11 networth \n", - "12 rich \n", - "13 rich life \n", - "14 the rich life \n", - "Name: 225, dtype: object\n", - "ssssssssssssssssssssssssssssssssss226ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 react \n", - "2 react world\n", - "3 greenscreen\n", - "4 competition\n", - "Name: 226, dtype: object\n", - "ssssssssssssssssssssssssssssssssss227ssssssssssssssssssssssssssssssssss\n", - "0 moral \n", - "1 moral machine\n", - "Name: 227, dtype: object\n", - "ssssssssssssssssssssssssssssssssss228ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 e3 \n", - "2 react \n", - "3 react world\n", - "Name: 228, dtype: object\n", - "ssssssssssssssssssssssssssssssssss229ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 jake \n", - "2 paul \n", - "3 it's \n", - "4 everyday \n", - "5 bro \n", - "6 react \n", - "7 react world\n", - "8 fine \n", - "Name: 229, dtype: object\n", - "ssssssssssssssssssssssssssssssssss230ssssssssssssssssssssssssssssssssss\n", - "0 respect\n", - "1 women \n", - "2 react \n", - "3 meme \n", - "Name: 230, dtype: object\n", - "ssssssssssssssssssssssssssssssssss231ssssssssssssssssssssssssssssssssss\n", - "0 try not to \n", - "1 try not \n", - "2 dont laugh \n", - "3 try not to laugh\n", - "4 challenge \n", - "Name: 231, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss232ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 test \n", - "2 harvard \n", - "3 skin \n", - "4 race \n", - "Name: 232, dtype: object\n", - "ssssssssssssssssssssssssssssssssss233ssssssssssssssssssssssssssssssssss\n", - "0 fidget spinner \n", - "1 fidget spinner tricks\n", - "2 trick \n", - "3 fidget spinner unbox \n", - "Name: 233, dtype: object\n", - "ssssssssssssssssssssssssssssssssss234ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 angry \n", - "2 challenge\n", - "Name: 234, dtype: object\n", - "ssssssssssssssssssssssssssssssssss235ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 buzzfeed\n", - "2 drunk \n", - "3 goggle \n", - "4 goggles \n", - "Name: 235, dtype: object\n", - "ssssssssssssssssssssssssssssssssss236ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 Little Nightmares Gameplay \n", - "5 Little Nightmares Walkthrough Part 1\n", - "6 Little Nightmares Gameplay Part 1 \n", - "7 Little Nightmares Pewdiepie \n", - "8 Little Nightmares Trailer \n", - "9 Little Nightmares Full Gameplay \n", - "10 Little Nightmares PS4 \n", - "11 Little Nightmares Review \n", - "12 Little Nightmares Part 1 \n", - "13 Little Nightmares Reaction \n", - "14 Little Nightmares Scary \n", - "15 Little Nightmares Game \n", - "16 Scary Games \n", - "17 New PS4 Games \n", - "18 New Games 2017 \n", - "19 PS4 Games 2017 \n", - "20 Best Games 2017 \n", - "Name: 236, dtype: object\n", - "ssssssssssssssssssssssssssssssssss237ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 buzzfeed\n", - "Name: 237, dtype: object\n", - "ssssssssssssssssssssssssssssssssss238ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 238, dtype: object\n", - "ssssssssssssssssssssssssssssssssss239ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 barbie \n", - "5 youtube channel\n", - "6 
vlogger \n", - "7 vlog \n", - "Name: 239, dtype: object\n", - "ssssssssssssssssssssssssssssssssss240ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 240, dtype: object\n", - "ssssssssssssssssssssssssssssssssss241ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 family friendly\n", - "5 frozen \n", - "6 frozen games \n", - "Name: 241, dtype: object\n", - "ssssssssssssssssssssssssssssssssss242ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 can this video\n", - "5 get \n", - "Name: 242, dtype: object\n", - "ssssssssssssssssssssssssssssssssss243ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 everything \n", - "5 game \n", - "6 everything game \n", - "7 play as anything \n", - "8 play as everything\n", - "9 play as \n", - "10 play everything \n", - "Name: 243, dtype: object\n", - "ssssssssssssssssssssssssssssssssss244ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 mass \n", - "5 effect \n", - "6 andromeda\n", - "7 video \n", - "8 game \n", - "9 ME \n", - "Name: 244, dtype: object\n", - "ssssssssssssssssssssssssssssssssss245ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 245, dtype: object\n", - "ssssssssssssssssssssssssssssssssss246ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 mind \n", - "5 blown \n", - "Name: 246, dtype: object\n", - "ssssssssssssssssssssssssssssssssss247ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 how dirty \n", - "5 is your mind\n", - "6 dirty mind \n", - "7 photos \n", - "8 funny \n", - "Name: 247, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss248ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 try not \n", - "5 to \n", - "6 laugh \n", - "7 try not to laugh\n", - "8 dont laugh \n", - "9 challenge \n", - "Name: 248, dtype: object\n", - "ssssssssssssssssssssssssssssssssss249ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 249, dtype: object\n", - "ssssssssssssssssssssssssssssssssss250ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 smash \n", - "5 or \n", - "6 pass \n", - "Name: 250, dtype: object\n", - "ssssssssssssssssssssssssssssssssss251ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 before he was famous\n", - "5 famous \n", - "6 young \n", - "7 young pewdiepie \n", - "Name: 251, dtype: object\n", - "ssssssssssssssssssssssssssssssssss252ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 try not \n", - "5 to get \n", - "6 try not to get \n", - "7 scared \n", - "8 challenge \n", - "9 scared challenge \n", - "10 try not to get scared challenge\n", - "Name: 252, dtype: object\n", - "ssssssssssssssssssssssssssssssssss253ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 253, dtype: object\n", - "ssssssssssssssssssssssssssssssssss254ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 re7 \n", - "5 RESIDENT eVIL 7 \n", - "6 GAMEPLAY \n", - "7 Resident Evil 7: Biohazard\n", - "8 BIOHAZARD \n", - "9 rewind \n", - "10 biohazard \n", - "11 survival horror \n", - "12 ps4 \n", - "13 playstation 4 \n", - "14 vr \n", - "15 demo \n", - "Name: 254, dtype: object\n", - "ssssssssssssssssssssssssssssssssss255ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 double gal\n", - "5 gun \n", - "6 double 
\n", - "7 gal \n", - "8 girl \n", - "9 anime \n", - "10 animes \n", - "Name: 255, dtype: object\n", - "ssssssssssssssssssssssssssssssssss256ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 cringe \n", - "5 try not \n", - "6 challenge \n", - "7 try not to\n", - "8 handshake \n", - "9 handshakes\n", - "Name: 256, dtype: object\n", - "ssssssssssssssssssssssssssssssssss257ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 257, dtype: object\n", - "ssssssssssssssssssssssssssssssssss258ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 beat \n", - "5 subscribers\n", - "6 most \n", - "Name: 258, dtype: object\n", - "ssssssssssssssssssssssssssssssssss259ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 259, dtype: object\n", - "ssssssssssssssssssssssssssssssssss260ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 to be continued\n", - "5 meme \n", - "6 compilation \n", - "7 continue \n", - "8 jojo \n", - "9 jojos \n", - "10 bizarre \n", - "11 adventure \n", - "Name: 260, dtype: object\n", - "ssssssssssssssssssssssssssssssssss261ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 how long can you watch\n", - "5 how long \n", - "6 watch \n", - "7 challenge \n", - "8 watching \n", - "9 time \n", - "Name: 261, dtype: object\n", - "ssssssssssssssssssssssssssssssssss262ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 the walking dead \n", - "5 walking dead \n", - "6 part 1 \n", - "7 season 3 \n", - "8 telltale \n", - "9 game \n", - "10 the walking dead seasons 3\n", - "11 walking dead full game \n", - "Name: 262, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss263ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "Name: 263, dtype: object\n", - "ssssssssssssssssssssssssssssssssss264ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 who is \n", - "5 more likely \n", - "6 markiplier \n", - "7 jacksepticeye \n", - "8 who is more likely\n", - "9 most likely \n", - "Name: 264, dtype: object\n", - "ssssssssssssssssssssssssssssssssss265ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 Bottleflip \n", - "5 Challenge \n", - "6 Bottle \n", - "7 Dab \n", - "8 Meme \n", - "9 Jacksepticeye\n", - "Name: 265, dtype: object\n", - "ssssssssssssssssssssssssssssssssss266ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 hot \n", - "5 sauce \n", - "6 lootcrate\n", - "Name: 266, dtype: object\n", - "ssssssssssssssssssssssssssssssssss267ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 deleting\n", - "5 channel \n", - "Name: 267, dtype: object\n", - "ssssssssssssssssssssssssssssssssss268ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 vlog \n", - "5 birdabo \n", - "Name: 268, dtype: object\n", - "ssssssssssssssssssssssssssssssssss269ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 Tuber Simulator \n", - "5 Tuber \n", - "6 Simulator \n", - "7 Pewdiepie Simulator \n", - "8 Pewdiepie Game \n", - "9 Youtube Game \n", - "10 IOS \n", - "11 Android \n", - "12 Youtuber Simulator \n", - "13 Competition \n", - "14 Fridays \n", - "15 Fridays with Pewdiepie\n", - "Name: 269, dtype: object\n", - "ssssssssssssssssssssssssssssssssss270ssssssssssssssssssssssssssssssssss\n" - ] - }, - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 Vlog \n", - "5 Jacksepticeye\n", - "6 Slippy \n", - "7 Holiday \n", - "8 video \n", - "9 log \n", - "10 kickthepj \n", - "Name: 270, dtype: object\n", - "ssssssssssssssssssssssssssssssssss271ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 pewdie \n", - "4 my \n", - "5 favourite\n", - "6 videos \n", - "7 ever \n", - "Name: 271, dtype: object\n", - "ssssssssssssssssssssssssssssssssss272ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "3 meme \n", - "4 react \n", - "5 spicy \n", - "6 dank \n", - "Name: 272, dtype: object\n", - "ssssssssssssssssssssssssssssssssss273ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 cringy \n", - "4 cringe \n", - "5 cringe kid \n", - "6 cringe compilation\n", - "7 cringe react \n", - "8 react \n", - "Name: 273, dtype: object\n", - "ssssssssssssssssssssssssssssssssss274ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 Happy Wheels \n", - "4 Happy Wheels 3D\n", - "5 Guts and Glory \n", - "6 Let's Play \n", - "7 Download \n", - "8 Alpha \n", - "9 Gameplay \n", - "10 Montage \n", - "Name: 274, dtype: object\n", - "ssssssssssssssssssssssssssssssssss275ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 kick the pj\n", - "4 pj \n", - "5 google \n", - "6 google feud\n", - "Name: 275, dtype: object\n", - "ssssssssssssssssssssssssssssssssss276ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 lootcrate \n", - "4 5 weird stuff online\n", - "5 vlog \n", - "6 unboxing \n", - "Name: 276, dtype: object\n", - "ssssssssssssssssssssssssssssssssss277ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 Marzia \n", - "4 Gangbeasts \n", - "5 Multiplayer 
\n", - "6 gang beasts \n", - "7 gan \n", - "8 gang \n", - "9 beasts \n", - "10 funny multiplayer\n", - "11 funny \n", - "12 multiplayer \n", - "13 2 player \n", - "14 coop \n", - "Name: 277, dtype: object\n", - "ssssssssssssssssssssssssssssssssss278ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 Welcome to the game \n", - "4 Steam \n", - "5 deep web \n", - "6 illegal \n", - "7 hackers \n", - "8 hacking \n", - "9 hack \n", - "10 Welcome to the game red room \n", - "11 welcome to the game all codes\n", - "12 Hacking Game \n", - "Name: 278, dtype: object\n", - "ssssssssssssssssssssssssssssssssss279ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie\n", - "1 pewds \n", - "2 pdp \n", - "Name: 279, dtype: object\n", - "ssssssssssssssssssssssssssssssssss280ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 react \n", - "4 subscriber \n", - "5 special \n", - "6 montage \n", - "7 old pewdiepie\n", - "8 new pewdiepie\n", - "9 vlog \n", - "10 fridays \n", - "Name: 280, dtype: object\n", - "ssssssssssssssssssssssssssssssssss281ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 vlog \n", - "4 kicked out\n", - "5 moving \n", - "6 house \n", - "7 landlord \n", - "Name: 281, dtype: object\n", - "ssssssssssssssssssssssssssssssssss282ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 diamond \n", - "4 play button\n", - "5 playbutton \n", - "6 youtube \n", - "7 unboxing \n", - "Name: 282, dtype: object\n", - "ssssssssssssssssssssssssssssssssss283ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 283, dtype: object\n", - "ssssssssssssssssssssssssssssssssss284ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 uncharted 4 \n", - "4 uncharted \n", - "5 gameplay \n", - "6 uncharted 4 gameplay \n", - "7 uncharted 4 walkthrough part 1\n", - "8 through \n", - "9 
play \n", - "10 walk \n", - "11 let's play \n", - "12 uncharted 4 trailer \n", - "13 gameplay walkthrough \n", - "14 a theif's end \n", - "15 review \n", - "16 multiplayer \n", - "Name: 284, dtype: object\n", - "ssssssssssssssssssssssssssssssssss285ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 sophie's curse\n", - "4 sohpie \n", - "5 curse \n", - "6 steam \n", - "7 horror \n", - "8 jumpscare \n", - "9 let's play \n", - "Name: 285, dtype: object\n", - "ssssssssssssssssssssssssssssssssss286ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 286, dtype: object\n", - "ssssssssssssssssssssssssssssssssss287ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 video games \n", - "4 dark souls 3 \n", - "5 dark souls 3 gameplay\n", - "6 gameplay \n", - "7 lets play \n", - "8 lets \n", - "9 play \n", - "10 commentary \n", - "11 dark souls \n", - "12 part 2 \n", - "13 game \n", - "14 walk \n", - "15 through \n", - "16 walkthrough \n", - "17 playthrough \n", - "Name: 287, dtype: object\n", - "ssssssssssssssssssssssssssssssssss288ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 288, dtype: object\n", - "ssssssssssssssssssssssssssssssssss289ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 video games\n", - "4 60 seconds \n", - "5 60 \n", - "6 seconds \n", - "7 steam \n", - "8 lets play \n", - "Name: 289, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss290ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 290, dtype: object\n", - "ssssssssssssssssssssssssssssssssss291ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 291, dtype: object\n", - "ssssssssssssssssssssssssssssssssss292ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewds \n", - "2 pdp \n", - "3 video games \n", - "4 pewdiepie iq\n", - "5 iq \n", - "6 iq test \n", - "7 smart \n", - "8 how smart \n", - "Name: 292, dtype: object\n", - "ssssssssssssssssssssssssssssssssss293ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 293, dtype: object\n", - "ssssssssssssssssssssssssssssssssss294ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 294, dtype: object\n", - "ssssssssssssssssssssssssssssssssss295ssssssssssssssssssssssssssssssssss\n", - "0 PewDiePie \n", - "1 YouTube Red \n", - "2 YouTube Red Original Series\n", - "3 horror games \n", - "4 horror video games \n", - "5 video games \n", - "6 pranks \n", - "7 YouTube Red membership \n", - "8 YouTube Red subscription \n", - "Name: 295, dtype: object\n", - "ssssssssssssssssssssssssssssssssss296ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk 
through\n", - "7 video games \n", - "8 lets play \n", - "Name: 296, dtype: object\n", - "ssssssssssssssssssssssssssssssssss297ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk through\n", - "7 video games \n", - "8 lets play \n", - "9 world chef \n", - "Name: 297, dtype: object\n", - "ssssssssssssssssssssssssssssssssss298ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 298, dtype: object\n", - "ssssssssssssssssssssssssssssssssss299ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk through \n", - "7 video games \n", - "8 lets play \n", - "9 mgsv \n", - "10 metal gear solid \n", - "11 the phantom pain \n", - "12 metal gear solid 5\n", - "13 intense \n", - "14 youtube gaming \n", - "15 gaming \n", - "16 gameplay \n", - "Name: 299, dtype: object\n", - "ssssssssssssssssssssssssssssssssss300ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 300, dtype: object\n", - "ssssssssssssssssssssssssssssssssss301ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk through\n", - "7 video games \n", - "8 lets play \n", - "Name: 301, dtype: object\n", - "ssssssssssssssssssssssssssssssssss302ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 walk through \n", - "7 video games \n", - "8 lets play \n", - "9 spookys \n", - "10 spooky's \n", - "11 house \n", - "12 of jumpscares\n", - "13 jumpscare \n", - "14 jumpscares \n", - "15 jumpscared \n", - "16 horror \n", - "17 scary \n", - "18 funny \n", - "19 reaction \n", - "Name: 302, dtype: object\n", - 
"ssssssssssssssssssssssssssssssssss303ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 303, dtype: object\n", - "ssssssssssssssssssssssssssssssssss304ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 304, dtype: object\n", - "ssssssssssssssssssssssssssssssssss305ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 305, dtype: object\n", - "ssssssssssssssssssssssssssssssssss306ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through\n", - "10 video games \n", - "11 vlog vlog \n", - "12 vlog \n", - "13 vlogs \n", - "Name: 306, dtype: object\n", - "ssssssssssssssssssssssssssssssssss307ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 307, dtype: object\n", - "ssssssssssssssssssssssssssssssssss308ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through \n", - "10 video games \n", - "11 The Walking Dead - Season 2 (TV Season)\n", - "12 telltale game \n", - "13 telltale games \n", - "14 walking dead \n", - "15 story \n", - "16 zombie \n", - "Name: 308, dtype: object\n", - "ssssssssssssssssssssssssssssssssss309ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through\n", - "10 video games \n", - "Name: 309, dtype: object\n", - "ssssssssssssssssssssssssssssssssss310ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 310, dtype: object\n", - "ssssssssssssssssssssssssssssssssss311ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 
walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through \n", - "10 video games \n", - "11 the imossible \n", - "12 quiz \n", - "13 question \n", - "14 questions \n", - "15 funny \n", - "16 reaction \n", - "17 the impossible quiz\n", - "18 all answers \n", - "19 answers \n", - "20 cheat \n", - "Name: 311, dtype: object\n", - "ssssssssssssssssssssssssssssssssss312ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through \n", - "10 video games \n", - "11 the wolf among us trailer\n", - "12 telltale \n", - "13 wolf among us \n", - "14 Gameplay \n", - "15 Ps3 \n", - "16 review \n", - "17 telltale games \n", - "18 part 1 \n", - "19 Xbox \n", - "20 the wolf among us \n", - "21 among \n", - "22 snowwhite \n", - "23 snow white \n", - "24 fairytale \n", - "Name: 312, dtype: object\n", - "ssssssssssssssssssssssssssssssssss313ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through \n", - "10 video games \n", - "11 linger \n", - "12 oculus \n", - "13 rift \n", - "14 reaction \n", - "15 oculus rift \n", - "16 vr \n", - "17 virtual reality\n", - "Name: 313, dtype: object\n", - "ssssssssssssssssssssssssssssssssss314ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through \n", - "9 walk through\n", - "10 video games \n", - "Name: 314, dtype: object\n", - "ssssssssssssssssssssssssssssssssss315ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 pewds \n", - "3 let's play \n", - "4 playthrough \n", - "5 walkthrough \n", - "6 play \n", - "7 walk \n", - "8 through 
\n", - "9 walk through\n", - "10 video games \n", - "Name: 315, dtype: object\n", - "ssssssssssssssssssssssssssssssssss316ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 316, dtype: object\n", - "ssssssssssssssssssssssssssssssssss317ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 317, dtype: object\n", - "ssssssssssssssssssssssssssssssssss318ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 lets \n", - "3 play \n", - "4 let´s play \n", - "5 horror \n", - "6 game \n", - "7 walkthrough\n", - "8 playthrough\n", - "9 letsplay \n", - "10 mod \n", - "11 gameplay \n", - "12 trailer \n", - "13 commentary \n", - "14 funny \n", - "Name: 318, dtype: object\n", - "ssssssssssssssssssssssssssssssssss319ssssssssssssssssssssssssssssssssss\n", - "0 Sequence\n", - "1 01 \n", - "2 19 \n", - "Name: 319, dtype: object\n", - "ssssssssssssssssssssssssssssssssss320ssssssssssssssssssssssssssssssssss\n", - "0 pewdiepie \n", - "1 pewdie \n", - "2 lets \n", - "3 play \n", - "4 let´s play \n", - "5 horror \n", - "6 game \n", - "7 walkthrough\n", - "8 playthrough\n", - "9 letsplay \n", - "10 mod \n", - "11 gameplay \n", - "12 trailer \n", - "13 commentary \n", - "14 funny \n", - "Name: 320, dtype: object\n", - "ssssssssssssssssssssssssssssssssss321ssssssssssssssssssssssssssssssssss\n", - "0 condemned \n", - "1 part \n", - "2 condmned \n", - "3 parrt \n", - "4 condomned \n", - "5 pewdiepie \n", - "6 lets \n", - "7 play \n", - "8 let's play\n", - "9 video \n", - "10 games \n", - "11 horror \n", - "12 xbox \n", - "13 ps3 \n", - "14 hd \n", - "15 pewdie \n", - "16 scary \n", - "17 game \n", - "18 scary game\n", - "19 gameplay \n", - "20 ending \n", - "21 secret \n", - "22 jumpscare \n", - "23 pop \n", - "24 pewds \n", - "Name: 321, dtype: object\n", - "ssssssssssssssssssssssssssssssssss322ssssssssssssssssssssssssssssssssss\n", - "0 Amnesiaaa \n", - "1 followed \n", - "2 by \n", - "3 death \n", - "4 ch2 \n", - "5 part \n", - "6 amnesia 
\n", - "7 the \n", - "8 dark \n", - "9 descent \n", - "10 pewdiepie\n", - "11 pewdie \n", - "12 custom \n", - "13 Ghosts \n", - "14 Tape \n", - "15 Pewdiepie\n", - "16 screaming\n", - "17 scream \n", - "18 girly \n", - "19 girl \n", - "20 horror \n", - "21 Scared \n", - "22 Creepy \n", - "23 Funny \n", - "24 chapter \n", - "Name: 322, dtype: object\n", - "ssssssssssssssssssssssssssssssssss323ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 323, dtype: object\n", - "ssssssssssssssssssssssssssssssssss324ssssssssssssssssssssssssssssssssss\n", - "0 callllllllll \n", - "1 28 \n", - "2 calling \n", - "3 27 \n", - "4 part \n", - "5 26 \n", - "6 clallinin \n", - "7 penis \n", - "8 the \n", - "9 lets \n", - "10 play \n", - "11 walkthrough \n", - "12 playthrough \n", - "13 through \n", - "14 wii \n", - "15 gameplay \n", - "16 Suzutani \n", - "17 suzutani \n", - "18 The \n", - "19 Possession \n", - "20 possession \n", - "21 ghosts \n", - "22 yt:quality=high\n", - "23 pewdiepie \n", - "24 funny \n", - "25 scary \n", - "26 wierd \n", - "Name: 324, dtype: object\n", - "ssssssssssssssssssssssssssssssssss325ssssssssssssssssssssssssssssssssss\n", - "0 calling \n", - "1 27 \n", - "2 part \n", - "3 26 \n", - "4 clallinin \n", - "5 penis \n", - "6 the \n", - "7 lets \n", - "8 play \n", - "9 walkthrough \n", - "10 playthrough \n", - "11 through \n", - "12 wii \n", - "13 gameplay \n", - "14 Suzutani \n", - "15 suzutani \n", - "16 The \n", - "17 Possession \n", - "18 possession \n", - "19 ghosts \n", - "20 yt:quality=high\n", - "21 pewdiepie \n", - "22 funny \n", - "23 scary \n", - "24 wierd \n", - "Name: 325, dtype: object\n", - "ssssssssssssssssssssssssssssssssss326ssssssssssssssssssssssssssssssssss\n", - "0 part \n", - "1 26 \n", - "2 clallinin \n", - "3 penis \n", - "4 calling \n", - "5 the \n", - "6 lets \n", - "7 play \n", - "8 walkthrough \n", - "9 playthrough \n", - "10 through \n", - "11 wii \n", - "12 gameplay \n", - "13 Suzutani \n", - "14 suzutani \n", - "15 
The \n", - "16 Possession \n", - "17 possession \n", - "18 ghosts \n", - "19 yt:quality=high\n", - "20 pewdiepie \n", - "21 funny \n", - "22 scary \n", - "23 wierd \n", - "Name: 326, dtype: object\n", - "ssssssssssssssssssssssssssssssssss327ssssssssssssssssssssssssssssssssss\n", - "0 calling \n", - "1 the \n", - "2 lets \n", - "3 play \n", - "4 walkthrough \n", - "5 playthrough \n", - "6 through \n", - "7 wii \n", - "8 gameplay \n", - "9 Suzutani \n", - "10 suzutani \n", - "11 The \n", - "12 Possession \n", - "13 possession \n", - "14 ghosts \n", - "15 yt:quality=high\n", - "16 pewdiepie \n", - "17 funny \n", - "18 scary \n", - "19 wierd \n", - "Name: 327, dtype: object\n", - "ssssssssssssssssssssssssssssssssss328ssssssssssssssssssssssssssssssssss\n", - "0 the \n", - "1 attic \n", - "2 part \n", - "3 The \n", - "4 lets \n", - "5 play \n", - "6 playthrough \n", - "7 pewdiepie \n", - "8 chapter \n", - "9 scary \n", - "10 pewdie \n", - "11 walkthrough \n", - "12 horror \n", - "13 scared \n", - "14 screaming \n", - "15 scream \n", - "16 Funny \n", - "17 Horror Fiction\n", - "18 Maze \n", - "19 Game \n", - "20 Weird \n", - "21 Creepy \n", - "22 Open \n", - "23 Scare \n", - "24 Next \n", - "25 Strange \n", - "26 Prank \n", - "27 Story \n", - "28 Outside \n", - "29 Scary Maze \n", - "30 Rat \n", - "31 Scaring \n", - "Name: 328, dtype: object\n", - "ssssssssssssssssssssssssssssssssss329ssssssssssssssssssssssssssssssssss\n", - "0 Sequence \n", - "1 01 \n", - "2 aom \n", - "3 Afraid \n", - "4 Of \n", - "5 Monsters \n", - "6 director's \n", - "7 cut \n", - "8 ending \n", - "9 all endings\n", - "10 soundtrack \n", - "11 creepy \n", - "12 half \n", - "13 life \n", - "14 mod \n", - "15 sweden \n", - "16 pewdiepie \n", - "17 pewdie \n", - "18 scary \n", - "19 Scream \n", - "20 Game \n", - "21 Scared \n", - "22 Maze \n", - "23 Weird \n", - "24 Screaming \n", - "25 Strange \n", - "26 Funny \n", - "27 Prank \n", - "28 Scary Maze \n", - "29 Scaring \n", - "Name: 329, dtype: 
object\n", - "ssssssssssssssssssssssssssssssssss330ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 330, dtype: object\n", - "ssssssssssssssssssssssssssssssssss331ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 331, dtype: object\n", - "ssssssssssssssssssssssssssssssssss332ssssssssssssssssssssssssssssssssss\n", - "0 octodad \n", - "1 Octodad \n", - "2 Official \n", - "3 Trailer \n", - "4 octodad ending \n", - "5 octodad trailer \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 lets \n", - "9 play \n", - "10 let's \n", - "11 pewdiepie \n", - "12 funny \n", - "13 wierd \n", - "14 indie \n", - "15 Trailer (promotion)\n", - "16 Game \n", - "17 Weird \n", - "18 Gameplay \n", - "19 Playthrough Part \n", - "20 Humour \n", - "21 Play (theatre) \n", - "22 Crazy \n", - "23 Random \n", - "24 Silly \n", - "25 Mission \n", - "Name: 332, dtype: object\n", - "ssssssssssssssssssssssssssssssssss333ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 333, dtype: object\n", - "ssssssssssssssssssssssssssssssssss334ssssssssssssssssssssssssssssssssss\n", - "0 octodad \n", - "1 Octodad \n", - "2 Official \n", - "3 Trailer \n", - "4 octodad ending \n", - "5 octodad trailer \n", - "6 walkthrough \n", - "7 playthrough \n", - "8 lets \n", - "9 play \n", - "10 let's \n", - "11 pewdiepie \n", - "12 funny \n", - "13 wierd \n", - "14 indie \n", - "15 Trailer (promotion)\n", - "16 Game \n", - "17 Weird \n", - "18 Gameplay \n", - "19 Playthrough Part \n", - "Name: 334, dtype: object\n", - "ssssssssssssssssssssssssssssssssss335ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 naked \n", - "6 scared \n", - "7 playthrough \n", - "8 amnesia \n", - "9 the \n", - "10 dark \n", - "11 descent \n", - "12 custom \n", - "13 story \n", - "14 mod \n", - "15 100% \n", - "16 scary \n", - "17 Scary and Funny Moments\n", - "18 scariest \n", - "19 moment \n", - "20 funny \n", - "21 Black 
\n", - "22 Plauge \n", - "23 Requiem \n", - "24 Frictional \n", - "25 how \n", - "26 to \n", - "27 Top \n", - "28 Scary \n", - "29 Moments \n", - "30 /W \n", - "31 PewDiePie \n", - "32 countdown \n", - "33 library of alexandria \n", - "34 part \n", - "35 Stephanos House \n", - "36 Stephano \n", - "37 piggeh \n", - "38 bro \n", - "39 Funny \n", - "40 Best \n", - "41 Let's \n", - "42 Game \n", - "43 Weird \n", - "44 Part 2 \n", - "Name: 335, dtype: object\n", - "ssssssssssssssssssssssssssssssssss336ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 336, dtype: object\n", - "ssssssssssssssssssssssssssssssssss337ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 337, dtype: object\n", - "ssssssssssssssssssssssssssssssssss338ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough\n", - "5 naked \n", - "6 scared \n", - "7 playthrough\n", - "8 amnesia \n", - "9 the \n", - "10 dark \n", - "11 descent \n", - "12 custom \n", - "13 story \n", - "14 mod \n", - "15 100% \n", - "16 scary \n", - "17 cannibalism\n", - "18 funny \n", - "19 moments \n", - "20 moment \n", - "21 top \n", - "22 pewdie \n", - "23 monster \n", - "24 trailer \n", - "25 100 \n", - "26 part 2 \n", - "27 episode 2 \n", - "28 Amnesia \n", - "29 nightmare \n", - "Name: 338, dtype: object\n", - "ssssssssssssssssssssssssssssssssss339ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 magicka \n", - "6 playthrough \n", - "7 through \n", - "8 xebaz \n", - "9 tsubasahara \n", - "10 part 2 \n", - "11 magic wizards and shit\n", - "12 game \n", - "13 gameplay \n", - "14 playthrough part \n", - "15 mission \n", - "16 kevin \n", - "17 video game \n", - "18 Orlando Magic \n", - "19 Magic Johnson \n", - "20 playstation \n", - "21 trick \n", - "22 ps2 \n", - "23 xbox \n", - "24 card \n", - "25 tricks \n", - "26 ARMA 2 \n", - "27 john \n", - "28 revealed \n", 
- "29 david \n", - "30 criss \n", - "31 PlayStation 3 \n", - "32 Xbox \n", - "Name: 339, dtype: object\n", - "ssssssssssssssssssssssssssssssssss340ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough\n", - "5 naked \n", - "6 scared \n", - "7 playthrough\n", - "8 amnesia \n", - "9 the \n", - "10 dark \n", - "11 descent \n", - "12 custom \n", - "13 story \n", - "14 mod \n", - "15 100% \n", - "16 scary \n", - "17 cannibalism\n", - "18 funny \n", - "19 moments \n", - "20 moment \n", - "21 top \n", - "22 pewdie \n", - "23 monster \n", - "24 trailer \n", - "25 100 \n", - "26 part 5 \n", - "27 episode 5 \n", - "28 Through \n", - "29 portal \n", - "30 secret room\n", - "31 trollface \n", - "32 problem \n", - "Name: 340, dtype: object\n", - "ssssssssssssssssssssssssssssssssss341ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough\n", - "5 naked \n", - "6 scared \n", - "7 playthrough\n", - "8 amnesia \n", - "9 the \n", - "10 dark \n", - "11 descent \n", - "12 custom \n", - "13 story \n", - "14 mod \n", - "15 100% \n", - "16 scary \n", - "17 cannibalism\n", - "18 funny \n", - "19 moments \n", - "20 moment \n", - "21 top \n", - "22 pewdie \n", - "23 monster \n", - "24 trailer \n", - "25 100 \n", - "26 part 3 \n", - "27 episode 3 \n", - "28 Through \n", - "29 portal \n", - "30 game \n", - "31 level \n", - "32 let's \n", - "33 let's play \n", - "34 gameplay \n", - "35 techno \n", - "36 kevin \n", - "37 games \n", - "Name: 341, dtype: object\n", - "ssssssssssssssssssssssssssssssssss342ssssssssssssssssssssssssssssssssss\n", - "0 dead \n", - "1 island \n", - "2 Dead island gameplay \n", - "3 co-op \n", - "4 coop \n", - "5 lets \n", - "6 play \n", - "7 let \n", - "8 playthrough \n", - "9 walkthrough \n", - "10 dead island lets play \n", - "11 dead island playthrough\n", - "12 ending \n", - "13 zombie \n", - "14 zombies \n", - "15 survival \n", - "16 
horror \n", - "17 pegi \n", - "18 uk \n", - "19 violence \n", - "20 violent \n", - "21 open \n", - "22 world \n", - "23 sandbox \n", - "24 Zombie \n", - "25 Horror \n", - "26 Banoi \n", - "27 Undead \n", - "28 PC \n", - "29 Xbox \n", - "30 360 \n", - "31 Playstation \n", - "32 PS3 \n", - "33 Deep \n", - "34 Silver \n", - "35 Techland \n", - "36 2011 \n", - "37 yt:quality=high \n", - "38 HD \n", - "39 720 \n", - "40 1080 \n", - "41 pewdiepie \n", - "42 morfar \n", - "43 cam \n", - "44 camera \n", - "45 pre \n", - "46 order \n", - "47 weapon \n", - "Name: 342, dtype: object\n", - "ssssssssssssssssssssssssssssssssss343ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 naked \n", - "6 scared \n", - "7 playthrough \n", - "8 Fatal \n", - "9 Frame \n", - "10 Lets \n", - "11 blind \n", - "12 fatal \n", - "13 frame \n", - "14 II \n", - "15 pewdie \n", - "16 ending \n", - "17 part 1 \n", - "18 Fatal Frame Playthrough part 1\n", - "19 episode \n", - "20 let's \n", - "21 let's play \n", - "22 crimson \n", - "23 butterfly \n", - "24 scary \n", - "25 game \n", - "26 vampire \n", - "27 funny \n", - "28 gameplay \n", - "29 zero \n", - "30 playthrough part \n", - "31 mission \n", - "32 scream \n", - "33 anime \n", - "34 video \n", - "Name: 343, dtype: object\n", - "ssssssssssssssssssssssssssssssssss344ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 naked \n", - "6 scared \n", - "7 playthrough \n", - "8 Fatal \n", - "9 Frame \n", - "10 Lets \n", - "11 blind \n", - "12 fatal \n", - "13 frame \n", - "14 II \n", - "15 pewdie \n", - "16 ending \n", - "17 part 1 \n", - "18 Fatal Frame Playthrough part 1\n", - "19 episode \n", - "20 let's \n", - "21 let's play \n", - "22 crimson \n", - "23 butterfly \n", - "24 scary \n", - "25 game \n", - "26 vampire \n", - "27 funny \n", - "28 gameplay \n", - "29 zero \n", - "30 playthrough part 
\n", - "31 mission \n", - "32 scream \n", - "33 anime \n", - "34 video game \n", - "35 can \n", - "36 basket \n", - "37 kevin \n", - "38 playstation \n", - "39 ps2 \n", - "Name: 344, dtype: object\n", - "ssssssssssssssssssssssssssssssssss345ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 pewdiepie \n", - "4 walkthrough \n", - "5 naked \n", - "6 scared \n", - "7 playthrough \n", - "8 Fatal \n", - "9 Frame \n", - "10 Lets \n", - "11 blind \n", - "12 fatal \n", - "13 frame \n", - "14 II \n", - "15 pewdie \n", - "16 ending \n", - "17 part 1 \n", - "18 Fatal Frame Playthrough part 1\n", - "19 episode \n", - "20 let's \n", - "21 let's play \n", - "22 crimson \n", - "23 butterfly \n", - "24 scary \n", - "25 game \n", - "26 vampire \n", - "27 funny \n", - "28 gameplay \n", - "29 zero \n", - "30 playthrough part \n", - "31 mission \n", - "32 scream \n", - "33 anime \n", - "34 video game \n", - "35 playstation \n", - "36 ps2 \n", - "37 basket \n", - "38 xbox \n", - "39 ps3 \n", - "40 maze \n", - "41 games \n", - "42 weird \n", - "43 creepy \n", - "44 screaming \n", - "Name: 345, dtype: object\n", - "ssssssssssssssssssssssssssssssssss346ssssssssssssssssssssssssssssssssss\n", - "0 Tags:\n", - "Name: 346, dtype: object\n", - "ssssssssssssssssssssssssssssssssss347ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 347, dtype: object\n", - "ssssssssssssssssssssssssssssssssss348ssssssssssssssssssssssssssssssssss\n", - "0 pewdie \n", - "1 xebaz \n", - "2 playing\n", - "3 fear \n", - "Name: 348, dtype: object\n", - "ssssssssssssssssssssssssssssssssss349ssssssssssssssssssssssssssssssssss\n", - "0 pewdie \n", - "1 Xebaz \n", - "2 are \n", - "3 playing\n", - "4 fear \n", - "5 again \n", - "6 and \n", - "7 still \n", - "8 failing\n", - "9 know \n", - "10 you \n", - "11 like \n", - "12 it \n", - "Name: 349, dtype: object\n", - "ssssssssssssssssssssssssssssssssss350ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 
play \n", - "3 Amnesia \n", - "4 custom \n", - "5 story \n", - "6 la \n", - "7 caza \n", - "8 playthrough\n", - "9 walkthrough\n", - "10 walk \n", - "11 through \n", - "12 scary \n", - "13 fun \n", - "14 james \n", - "15 scream \n", - "16 moment \n", - "17 game \n", - "18 scared \n", - "19 horror \n", - "20 movie \n", - "21 gameplay \n", - "22 part 5 \n", - "23 episode 5 \n", - "Name: 350, dtype: object\n", - "ssssssssssssssssssssssssssssssssss351ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 Amnesia \n", - "4 custom \n", - "5 story \n", - "6 la \n", - "7 caza \n", - "8 playthrough\n", - "9 walkthrough\n", - "10 walk \n", - "11 through \n", - "12 scary \n", - "13 fun \n", - "14 james \n", - "15 scream \n", - "16 moment \n", - "17 game \n", - "18 scared \n", - "19 horror \n", - "20 movie \n", - "21 gameplay \n", - "22 part 4 \n", - "23 episode 4 \n", - "Name: 351, dtype: object\n", - "ssssssssssssssssssssssssssssssssss352ssssssssssssssssssssssssssssssssss\n", - "0 Sequence\n", - "1 01 \n", - "2 1 \n", - "Name: 352, dtype: object\n", - "ssssssssssssssssssssssssssssssssss353ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 Amnesia \n", - "4 custom \n", - "5 story \n", - "6 la \n", - "7 caza \n", - "8 playthrough\n", - "9 walkthrough\n", - "10 walk \n", - "11 through \n", - "12 scary \n", - "13 fun \n", - "14 james \n", - "15 scream \n", - "16 moment \n", - "17 game \n", - "18 scared \n", - "19 horror \n", - "20 movie \n", - "21 gameplay \n", - "22 part 2 \n", - "23 episode 2 \n", - "Name: 353, dtype: object\n", - "ssssssssssssssssssssssssssssssssss354ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 amnesia \n", - "4 DLC \n", - "5 Justine \n", - "6 Amnesia \n", - "7 justine \n", - "8 walkthrough\n", - "9 walk \n", - "10 through \n", - "11 pewdiepie \n", - "12 naked \n", - "13 scared \n", - "14 playthrough\n", - "15 the \n", - "16 dark \n", - "17 descent 
\n", - "18 dlc \n", - "19 100% \n", - "20 scary \n", - "21 funny \n", - "22 moments \n", - "23 moment \n", - "24 top \n", - "25 pewdie \n", - "26 ending \n", - "27 explained \n", - "28 monster \n", - "29 trailer \n", - "30 100 \n", - "31 part 5 \n", - "32 episode 5 \n", - "33 final \n", - "34 last \n", - "35 episode \n", - "36 part \n", - "Name: 354, dtype: object\n", - "ssssssssssssssssssssssssssssssssss355ssssssssssssssssssssssssssssssssss\n", - "0 lets \n", - "1 let \n", - "2 play \n", - "3 amnesia \n", - "4 DLC \n", - "5 Justine \n", - "6 Amnesia \n", - "7 justine \n", - "8 walkthrough\n", - "9 walk \n", - "10 through \n", - "11 pewdiepie \n", - "12 naked \n", - "13 scared \n", - "14 playthrough\n", - "15 the \n", - "16 dark \n", - "17 descent \n", - "18 dlc \n", - "19 100% \n", - "20 scary \n", - "21 funny \n", - "22 moments \n", - "23 moment \n", - "24 top \n", - "25 pewdie \n", - "26 ending \n", - "27 explained \n", - "28 monster \n", - "29 trailer \n", - "30 100 \n", - "31 part 3 \n", - "32 episode 3 \n", - "Name: 355, dtype: object\n", - "ssssssssssssssssssssssssssssssssss356ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 356, dtype: object\n", - "ssssssssssssssssssssssssssssssssss357ssssssssssssssssssssssssssssssssss\n", - "0 amnesia \n", - "1 fuck \n", - "2 scariest \n", - "3 moment \n", - "4 scary \n", - "5 horrible \n", - "6 scream \n", - "7 like \n", - "8 girl \n", - "9 reaction \n", - "10 funny \n", - "11 screaming \n", - "12 dark \n", - "13 descent \n", - "14 the \n", - "15 commentary \n", - "16 gothic \n", - "17 horror \n", - "18 moments \n", - "19 playthrough\n", - "20 first \n", - "21 lets \n", - "22 play \n", - "23 guide \n", - "24 prank \n", - "25 walkthrough\n", - "26 part \n", - "27 within \n", - "28 screams \n", - "29 subbed \n", - "30 scared \n", - "31 xdddd \n", - "32 turner \n", - "33 screamed \n", - "34 till \n", - "35 straight \n", - "36 tears \n", - "37 spoiler \n", - "38 yep \n", - "39 suiting \n", - "40 laughed \n", - 
"41 shriek \n", - "42 wheres \n", - "43 lmfao \n", - "44 yelping \n", - "45 upload \n", - "46 toby \n", - "Name: 357, dtype: object\n", - "ssssssssssssssssssssssssssssssssss358ssssssssssssssssssssssssssssssssss\n", - "0 SATIRE\n", - "Name: 358, dtype: object\n" + "Series([], Name: 142, dtype: object)\n" ] } ], @@ -4796,7 +872,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 11, "metadata": {}, "outputs": [ { @@ -4834,106 +910,106 @@ " \n", " \n", " 0\n", - " YOU HAD ONE JOB! - with editor Brad1\n", - " 5,292,299.0\n", - " 385,260.0\n", - " 4,080.0\n", + " PyCharm/IntelliJ fast and auto change of the color theme\n", + " 41.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,859.0\n", - " https://www.youtube.com/watch?v=B67OBHNCopk\n", - " [SATIRE, reddit, you had one job, onejob]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 2.0\n", + " https://www.youtube.com/embed/SsX9Fl958W0\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", " 1\n", - " Demi Lovato DID a WHAT?! 
- YouTube Admits MASSIVE OPSIE\n", - " 5,358,149.0\n", - " 378,460.0\n", - " 3,950.0\n", + " How to add weather desklet to Linux Mint 19\n", + " 291.0\n", " 0.0\n", - " 38,075.0\n", - " https://www.youtube.com/watch?v=kLM_9gBZIqY\n", - " [SATIRE]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " https://www.youtube.com/embed/-FPY_e0BdJs\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", " 2\n", - " We broke another WORLD RECORD!\n", - " 8,557,324.0\n", - " 595,577.0\n", - " 7,899.0\n", + " How to easy integrate Google Calendar to Desktop for Linux Mint\n", + " 226.0\n", + " 1.0\n", + " 0.0\n", " 0.0\n", - " 53,664.0\n", - " https://www.youtube.com/watch?v=d1tAfXKc7-c\n", - " [SATIRE]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 0.0\n", + " https://www.youtube.com/embed/2evIujisdD0\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", " 3\n", - " FLOSSING in VR with Green Man. 
~ UNSEEN FOOTAGE ~\n", - " 3,609,152.0\n", - " 218,517.0\n", - " 3,125.0\n", + " Pandas use a list of values to select rows from a column\n", + " 45.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 17,595.0\n", - " https://www.youtube.com/watch?v=bMLdNrB5hAo\n", - " [SATIRE]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 10.0\n", + " https://www.youtube.com/embed/jlSbo5wmTPQ\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", " 4\n", - " Don't Laugh Challenge, NEW SEASON!!!!!\n", - " 5,888,349.0\n", - " 569,878.0\n", - " 7,822.0\n", + " Pandas count and percentage by value for a column\n", + " 63.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 29,373.0\n", - " https://www.youtube.com/watch?v=Zgm_iM3f_ME\n", - " [SATIRE]\n", - " <pandas.io.formats.style.Styler object at 0x7f782f9170b8>\n", + " 0.0\n", + " https://www.youtube.com/embed/P5pxJkv71BU\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", + " <pandas.io.formats.style.Styler object at 0x7ff60af976d8>\n", " \n", " \n", "\n", "" ], "text/plain": [ - " title Views \\\n", - "0 YOU HAD ONE JOB! - with editor Brad1 5,292,299.0 \n", - "1 Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE 5,358,149.0 \n", - "2 We broke another WORLD RECORD! 8,557,324.0 \n", - "3 FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ 3,609,152.0 \n", - "4 Don't Laugh Challenge, NEW SEASON!!!!! 
5,888,349.0 \n", + " title Views \\\n", + "0 PyCharm/IntelliJ fast and auto change of the color theme 41.0 \n", + "1 How to add weather desklet to Linux Mint 19 291.0 \n", + "2 How to easy integrate Google Calendar to Desktop for Linux Mint 226.0 \n", + "3 Pandas use a list of values to select rows from a column 45.0 \n", + "4 Pandas count and percentage by value for a column 63.0 \n", "\n", - " Like Dislike Favorite Comment \\\n", - "0 385,260.0 4,080.0 0.0 29,859.0 \n", - "1 378,460.0 3,950.0 0.0 38,075.0 \n", - "2 595,577.0 7,899.0 0.0 53,664.0 \n", - "3 218,517.0 3,125.0 0.0 17,595.0 \n", - "4 569,878.0 7,822.0 0.0 29,373.0 \n", + " Like Dislike Favorite Comment \\\n", + "0 0.0 0.0 0.0 2.0 \n", + "1 0.0 0.0 0.0 0.0 \n", + "2 1.0 0.0 0.0 0.0 \n", + "3 3.0 0.0 0.0 10.0 \n", + "4 3.0 0.0 0.0 0.0 \n", "\n", - " videoID \\\n", - "0 https://www.youtube.com/watch?v=B67OBHNCopk \n", - "1 https://www.youtube.com/watch?v=kLM_9gBZIqY \n", - "2 https://www.youtube.com/watch?v=d1tAfXKc7-c \n", - "3 https://www.youtube.com/watch?v=bMLdNrB5hAo \n", - "4 https://www.youtube.com/watch?v=Zgm_iM3f_ME \n", + " videoID \\\n", + "0 https://www.youtube.com/embed/SsX9Fl958W0 \n", + "1 https://www.youtube.com/embed/-FPY_e0BdJs \n", + "2 https://www.youtube.com/embed/2evIujisdD0 \n", + "3 https://www.youtube.com/embed/jlSbo5wmTPQ \n", + "4 https://www.youtube.com/embed/P5pxJkv71BU \n", "\n", - " tags \\\n", - "0 [SATIRE, reddit, you had one job, onejob] \n", - "1 [SATIRE] \n", - "2 [SATIRE] \n", - "3 [SATIRE] \n", - "4 [SATIRE] \n", + " tags \\\n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg \n", + "1 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg \n", + "2 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg \n", + "3 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg \n", + "4 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg \n", "\n", " nameurl \n", - "0 \n", - "1 \n", - "2 \n", - "3 \n", - "4 " + "0 \n", + "1 \n", + "2 \n", + "3 \n", + "4 " ] }, - "execution_count": 5, + 
"execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -4956,7 +1032,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 12, "metadata": { "scrolled": false }, @@ -4982,63 +1058,63 @@ " \n", " \n", " 0\n", - " YOU HAD ONE JOB! - with editor Brad1\n", - " 5,292,299.0\n", - " 385,260.0\n", - " 4,080.0\n", + " PyCharm/IntelliJ fast and auto change of the color theme\n", + " 41.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,859.0\n", - " https://www.youtube.com/watch?v=B67OBHNCopk\n", - " [SATIRE, reddit, you had one job, onejob]\n", - " XXXXX\n", + " 2.0\n", + " https://www.youtube.com/embed/SsX9Fl958W0\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", " 1\n", - " Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE\n", - " 5,358,149.0\n", - " 378,460.0\n", - " 3,950.0\n", + " How to add weather desklet to Linux Mint 19\n", + " 291.0\n", " 0.0\n", - " 38,075.0\n", - " https://www.youtube.com/watch?v=kLM_9gBZIqY\n", - " [SATIRE]\n", - " XXXXX\n", + " 0.0\n", + " 0.0\n", + " 0.0\n", + " https://www.youtube.com/embed/-FPY_e0BdJs\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", " 2\n", - " We broke another WORLD RECORD!\n", - " 8,557,324.0\n", - " 595,577.0\n", - " 7,899.0\n", + " How to easy integrate Google Calendar to Desktop for Linux Mint\n", + " 226.0\n", + " 1.0\n", + " 0.0\n", " 0.0\n", - " 53,664.0\n", - " https://www.youtube.com/watch?v=d1tAfXKc7-c\n", - " [SATIRE]\n", - " XXXXX\n", + " 0.0\n", + " https://www.youtube.com/embed/2evIujisdD0\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", " 3\n", - " FLOSSING in VR with Green Man. 
~ UNSEEN FOOTAGE ~\n", - " 3,609,152.0\n", - " 218,517.0\n", - " 3,125.0\n", + " Pandas use a list of values to select rows from a column\n", + " 45.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 17,595.0\n", - " https://www.youtube.com/watch?v=bMLdNrB5hAo\n", - " [SATIRE]\n", - " XXXXX\n", + " 10.0\n", + " https://www.youtube.com/embed/jlSbo5wmTPQ\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", " 4\n", - " Don't Laugh Challenge, NEW SEASON!!!!!\n", - " 5,888,349.0\n", - " 569,878.0\n", - " 7,822.0\n", + " Pandas count and percentage by value for a column\n", + " 63.0\n", + " 3.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,373.0\n", - " https://www.youtube.com/watch?v=Zgm_iM3f_ME\n", - " [SATIRE]\n", - " XXXXX\n", + " https://www.youtube.com/embed/P5pxJkv71BU\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", "" @@ -5047,7 +1123,7 @@ "" ] }, - "execution_count": 6, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -5061,7 +1137,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 13, "metadata": {}, "outputs": [ { @@ -5084,244 +1160,244 @@ " \n", " \n", " \n", - " 77\n", - " bitch lasagna\n", - " 124,994,006.0\n", - " 6,176,065.0\n", - " 648,864.0\n", + " 91\n", + " No Python Interpreter Configured For The Module - PyCharm/IntelliJ\n", + " 11,367.0\n", + " 27.0\n", + " 20.0\n", " 0.0\n", - " 924,648.0\n", - " https://www.youtube.com/watch?v=6Dh-RL__uN4\n", - " [SATIRE, tseries, t series, diss, track, pewdiepie, song, rap, mixtape, disstrack, diss track, bitch lasagna]\n", - " XXXXX\n", + " 8.0\n", + " https://www.youtube.com/embed/mkKDI6y2kyE\n", + " https://i.ytimg.com/vi/mkKDI6y2kyE/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 263\n", - " THE RUBY PLAYBUTTON / YouTube 50 Mil Sub Reward Unbox\n", - " 61,378,839.0\n", - " 4,311,930.0\n", - " 145,857.0\n", + " 124\n", + " python extract text from image or pdf\n", + " 6,229.0\n", + " 
16.0\n", + " 29.0\n", " 0.0\n", - " 609,535.0\n", - " https://www.youtube.com/watch?v=7Vj5M0qKh8g\n", - " [pewdiepie, pewds, pdp, pewdie]\n", - " XXXXX\n", + " 11.0\n", + " https://www.youtube.com/embed/PK-GvWWQ03g\n", + " https://i.ytimg.com/vi/PK-GvWWQ03g/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 33\n", - " YouTube Rewind 2018 but it's actually good\n", - " 47,979,866.0\n", - " 7,776,590.0\n", - " 79,097.0\n", + " 23\n", + " apex legends game requires directx 11 feature video card\n", + " 5,690.0\n", + " 36.0\n", + " 10.0\n", " 0.0\n", - " 705,084.0\n", - " https://www.youtube.com/watch?v=By_Cn5ixYLg\n", - " [rewind 2018, youtube rewind 2018]\n", - " XXXXX\n", + " 9.0\n", + " https://www.youtube.com/embed/NbvHU_KoD74\n", + " https://i.ytimg.com/vi/NbvHU_KoD74/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 309\n", - " GAME BANNED FROM KIDS? - Talking Angela\n", - " 37,174,431.0\n", - " 575,115.0\n", - " 16,369.0\n", + " 46\n", + " Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2\n", + " 5,397.0\n", + " 62.0\n", + " 2.0\n", " 0.0\n", - " 64,433.0\n", - " https://www.youtube.com/watch?v=pzYxlKSgxh0\n", - " [pewdiepie, pewdie, pewds, let's play, playthrough, walkthrough, play, walk, through, walk through, video games]\n", - " XXXXX\n", + " 26.0\n", + " https://www.youtube.com/embed/702lkQbZx50\n", + " https://i.ytimg.com/vi/702lkQbZx50/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 229\n", - " JAKE PAUL\n", - " 36,792,100.0\n", - " 1,832,490.0\n", - " 144,973.0\n", + " 134\n", + " ubuntu 16 04 server install headless google chrome\n", + " 4,468.0\n", + " 24.0\n", + " 6.0\n", " 0.0\n", - " 269,260.0\n", - " https://www.youtube.com/watch?v=TuIcBPm90aM\n", - " [pewdiepie, jake, paul, it's, everyday, bro, react, react world, fine]\n", - " XXXXX\n", + " 5.0\n", + " https://www.youtube.com/embed/t3ppxtEU6No\n", + " https://i.ytimg.com/vi/t3ppxtEU6No/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 267\n", - " DELETING MY CHANNEL\n", - " 
35,035,463.0\n", - " 1,728,372.0\n", - " 261,139.0\n", + " 125\n", + " mysql 5 7 vs mysql 8 do you need to upgrade to mysql 8\n", + " 4,391.0\n", + " 12.0\n", + " 18.0\n", " 0.0\n", - " 220,740.0\n", - " https://www.youtube.com/watch?v=Y39LE5ZoKjw\n", - " [pewdiepie, pewds, pdp, pewdie, deleting, channel]\n", - " XXXXX\n", + " 9.0\n", + " https://www.youtube.com/embed/vHab6BNrHU8\n", + " https://i.ytimg.com/vi/vHab6BNrHU8/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 257\n", - " SHOOTING MY 50 MILLION AWARD!\n", - " 30,554,862.0\n", - " 1,110,375.0\n", - " 131,648.0\n", + " 116\n", + " Python read validate and import CSV JSON file to MySQL\n", + " 3,513.0\n", + " 12.0\n", + " 1.0\n", " 0.0\n", - " 106,113.0\n", - " https://www.youtube.com/watch?v=Jrvfoybj98Q\n", - " [pewdiepie, pewds, pdp, pewdie]\n", - " XXXXX\n", + " 6.0\n", + " https://www.youtube.com/embed/WbW0rHCX2UU\n", + " https://i.ytimg.com/vi/WbW0rHCX2UU/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 282\n", - " THE DIAMOND PLAY BUTTON!! 
(Part 1)\n", - " 29,833,868.0\n", - " 1,254,868.0\n", - " 43,421.0\n", + " 68\n", + " How to add annotations in new Youtube studio\n", + " 3,495.0\n", + " 21.0\n", + " 24.0\n", " 0.0\n", - " 120,324.0\n", - " https://www.youtube.com/watch?v=VY4wCi1pPkU\n", - " [pewdiepie, pewds, pdp, diamond, play button, playbutton, youtube, unboxing]\n", - " XXXXX\n", + " 6.0\n", + " https://www.youtube.com/embed/UcvCdFfI3bs\n", + " https://i.ytimg.com/vi/UcvCdFfI3bs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 45\n", - " YouTube Rewind 2018 review\n", - " 27,723,233.0\n", - " 2,213,948.0\n", - " 95,125.0\n", + " 32\n", + " Apex Legends MSVCP140.dll Is Missing Fix, MSVCP120.dll Is Missing, not starting\n", + " 2,358.0\n", + " 14.0\n", + " 2.0\n", " 0.0\n", - " 138,585.0\n", - " https://www.youtube.com/watch?v=wYT1Qq6mo4I\n", - " [SATIRE, youtube, rewind, meme, yea, review]\n", - " XXXXX\n", + " 11.0\n", + " https://www.youtube.com/embed/ftGiBv3LL_A\n", + " https://i.ytimg.com/vi/ftGiBv3LL_A/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 163\n", - " PewDiePie Hej Monika Remix by Party In Backyard\n", - " 26,513,160.0\n", - " 951,974.0\n", - " 20,537.0\n", + " 13\n", + " Install latest NVIDIA drivers for Linux Mint 19/Ubuntu 18.04\n", + " 1,728.0\n", + " 13.0\n", + " 0.0\n", " 0.0\n", - " 140,487.0\n", - " https://www.youtube.com/watch?v=Vk8UEWHYfEg\n", - " [party in backyard, hej monika, monika, monica, song, pewdiepie, sing, singing]\n", - " XXXXX\n", + " 6.0\n", + " https://www.youtube.com/embed/CA6lyOmfRbM\n", + " https://i.ytimg.com/vi/CA6lyOmfRbM/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 311\n", - " The Impossible Quiz.\n", - " 26,013,637.0\n", - " 519,621.0\n", - " 7,816.0\n", + " 80\n", + " Simple ways to create shortcut in Linux Mint 19\n", + " 1,652.0\n", + " 9.0\n", + " 2.0\n", " 0.0\n", - " 39,587.0\n", - " https://www.youtube.com/watch?v=rOZ0OHaPmnk\n", - " [pewdiepie, pewdie, pewds, let's play, playthrough, walkthrough, play, walk, through, walk 
through, video games, the imossible, quiz, question, questions, funny, reaction, the impossible quiz, all answers, answers, cheat]\n", - " XXXXX\n", + " 6.0\n", + " https://www.youtube.com/embed/nOlH-P8-5PI\n", + " https://i.ytimg.com/vi/nOlH-P8-5PI/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 219\n", - " THE MOST ANNOYING SOUND IN THE WORLD!\n", - " 25,912,961.0\n", - " 867,935.0\n", - " 22,510.0\n", + " 52\n", + " linux mint disable login keyring\n", + " 1,592.0\n", + " 8.0\n", + " 1.0\n", " 0.0\n", - " 76,637.0\n", - " https://www.youtube.com/watch?v=baylWdHClNE\n", - " [5 weird, stuff, 5 weird stuff online]\n", - " XXXXX\n", + " 11.0\n", + " https://www.youtube.com/embed/dAKyi8aFq3Y\n", + " https://i.ytimg.com/vi/dAKyi8aFq3Y/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 264\n", - " WHO'S MORE LIKELY TO...?\n", - " 24,147,214.0\n", - " 838,813.0\n", - " 11,568.0\n", + " 81\n", + " The simplest way to run python headless test with Chrome on Ubuntu\n", + " 1,077.0\n", + " 8.0\n", + " 0.0\n", " 0.0\n", - " 79,185.0\n", - " https://www.youtube.com/watch?v=jA0xR2Ho9UU\n", - " [pewdiepie, pewds, pdp, pewdie, who is, more likely, markiplier, jacksepticeye, who is more likely, most likely]\n", - " XXXXX\n", + " 2.0\n", + " https://www.youtube.com/embed/BdppFIT_lIs\n", + " https://i.ytimg.com/vi/BdppFIT_lIs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 265\n", - " BOTTLEFLIP CHALLENGE!\n", - " 23,462,006.0\n", - " 879,230.0\n", - " 14,349.0\n", + " 122\n", + " java benchmarks examples\n", + " 922.0\n", + " 5.0\n", + " 3.0\n", " 0.0\n", - " 75,539.0\n", - " https://www.youtube.com/watch?v=lyl6ibqnyis\n", - " [pewdiepie, pewds, pdp, pewdie, Bottleflip, Challenge, Bottle, Dab, Meme, Jacksepticeye]\n", - " XXXXX\n", + " 0.0\n", + " https://www.youtube.com/embed/m3Xf1ra2Ekg\n", + " https://i.ytimg.com/vi/m3Xf1ra2Ekg/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 225\n", - " THE RICH LIFE OF PEWDIEPIE\n", - " 20,579,289.0\n", - " 728,175.0\n", - " 22,673.0\n", 
+ " 76\n", + " Easy way to convert dictionary to SQL insert with Python\n", + " 864.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 42,467.0\n", - " https://www.youtube.com/watch?v=GP9egt__qeI\n", - " [pewdiepie, the rich life of pewdiepie, before he was famous, before he was famous pewdiepie, pewdiepie rich, pewdiepie net worth, how much money does pewdiepie make, how much money, youtube money, money, net worth, networth, rich, rich life, the rich life]\n", - " XXXXX\n", + " 0.0\n", + " https://www.youtube.com/embed/hUXGQwTSfMs\n", + " https://i.ytimg.com/vi/hUXGQwTSfMs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 41\n", - " Bitch Lasagna v1.2\n", - " 19,952,287.0\n", - " 1,758,301.0\n", - " 69,186.0\n", + " 14\n", + " Linux Mint identify, fix sound problems, set default device\n", + " 859.0\n", + " 4.0\n", + " 0.0\n", " 0.0\n", - " 152,529.0\n", - " https://www.youtube.com/watch?v=PX5QgITQAwk\n", - " [SATIRE]\n", - " XXXXX\n", + " 1.0\n", + " https://www.youtube.com/embed/PIAzK1rvqIY\n", + " https://i.ytimg.com/vi/PIAzK1rvqIY/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 248\n", - " TRY NOT TO LAUGH CHALLENGE #09 {Important Videos Edition}\n", - " 16,337,867.0\n", - " 591,088.0\n", - " 12,407.0\n", + " 71\n", + " python performance profiling in pycharm\n", + " 825.0\n", + " 0.0\n", + " 3.0\n", + " 0.0\n", " 0.0\n", - " 52,244.0\n", - " https://www.youtube.com/watch?v=IBhgOkorEZ4\n", - " [pewdiepie, pewds, pdp, pewdie, try not, to, laugh, try not to laugh, dont laugh, challenge]\n", - " XXXXX\n", + " https://www.youtube.com/embed/EZ-im7m8630\n", + " https://i.ytimg.com/vi/EZ-im7m8630/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 164\n", - " Trap Adventure 2 - WHO MADE THIS GAME AND WHY 😡😡? ! 
\" 🤰😡 - #001\n", - " 16,329,824.0\n", - " 725,795.0\n", - " 17,000.0\n", + " 70\n", + " Python Cumulative Sum per Group with Pandas\n", + " 801.0\n", + " 5.0\n", " 0.0\n", - " 39,152.0\n", - " https://www.youtube.com/watch?v=C1ObitoLwhM\n", - " [pewdiepie, trap adventure 2, rage, quit, game, videogame, trap, adventure, free download, link, trap adventure download, trap adventure 2 download, trap adventure 2 free download]\n", - " XXXXX\n", + " 0.0\n", + " 1.0\n", + " https://www.youtube.com/embed/1tCbvYv_ibw\n", + " https://i.ytimg.com/vi/1tCbvYv_ibw/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 260\n", - " DOES HE MAKE IT?\n", - " 16,282,855.0\n", - " 618,113.0\n", - " 11,354.0\n", + " 50\n", + " Linux Mint 19 How to change user password\n", + " 735.0\n", + " 6.0\n", + " 0.0\n", " 0.0\n", - " 34,363.0\n", - " https://www.youtube.com/watch?v=EfnDkNpXDBk\n", - " [pewdiepie, pewds, pdp, pewdie, to be continued, meme, compilation, continue, jojo, jojos, bizarre, adventure]\n", - " XXXXX\n", + " 2.0\n", + " https://www.youtube.com/embed/Odog86JslbA\n", + " https://i.ytimg.com/vi/Odog86JslbA/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", - " 52\n", - " Bich Lasagna V2. 
- Beat Saber / PART 1\n", - " 15,950,359.0\n", - " 1,005,972.0\n", - " 35,975.0\n", + " 21\n", + " play fortnite linux virtual machine\n", + " 532.0\n", + " 2.0\n", + " 1.0\n", " 0.0\n", - " 69,851.0\n", - " https://www.youtube.com/watch?v=2kpR0BdouNE\n", - " [SATIRE, beat, saber, vr, gameplay]\n", - " XXXXX\n", + " 1.0\n", + " https://www.youtube.com/embed/t_DI7NbjcFs\n", + " https://i.ytimg.com/vi/t_DI7NbjcFs/hqdefault.jpg\n", + " XXXXX\n", " \n", " \n", "" @@ -5330,7 +1406,7 @@ "" ] }, - "execution_count": 7, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -5343,22 +1419,22 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "" ] }, - "execution_count": 8, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { - "image/png": "\n", + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAD8CAYAAACcjGjIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAIABJREFUeJzt3Xd8FHX+x/HXZ5OQAKGFEmlCKIIkAUIKQQQVqaKAHl3pRcTueaeeBT31Tu9QsSBKExAEFAX5nQoCoiLSQq9CKEKogVATSNvv74+dhIWlp0w2+Twfj31k9zvfmflMJtn3TtkZMcaglFJKuXPYXYBSSqmCR8NBKaWUBw0HpZRSHjQclFJKedBwUEop5UHDQSmllAcNB6WUUh40HJRSSnnQcFBKKeXB1+4CblSFChVMzZo17S5DKaW8xurVq48aYypeS1+vDYeaNWsSFxdndxlKKeU1ROTPa+2ru5WUUkp50HBQSinlQcNBKaWUB6895nAp6enpJCQkcO7cObtL8XoBAQFUq1YNPz8/u0tRStmgUIVDQkICpUqVombNmoiI3eV4LWMMx44dIyEhgZCQELvLUUrZoFDtVjp37hzly5fXYMghEaF8+fK6BaZUEVaowgHQYMgl+ntUqmgrdOGglIL4I2f4enUCehtgdaM0HHLRXXfdxfz58y9oGzVqFAMGDKBr1642VaWKmsV/HKHL6KX89av1vDp3swaEuiEaDrmoV69ezJgx44K2GTNmMGDAAGbNmmVTVaqoMMbw2dLdDJq0iupBJXgo9mYmL/uTl7/dhNOpAaGuj4ZDLuratSvfffcdaWlpAOzZs4cDBw5QvXp1wsLCAMjMzORvf/sb0dHRNGzYkE8//RSARx99lLlz5wJw//33M3DgQAAmTpzIiy++SHJyMh07dqRRo0aEhYUxc+ZMG5ZQFVTpmU5emrOJ1/5vC3ffGsysYc14vXMYw+6ozdTle3lxjgaEuj6F6lRWd6/932a2HDiVq9NsUKU0I+4LvezwoKAgYmJi+OGHH+jcuTMzZsyge/fuFxzcnTBhAmXKlGHVqlWkpqb
SvHlz2rZtS4sWLViyZAmdOnVi//79HDx4EIAlS5bQs2dP5s2bR5UqVfjuu+8AOHnyZK4um/JeJ1PSefSLNfwWf5SH76jFc+3q43C4/uaea18PHweMXrwTYwz/uj88e5hSV6JbDrnMfdfSjBkz6NWr1wXDf/zxR6ZMmULjxo1p2rQpx44dY8eOHdnhsGXLFho0aEBwcDAHDx5k2bJl3HbbbYSHh7NgwQKee+45lixZQpkyZexYPFXA7DmazP1jlrJi9zH+07UhL3S49YI3fxHh2bb1eKJVHWas2sffv95Apm5BqGtQaLccrvQJPy917tyZp59+mjVr1pCSkkJkZCR79uzJHm6M4cMPP6Rdu3Ye4544cYJ58+bRsmVLkpKS+PLLLwkMDKRUqVKUKlWKNWvW8P333/PSSy9x991388orr+TjkqmCZvmuYwybuhqAzwc1JbZW+Uv2ExGeaVsPh0MYtXAHTqfhv90a4aNbEOoKCm042CUwMJC77rqLgQMHemw1ALRr144xY8bQqlUr/Pz82L59O1WrVqVkyZLExsYyatQofvrpJ44dO0bXrl2zz3I6cOAAQUFBPPTQQ5QtW5bx48fn96KpAuTLVft4cc5Gbg4qwYR+0dSsUPKq4zzV+hZ8RHhnwXacxjCyWyN8fXTngbo0DYc80KtXL+6//36PM5cABg8ezJ49e2jSpAnGGCpWrMicOXMAaNGiBT/++CN16tShRo0aJCUl0aJFCwA2btzI3/72NxwOB35+fowZMyZfl0kVDJlOw9vztjH21120qFuBj3o3oUzxa7/+1eN318XhEP47/w8yDbzXXQNCXZp46znQUVFR5uKb/WzdupVbb73VpooKH/19FizJqRk8OWMdC7ce5qHYmxlxXyh+N/jG/skvO3nrh210DK/MqJ6Nb3g6yruIyGpjTNS19NUtB6W8wIETZxk0OY4/Dp3i1fsa0O+2nF1cctgdtfF1CG98txWnMXzQK0IDQl1A/xqUKuDW7TtB59FL2ZeUwoT+0fRvHpIr174a3KIWL9/bgB82HeLRaWtIy3DmQrWqsNBwUKoA+7/1B+jx6TL8fR18M/w27qpXKVenP+j2EF69rwE/bjnM8GmrSc3IzNXpK++l4aBUAWSM4f2FO3h8+lrCq5bh20ebc0twqTyZV//mIbzeOZSFW4/wyNQ1nEvXgFAaDkoVOOfSM3lyxjreW7idByKqMm1IU8oH+ufpPPs0q8m/7g/np21HePjz1RoQSsNBqYIk8XQqvcYtZ+76A/ytXT3e6d4If1+ffJl376Y389YD4fy6I5EhU+I0IIo4DYdcFhgY6NH2ySefMGXKFADuvPNOLj4FVymArQdP0WX0UrYePMWYB5vw6F118v2mSz1jbubtvzTkt/ijDJ4cx9k0DYiiSk9lzQfDhg2zuwRVwC3aepgnpq8lMMCXrx6+jfBq9l07q3tUdXxEeHbWegZOWsWE/lGUKKZvFUWNbjnkg1dffZWRI0de0OZ0Ounfvz8vvfQS4LogX7NmzWjSpAndunXjzJkzdpSq8pkxhvFLdjF4ShwhFUvy7aO32xoMWf4SWY33ujdmxe5jDPhsFcmpGXaXpPJZ4f048MPzcGhj7k7zpnDo8FaOJ5ORkcGDDz5IWFgYL774IkePHuWNN95g4cKFlCxZkrfffpt3331XL6xXyKVnOnnl201MX7mP9qE38W6PRgXqE3qXiKqIwNMz1zHgs1VMHBBNoH/BqU/lLV3TNnj44Yfp3r07L774IgDLly9ny5YtNG/eHIC0tDSaNWtmZ4kqj51ISeORqWtYtusYw++szbPWVVMLms6Nq+LjEJ6csY7+E1fy2YBoSgVc+7WclPe6ajiIyETgXuCIMSbMagsCZgI1gT1Ad2PMcXEdPXsfuAdIAfobY9ZY4/QDXrIm+4YxZrLVHglMAooD3wNPmty44FMufMLPK7fddhuLFy/mr3/9KwEBARhjaNOmDdOnT7e7NJUPdiWeYdD
kOBKOp/BOt0b8JbKa3SVd0b0Nq+AjwuPT19J34komD4yhtAZEoXctxxwmAe0vanseWGSMqQsssl4DdADqWo+hwBjIDpMRQFMgBhghIuWsccYAQ9zGu3hehc6gQYO455576N69OxkZGcTGxrJ06VLi4+MBSE5OZvv27TZXqfLC7/FHuf/j3zl5Np0vhsQW+GDI0iG8Mh/1bsLGhJP0mbCSk2fT7S5J5bGrhoMx5lcg6aLmzsBk6/lkoItb+xTjshwoKyKVgXbAAmNMkjHmOLAAaG8NK22MWW5tLUxxm5ZXSklJoVq1atmPd99995L9nnnmGSIiIujTpw/ly5dn0qRJ9OrVi4YNG9KsWTO2bduWz5WrvDZ95V76TlxJpVL+zBnenOiaQXaXdF3ah93EmIci2XLgJH0mrOBkigZEYXajxxyCjTEHreeHgGDreVVgn1u/BKvtSu0Jl2j3Wk7nlS9e9vPPP2c/f+2117Kft2rVilWrVuVVWcpGmU7Dv77fyoTfdtPylop81DvCa3fLtGkQzCcPRfLI1DU8OGE5Uwc1pWyJYnaXpfJAjk9ltT7x58tNIURkqIjEiUhcYmJifsxSqRw5k5rBkClxTPhtN/1vq8nEflFeGwxZ7r41mE/7RLL98Bl6j1vB8eQ0u0tSeeBGw+GwtUsI6+cRq30/UN2tXzWr7Urt1S7RfknGmLHGmChjTFTFihVvsHSl8kfC8RS6jvmdX7Yn8nrnUF7tFFpo7rp2V/1KjOsbxc7EM/Qat5xjZ1LtLknlshv9S50L9LOe9wO+dWvvKy6xwElr99N8oK2IlLMORLcF5lvDTolIrHWmU1+3aSnltdbsPU6X0UvZf+IskwZE06dZTbtLynV33FKRCf2i2X00md7jVnBUA6JQuWo4iMh0YBlQT0QSRGQQ8BbQRkR2AK2t1+A6FXUXEA+MA4YDGGOSgNeBVdbjn1YbVp/x1jg7gR9yZ9GUsse36/bTc+xyShTzZfbw22hRt/Bu5d5etwKf9Y/mz6Rkeo1dTuJpDYjCQu8hrS5Lf5/Xx+k0jFq4nQ9+iiemZhCf9IkkqGTROFi7bOcxBk5aRZWyAUwfEkul0gF2l6Qu4XruIV04doAqZbNz6Zk8PmMtH/wUT9fIanw+OKbIBANAs9rlmTwwhoMnz9Fz7HIOnzpnd0kqhzQccpmPjw+NGzcmNDSURo0a8c4772Sf3hoXF8cTTzxx2XH37NlDWFiYR99LXbhPFRxHTp2jx9jlfL/xIM93qM9/uzbMt3swFCQxIUFMGRjD4VOugDh0UgPCm+m1lXJZ8eLFWbduHQBHjhyhd+/enDp1itdee42oqCiioq5pi+66+ir7bD5wksGT4ziRks4nD0XSLvQmu0uyVVTNIKYMiqHfxFX0GLuM6UNiqVK2uN1lqRugWw55qFKlSowdO5aPPvoIYww///wz9957LwC//PILjRs3pnHjxkRERHD69OkLxnXv627cuHF06NCBs2fPsnPnTtq3b09kZCQtWrTQb1Xnsx83H6LbJ8swBr4a1qzIB0OWyBpBfD4ohqQzafQYu4yE4yl2l6RuQKHdcnh75dtsS8rdN8v6QfV5Lua56xqnVq1aZGZmcuTIkQvaR44cyejRo2nevDlnzpwhIODqB/A++ugjFixYwJw5c/D392fo0KF88skn1K1blxUrVjB8+HB++umn66pPXT9jDGN/3cVb87YRXrUM4/pGEawHYC8QcXM5pg5uSp8JK+g5djnTh8RSPaiE3WWp61Bow6Gga968Oc888wwPPvggDzzwANWqXfkCbFOmTKF69erMmTMHPz8/zpw5w++//063bt2y+6Sm6mmEeS0tw8lLczbyZVwCHcMrM7JbI4oXK3rHF65Fo+plmTY4lofcAuLm8hoQ3qLQhsP1fsLPK7t27cLHx4dKlSqxdevW7Pbnn3+ejh078v3339O8eXPmz59/xa2H8PBw1q1bR0JCAiEhITidTsqWLZt9fEPlveP
JaQybupoVu5N4olUdnmp9S4G8B0NBEl6tDNMGN7UCYhlfDImlZoWSdpelroEec8hDiYmJDBs2jMcee8zjRvE7d+4kPDyc5557jujo6KseL4iIiODTTz+lU6dOHDhwgNKlSxMSEsJXX30FuHZ1rF+/Ps+WpaiLP3KGLh8vZe3eE4zq0ZhnCujNeQqisKpl+GJwLGfTM+k5djm7jybbXZK6BhoOuezs2bPZp7K2bt2atm3bMmLECI9+o0aNIiwsjIYNG+Ln50eHDh2uOu3bb7+dkSNH0rFjR44ePcq0adOYMGECjRo1IjQ0lG+/1SuP5IXfdhzl/o+XkpyawfShsXSJ8OoLB9uiQZXSTB8aS3qmkx6fLmNnot4jvaDTb0iry9LfJ0xd/icj5m6mTsVAxveL0oOqObT98Gl6j1uOiDB9SFPqVCpld0lFin5DWqkcysh08urczbw0ZxMt61Zg1iPNNBhywS3BpZgxNBaAnmNXsP3w6auMoeyi4aDURU6dS2fQ5Dgm/b6Hgc1DGN8vmlJefg+GgqROJVdAOAR6jV3OtkOn7C5JXYKGg1Ju9iW57sGwNP4ob94fxiv3NcBHDzznutoVA5kxNBZfH6H3uBVsOaABUdBoOChliduTROfRSzl08hyTB8bwYNMadpdUqNWqGMjMoc3w93XQe/xyNh84aXdJyo2Gg1LAN2sS6D1uBaUDfJn9aHOa16lgd0lFQs0KJZk5tBkli/nSe9wKNu3XgCgoNBxUkeZ0Gv47fxvPfLmeJjXKMnt4c2pXDLS7rCLl5vIlmDE0lkB/X3qPW876fSfsLkmh4ZDrsi7ZnfXYs2dPns3rwIEDdO3aFYB169bx/fff59m8CqOzaZk8+sUaRi/eSc/o6kwZ2JRyRegeDAVJ9aASzHw4ljIl/HhowgrW7j1ud0lFnoZDLsu6ZHfWo2bNmnkyn4yMDKpUqcKsWbMADYfrdfjUObp/uox5mw/xUsdb+fcD4RTz1X8HO1UrV4IZQ5tRrkQx+k5Yyeo/NSDspP8N+WDPnj20aNGCJk2a0KRJE37//XcAevbsyXfffZfdr3///syaNYtz584xYMAAwsPDiYiIYPHixQBMmjSJTp060apVK+6+++7smwOlpaXxyiuvMHPmTBo3bszMmTNJTk5m4MCBxMTEEBERod+edrNp/0k6ffQbOxPPMK5PFINb1PK4vImyR9WyxZn5cCzlA4vRb+JK4vYkXX0klScK7YX3Dv3rX6Ruzd1LdvvfWp+b/vGPK/bJunwGQEhICLNnz6ZSpUosWLCAgIAAduzYQa9evYiLi6NHjx58+eWXdOzYkbS0NBYtWsSYMWMYPXo0IsLGjRvZtm0bbdu2Zfv27QCsWbOGDRs2EBQUlL3LqlixYvzzn/8kLi6Ojz76CIB//OMftGrViokTJ3LixAliYmJo3bo1JUsW7Yuezdt0kKdnrqdcCT9mDbuNBlVK212SukjlMsWZ+XAzeo1dTt+JK5k0IIaYkCC7yypyCm042MX9TnBZ0tPTeeyxx1i3bh0+Pj7Zb/QdOnTgySefJDU1lXnz5tGyZUuKFy/Ob7/9xuOPPw5A/fr1qVGjRvY4bdq0ISjo6v8oP/74I3Pnzs2+vei5c+fYu3dvkb0chjGGMb/s5D/z/qBx9bKM7RtJpVJ6D4aCKrh0ADOGxtJr3HL6f7aSif2jia1V3u6yipRCGw5X+4Sfn9577z2Cg4NZv349Tqcz+9LcAQEB3HnnncyfP5+ZM2fSs2fPq07rWj/5G2P4+uuvqVevXo5qLwxSMzJ54ZuNfLNmP/c1qsJ/uzYkwE/vwVDQVSodwPShsTw4boUrIPpFc5ueYpxv9JhDPjh58iSVK1fG4XDw+eefk5mZmT2sR48efPbZZyxZsoT27dsD0KJFC6ZNmwbA9u3b2bt371Xf5EuVKnXBrUbbtWvHhx9+SNaFFdeuXZvbi+UVjp1J5aHxK/hmzX6eal2XD3o21mDwIpVKuQKiRlB
JBkxaxW87jtpdUpGh4ZAPhg8fzuTJk2nUqBHbtm274NN/27Zt+eWXX2jdujXFihXL7u90OgkPD6dHjx5MmjQJf3//K87jrrvuYsuWLdkHpF9++WXS09Np2LAhoaGhvPzyy3m6jAXRjsOn6fLxUjYknOSDXhE81foWPfDshSoE+vPFkKaEVCjJoMmr+HV7ot0lFQl6yW51Wd78+/xleyKPTVuDv58P4/pGEnFzObtLUjmUlJzGQ+NXEJ94hrF9IrmzXiW7S/I6esluVaRN/n0PAz5bSdVyxfn2seYaDIVEUMlifDGkKXUrBTJ0ymp+2nbY7pIKtRyFg4g8LSKbRWSTiEwXkQARCRGRFSISLyIzRaSY1dffeh1vDa/pNp0XrPY/RKRdzhZJFVUZmU5enrOJEXM306p+JWY9chtVyxa3uyyVi8qWKMYXg2OpX7kUD3++mgVbNCDyyg2Hg4hUBZ4AoowxYYAP0BN4G3jPGFMHOA4MskYZBBy32t+z+iEiDazxQoH2wMciokcM1XU5eTadAZNW8fnyPxnashaf9oki0L/QnoxXpJUp4cfng5rSoEoZhk9bzfzNh+wuqVDK6W4lX6C4iPgCJYCDQCtgljV8MtDFet7Zeo01/G5xHR3sDMwwxqQaY3YD8UBMDutSRcifx5J54OOlLNt5jLf/Es4/7rlV78FQyJUp7sfng2IIq1qGR6et4YeNB+0uqdC54XAwxuwHRgJ7cYXCSWA1cMIYk2F1SwCy7sZeFdhnjZth9S/v3n6JcZS6opW7k+gyeinHktP4fFBTekTfbHdJKp+UDvBjysAYGlUvy2PT1/LdBg2I3JST3UrlcH3qDwGqACVx7RbKMyIyVETiRCQuMVFPZyvqvorbx4Pjl1OuRDFmD29Os9r6DdqiplSAH5MHxtDk5rI8MWMtc9cfsLukQiMnu5VaA7uNMYnGmHTgG6A5UNbazQRQDdhvPd8PVAewhpcBjrm3X2KcCxhjxhpjoowxURUrVsxB6Xnn0KFD9OzZk9q1axMZGck999yTfemLgubnn3/OvgigN3E6DW/9sI2/zdpATEgQs4c3J6RC0b5mVFEW6O/LpAExRNYox1Mz1jJn7SXfPtR1ykk47AViRaSEdezgbmALsBjoavXpB2RdDnSu9Rpr+E/G9SWLuUBP62ymEKAusDIHddnGGMP999/PnXfeyc6dO1m9ejX//ve/OXy4YJ5R4Y3hkJKWwbCpq/nkl530bnozkwbEUKaEn91lKZuV9Pdl0oBomoaU55kv1/HNmgS7S/J+xpgbfgCvAduATcDngD9QC9ebezzwFeBv9Q2wXsdbw2u5TedFYCfwB9DhWuYdGRlpLrZlyxaPtvy0aNEi06JFC492p9Npnn32WRMaGmrCwsLMjBkzjDHGLF682LRs2dJ06tTJhISEmOeee85MnTrVREdHm7CwMBMfH2+MMaZfv35m2LBhpmnTpiYkJMQsXrzYDBgwwNSvX9/069cvez7z5883sbGxJiIiwnTt2tWcPn3aGGNMjRo1zCuvvGIiIiJMWFiY2bp1q9m9e7cJDg42VapUMY0aNTK//vqrR912/z4vduBEirnn/V9NyPP/MxOW7DJOp9PuklQBk5KaYXqPW2ZqPv8/8+WqvXaXU+AAceYa399zdK6fMWYEMOKi5l1c4mwjY8w5oNtlpvMm8GZOarnYki+3c3TfmdycJBWqB9Ki+y2XHb5p0yYiIyM92r/55hvWrVvH+vXrOXr0KNHR0bRs2RKA9evXs3XrVoKCgqhVqxaDBw9m5cqVvP/++3z44YeMGjUKgOPHj7Ns2TLmzp1Lp06dWLp0KePHjyc6Opp169ZRrVo13njjDRYuXEjJkiV5++23effdd3nllVdctVeowJo1a/j4448ZOXIk48ePZ9iwYQQGBvLss8/m6u8pL6zfd4IhU+JISctkQr9o7qqv345VnooX82FCv2iGTInj719vwGmMnqRwg/RE8Hzw22+/0at
XL3x8fAgODuaOO+5g1apVlC5dmujoaCpXrgxA7dq1adu2LQDh4eHZN/kBuO+++xARwsPDCQ4OJjw8HIDQ0FD27NlDQkICW7ZsoXnz5gCkpaXRrFmz7PEfeOABACIjI/nmm2/yZblzy3cbDvLMl+uoEOjP1480pd5NpewuSRVgAX4+jOsbxcOfr+a5rzeS6YTeTTUgrlehDYcrfcLPK6Ghodm37bxW7hfUczgc2a8dDgcZGRke/dz7uPfz8fGhTZs2TJ8+/Yrz8fHxuWC6BZkxho9+iuedBduJrFGOT/tEUiHwyhcgVApcAfFpn0gembqaf8zeSKYx9ImtYXdZXkWvrZSLWrVqRWpqKmPHjs1u27BhA2XLlmXmzJlkZmaSmJjIr7/+SkxM7n7PLzY2lqVLlxIfHw9AcnLyVc+Suvgy3wXJufRMnp65jncWbKdL4ypMG9xUg0FdlwA/Hz7pE0nrWyvx8pxNTP59j90leRUNh1wkIsyePZuFCxdSu3ZtQkNDeeGFF+jduzcNGzakUaNGtGrViv/85z/cdNNNuTrvihUrMmnSJHr16kXDhg1p1qwZ27Zd+Tap9913H7Nnz6Zx48YsWbIkV+vJiaNnUnlw/ArmrDvAX9vcwns99B4M6sb4+/rw8YORtGkQzIi5m5n42267S/IaesludVl2/D7/OHSagZNWcSw5lXe6NaZjw8r5On9VOKVnOnn8i7XM23yIlzreyuAWtewuyRZ6yW7llRZvO8JfxvxOeqaTLx9upsGgco2fj4MPe0fQMbwyb3y3lU9/2Wl3SQVeoT0grbyHMYbPlu7hje+2UP+m0kzoH0XlMnqpbZW7/HwcvN+zMSLw7x+2kWkMw++sY3dZBVahCwdjjN4KMhfk1+7G9EwnI+Zu5osVe2nbIJj3ejSmpF5qW+URXx8Ho3o0xsch/GfeH2RmGh6/u67dZRVIheq/MCAggGPHjlG+fHkNiBwwxnDs2DECAgLydD4nU9IZ/sVqlsYfY9gdtfl7u3o49FLbKo/5+jh4t3tjfER4Z8F2Mo3hqdb5f+p7QVeowqFatWokJCSgV2zNuYCAAKpVq5Zn0999NJlBk1ax73gK/+3akG5R1a8+klK5xMch/LdbIxwOYdTCHTgNPN26rn6odFOowsHPz4+QkBC7y1BXsWznMYZNXY1DYOqgpjStpZfaVvnPxyH85y8NcQh8sGgHTqfhr21v0YCwFKpwUAXfzFV7eXH2JmpWKMmEflHUKK+X2lb2cTiEtx5oiI9D+GhxPBlOw3Pt62lAoOGg8kmm0/DWD1sZt2Q3LepWYPSDTSgdoJfaVvZzOIQ3u4Tj4xA++WUnTmN4oUP9Ih8QGg4qzyWnZvDkjLUs3HqEvs1q8Mq9DfD10a/YqILD4RBe7xyGQ4Sxv+4i02l4qeOtRTogNBxUntp/4iyDJ8fxx6FTvNYplH631bS7JKUuSUR4rVMoDhEm/LabTKdhxH0NimxAaDioPLN273GGTFlNanomnw2I4Y5bCuatXZXKIiKMuK8BPg5XQDiN4bVOoUUyIDQcVJ6Yu/4Az361nuDS/kwf0pS6wXoPBuUdRISXOt6Kr0P41NrF9HrnsCL3HRwNB5WrjDG8v2gHoxbuILpmOT55KJLyeqlt5WVEhOc71MfhEMb87DpI/WaX8CIVEBoOKtecS8/k77M2MHf9AR5oUpV/PxCOv69ealt5JxHh7+3q4SOu01wznYa3HmhYZAJCw0HliiOnzzF0ymrW7TvB39vX45E7ahfJ/bSqcBER/tr2FhwO4YNFO8h0wn+6ur4XUdhpOKgc23rwFIMmreJ4SjqfPBRJ+7DcvZGRUnYSEZ5pcws+Iry3cDvGGP7brVGhDwgNB5UjC7cc5okZaykV4MtXw5oRVrWM3SUplSeebF0XHweM/NF1sb53ujUq1N/X0XBQN8QYw/glu/nXD1sJq1KGcX2juKlM3l7FVSm7PdaqLo6sy307DaN6NC60AaHhoK5bWoa
Tl+dsYmbcPjqE3cS73RtTvJgeeFZFw/BsqJMgAAAUOUlEQVQ76+Ajwr9/2IbTGN7vGYFfIQwIDQd1XU6kpDFs6mqW70risbvq8EybW4rM2RtKZXn4jtr4OIQ3vtuK07mWD3pFUMy3cAWEhoO6ZrsSzzBochz7j5/l3e6NeKBJ3t3vQamCbnCLWjhE+Of/tvDMl+v4oGdEofqglKOoE5GyIjJLRLaJyFYRaSYiQSKyQER2WD/LWX1FRD4QkXgR2SAiTdym08/qv0NE+uV0oVTuWxp/lC6jl3LybDpfDGmqwaAUMPD2EJ7vUJ//bTjIm99vtbucXJXT7aD3gXnGmPpAI2Ar8DywyBhTF1hkvQboANS1HkOBMQAiEgSMAJoCMcCIrEBRBcO0FX/Sd+JKbioTwLePNieqZpDdJSlVYDzcshb9b6vJhN92M+7XXXaXk2tueLeSiJQBWgL9AYwxaUCaiHQG7rS6TQZ+Bp4DOgNTjOvO9cutrY7KVt8Fxpgka7oLgPbA9ButTeWOTKfhze+2MnHpbu6sV5EPe0VQSu/BoNQFRIRX7m1A4ulU3vx+K5VK+9O5cVW7y8qxnBxzCAESgc9EpBGwGngSCDbGHLT6HAKCredVgX1u4ydYbZdrVzY6fS6dJ2es46dtR+h/W03XhcgK4RkZSuUGh0N4p3sjjp5J5dmv1lO+pD+3161gd1k5kpP/dl+gCTDGGBMBJHN+FxIA1laCycE8LiAiQ0UkTkTiEhMTc2uy6iL7klLoOmYZv2xP5PUuYbzaKVSDQamrCPDzYWzfKGpXDOThz+PYtP+k3SXlSE7+4xOABGPMCuv1LFxhcdjaXYT184g1fD9Q3W38albb5do9GGPGGmOijDFRFSvqvQHywuo/k+gyeikHTp5l0oBo+sTWsLskpbxGmeJ+TBoQQ5nifvT/bBX7klLsLumG3XA4GGMOAftEpJ7VdDewBZgLZJ1x1A/41no+F+hrnbUUC5y0dj/NB9qKSDnrQHRbq03lszlr99Nr7AoCA3yZPbw5LepqACt1vW4qE8CUQTGkZzrpO3ElSclpdpd0Q3K6r+BxYJqIbAAaA/8C3gLaiMgOoLX1GuB7YBcQD4wDhgNYB6JfB1ZZj39mHZxW+cPpNLz74x88NXMdjW8uy5zhzalTKdDuspTyWnUqlWJCvygOnDjLwEmrSEnLsLuk6yauwwLeJyoqysTFxV33eD0+XYafj4OgksUu+ShfshjlShajXIlihf6qiwBn0zJ59qv1fLfxIN2jqvFGl/BC901Ppewyf/MhHpm6mjvrVWJsn0jbj92JyGpjTNS19C1S35A2xlC6uB/HzqSScDyFY8lpnD536UQXgbLF/ShnBcaFIeJPUEk/gkr6Z4dJ+ZLFCPDzrusLHTl1jsFT4ti4/yQvdKjP0Ja19B4MSuWidqE38c/OYbw0ZxP/mL2Rt//S0Gv+x4pUOIgI4/peGJppGU5OpKRxLDmN48mun0mXeOw5msLqP09wPCWNTOelt7ZKFPPx3BopUYygQGuLpEQxygda4VKiGKWL+9r2h7Jp/0mGTInj5Nl0xvaJok2D4KuPpJS6bg/F1uDIqXN88FM8N5UO4Jm29a4+UgFQpMLhUor5OqhUOoBKpa/tctNOp+H0uQyOJadeEB5Z4ZLkFjA7Dp8hKTmNs+mZl5yWr0MolxUgJV0hkvW8fKAVJm7t5UoWy5WrP87ffIinZqyjbAk/vhrWjNAqeg8GpfLS021u4fCpVD74KZ7gMgE82LTgnwVY5MPhejkcQpkSfpQp4UetazyZ52xaJkkpaSSdSXP9TE7l2BlXgBxPSct+vvXgKZKS0ziRkn7ZaZUO8L3q7i33LZcSxXyyt06MMXz66y7enreNhtXKMq5P5DWHolLqxokIb94fRuKZVF6es4kKgf60Cy3Yd0wscgekvUFGppMTZ9NdWyFnrABJdoVL9vPkVJKS062faaRnXno9+vs6soPDxyFsSDhJx4aVead
bI687RqKUt0tJy6DXuBVsO3iKaYOb5vt1yq7ngLSGQyFgjOFMasb5XVrZWygXPo6npNH61mAeuaN2obq0sFLeJCk5ja5jfudYchqzhjWjbnCpfJu3hoNSShVg+5JSeGDM7/g5hG+GN8+3W+xeTzjoCe1KKZXPqgeV4LP+0Zw6l0H/z1Zy8uzljzPaRcNBKaVsEFa1DJ88FMnOxDM8/HkcqRmXPqvRLhoOSillk9vrVmBkt0Ys35XEMzPX47zMd6jsoKeyKqWUjTo3rsqRU64bBVUs5c+I+xoUiG9RazgopZTNhrSsxaFT55jw225uKhPAsDtq212ShoNSShUEL95zK0dOp/LWD9uoVMqfB5pUs7UeDQellCoAHA5hZLeGHD2dyt9nbaBCoD8tb7Hvnip6QFoppQoIf18fPu0bSZ1KgQybupqNCfbdalTDQSmlCpDSAX5MHhhDuRLFGDBpJX8eS7alDg0HpZQqYIJLu241muE09Ju4kqNnUvO9Bg0HpZQqgGpXDGRCv2gOnTrHoEmrSE7N31uNajgopVQBFVmjHB/2asLG/Sd59Is1pGc6823eGg5KKVWAtWkQzJv3h/PzH4k8//VG8utiqXoqq1JKFXC9Ym7m8KlzjFq4g5vK+PO3dvXzfJ665aCUUl7gybvr0iumOj9sOpQvxx90y0EppbyAiPB65zCSUzMp6Z/3b90aDkop5SV8fRyUKZE/O3x0t5JSSikPGg5KKaU85DgcRMRHRNaKyP+s1yEiskJE4kVkpogUs9r9rdfx1vCabtN4wWr/Q0Ta5bQmpZRSOZMbWw5PAlvdXr8NvGeMqQMcBwZZ7YOA41b7e1Y/RKQB0BMIBdoDH4uITy7UpZRS6gblKBxEpBrQERhvvRagFTDL6jIZ6GI972y9xhp+t9W/MzDDGJNqjNkNxAMxOalLKaVUzuR0y2EU8Hcg6zvd5YETxpisk3ATgKrW86rAPgBr+Emrf3b7Jca5gIgMFZE4EYlLTEzMYelKKaUu54bDQUTuBY4YY1bnYj1XZIwZa4yJMsZEVaxo300wlFKqsMvJ9xyaA51E5B4gACgNvA+UFRFfa+ugGrDf6r8fqA4kiIgvUAY45taexX0cpZRSNrjhLQdjzAvGmGrGmJq4Dij/ZIx5EFgMdLW69QO+tZ7PtV5jDf/JuK4gNRfoaZ3NFALUBVbeaF1KKaVyLi++If0cMENE3gDWAhOs9gnA5yISDyThChSMMZtF5EtgC5ABPGqMycyDupRSSl0jya/Lv+a2qKgoExcXZ3cZSinlNURktTEm6lr66jeklVJKedBwUEop5UHDQSmllAcNB6WUUh40HJRSSnnQcFBKKeVBw0EppZQHDQellFIeNByUUkp50HBQSinlQcNBKaWUBw0HpZRSHjQclFJKedBwUEop5UHDQSmllAcNB6WUUh40HJRSSnnQcFBKKeVBw0EppZQHDQellFIeNByUUkp50HBQSinlQcNBKaWUBw0HpZRSHjQclFJKebjhcBCR6iKyWES2iMhmEXnSag8SkQUissP6Wc5qFxH5QETiRWSDiDRxm1Y/q/8OEemX88VSSimVEznZcsgA/mqMaQDEAo+KSAPgeWCRMaYusMh6DdABqGs9hgJjwBUmwAigKRADjMgKFKWUUva44XAwxhw0xqyxnp8GtgJVgc7AZKvbZKCL9bwzMMW4LAfKikhloB2wwBiTZIw5DiwA2t9oXUoppXIuV445iEhNIAJYAQQbYw5agw4BwdbzqsA+t9ESrLbLtSullLJJjsNBRAKBr4GnjDGn3IcZYwxgcjoPt3kNFZE4EYlLTEzMrckqpZS6SI7CQUT8cAXDNGPMN1bzYWt3EdbPI1b7fqC62+jVrLbLtXswxow1xkQZY6IqVqyYk9KVUkpdQU7OVhJgArDVGPOu26C5QNYZR/2Ab93a+1pnLcUCJ63dT/OBtiJSzjoQ3dZqU0opZRPfHIzbHOgDbBSRdVbbP4C3gC9FZBD
wJ9DdGvY9cA8QD6QAAwCMMUki8jqwyur3T2NMUg7qUkoplUPiOizgfaKiokxcXJzdZSillNcQkdXGmKhr6avfkFZKKeVBw0EppZQHDQellFIeNByUUkp50HBQSinlQcNBKaWUBw0HpZRSHjQclFJKedBwUEop5UHDQSmllAcNB6WUUh40HJRSSnnQcFBKKeVBw0EppZQHDQellFIeNByUUkp50HBQSinlQcNBKaWUBw0HpZRSHjQclFJKedBwUEop5UHDQSmllAcNB6WUUh40HJRSSnnQcFBKKeVBw0EppZSHAhMOItJeRP4QkXgRed7uepRSqigrEOEgIj7AaKAD0ADoJSIN7K1KKaWKLl+7C7DEAPHGmF0AIjID6Axsye0ZOX//CDLTwOkE4wQDGCfGaf00TjAG43RijAHr4XruxDitn8aA07j1t8aH7OFZfc+Pn9XPbTzInpZrvKzhXDhedn+5qN2AVeaF7WRP371fNgERsZ64GkQExGS3uQ8X3J4LgMMaz31agIg1OUf2dC+clmQ3n3+eNVGxpi3gcHstYs3PuM3HgYi1erKeZy2Lw4HJGg3BiGTXatzmZbI+GomDrA5GztduxG24VasBjMORvVwGt/lZpZusZXWIq7/778/hwFjjuWp2YMSACCZ7HVjzEddrg1i1uJZFHNbz7Brchrs9z1o5xvodZ63+838GgsFk/aKw/lSsaRprfm59jblgOhdM47J9zQV9sueTtd4vmKf79MBp3MaRy0z3otfuC5i93Mathov6XrAs5vy8L1V31vKZrJHOLw6C4CsOfBw++CD4OAQf8cFXXO0OEXxw4IvgIw4cWf2tdtdP8ME1jgOx+rqeW28crpn7+MHNseS1ghIOVYF9bq8TgKZ5MaOPJ9dGxD8vJq1U/jNO9xeXeX4pcpXhuTyeFIidFPbKXlfuMe0Ek3lB2/nYyko59/4Gw2keHVd0wuGaiMhQYCjAzTfffEPTqOi3Ad8T57hwZVx6JVz42cJqu6CP9dOc/8xzcX/3Pp6f3cz5zybmUv3P9zXZdV5Yq2T/wbjP/6KaDBf1cf8Hd/tU7fqMYjW7f8o7/ylU3MdzH561FXBxu9v0soa7f3J1n7/nG4941nrRMOM+uvu05fy4WXO++BPv5X4P5z9Vei6vcXt+pdovP+7llsX9+cWfsi/8PV56PKuvIXvr6uI1fcnOl3F9EXDhGr1av+uZtlyhxvPTzPqduY13DXVccn5Xmq65/HSNnF9Pl343cH/LF882t3WGOb8V5jktwRl4o8F+fQpKOOwHqru9rma1XcAYMxYYCxAVFXW1v5pL6vHRczcymlJKFSkFZVtvFVBXREJEpBjQE5hrc01KKVVkFYgtB2NMhog8BswHfICJxpjNNpellFJFVoEIBwBjzPfA93bXoZRSquDsVlJKKVWAaDgopZTyoOGglFLKg4aDUkopDxoOSimlPIi56jcQCyYRSQT+vMzgCsDRfCwnvxTW5QJdNm+ly+ZdahhjKl5LR68NhysRkThjTJTddeS2wrpcoMvmrXTZCi/draSUUsqDhoNSSikPhTUcxtpdQB4prMsFumzeSpetkCqUxxyUUkrlTGHdclBKKZUDXh0OIlJdRBaLyBYR2SwiT1rtQSKyQER2WD/L2V3rjRIRHxFZKyL/s16HiMgKEYkXkZnWJc69joiUFZFZIrJNRLaKSLPCsN5E5Gnrb3GTiEwXkQBvXmciMlFEjojIJre2S64ncfnAWs4NItLEvsqv7DLL9V/r73GDiMwWkbJuw16wlusPEWlnT9X5y6vDAcgA/mqMaQDEAo+KSAPgeWCRMaYusMh67a2eBLa6vX4beM8YUwc4Dgyypaqcex+YZ4ypDzTCtYxevd5EpCrwBBBljAnDdfn5nnj3OpsEtL+o7XLrqQNQ13oMBcbkU403YhKey7UACDPGNAS2Ay8AWO8pPYFQa5yPRcQn/0q1h1eHgzHmoDFmjfX8NK43mKpAZ2Cy1W0
y0MWeCnNGRKoBHYHx1msBWgGzrC5euWwiUgZoCUwAMMakGWNOUDjWmy9QXER8gRLAQbx4nRljfgWSLmq+3HrqDEwxLsuBsiJSOX8qvT6XWi5jzI/GmAzr5XJcd6QE13LNMMakGmN2A/FATL4VaxOvDgd3IlITiABWAMHGmIPWoENAsE1l5dQo4O9A1p3JywMn3P6AE3CFobcJARKBz6xdZuNFpCRevt6MMfuBkcBeXKFwElhN4Vhn7i63nqoC+9z6efOyDgR+sJ4XpuW6ZoUiHEQkEPgaeMoYc8p9mHGdjuV1p2SJyL3AEWPMartryQO+QBNgjDEmAkjmol1I3rjerH3vnXGFXxWgJJ67LgoVb1xPVyMiL+LaZT3N7lrs5PXhICJ+uIJhmjHmG6v5cNbmrPXziF315UBzoJOI7AFm4No18T6uTfWsO/hVA/bbU16OJAAJxpgV1utZuMLC29dba2C3MSbRGJMOfINrPRaGdebucutpP1DdrZ/XLauI9AfuBR4058/z9/rluhFeHQ7WPvgJwFZjzLtug+YC/azn/YBv87u2nDLGvGCMqWaMqYnrYNhPxpgHgcVAV6ubty7bIWCfiNSzmu4GtuD9620vECsiJay/zazl8vp1dpHLrae5QF/rrKVY4KTb7qcCT0Ta49qN28kYk+I2aC7QU0T8RSQE1wH3lXbUmK+MMV77AG7HtUm7AVhnPe7BtW9+EbADWAgE2V1rDpfzTuB/1vNauP4w44GvAH+767vBZWoMxFnrbg5QrjCsN+A1YBuwCfgc8PfmdQZMx3X8JB3XFt+gy60nQIDRwE5gI66ztmxfhutYrnhcxxay3ks+cev/orVcfwAd7K4/Px76DWmllFIevHq3klJKqbyh4aCUUsqDhoNSSikPGg5KKaU8aDgopZTyoOGglFLKg4aDUkopDxoOSimlPPw//dxRNyxb2QMAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] @@ -5375,22 +1451,22 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "" ] }, - "execution_count": 9, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ "
" ] @@ -5407,7 +1483,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 16, "metadata": {}, "outputs": [], "source": [ @@ -5416,7 +1492,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 17, "metadata": {}, "outputs": [ { @@ -5455,118 +1531,118 @@ " \n", " \n", " 0\n", - " YOU HAD ONE JOB! - with editor Brad1\n", - " 5,292,299.0\n", - " 385,260.0\n", - " 4,080.0\n", + " PyCharm/IntelliJ fast and auto change of the color theme\n", + " 41.0\n", + " 0.0\n", " 0.0\n", - " 29,859.0\n", - " https://www.youtube.com/watch?v=B67OBHNCopk\n", - " [SATIRE, reddit, you had one job, onejob]\n", - " <a href=\"https://www.youtube.com/watch?v=B67OBHNCopk\">XXXXX</a>\n", - " YOU HAD ONE JOB! - w\n", + " 0.0\n", + " 2.0\n", + " https://www.youtube.com/embed/SsX9Fl958W0\n", + " https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/SsX9Fl958W0\">XXXXX</a>\n", + " PyCharm/IntelliJ fas\n", " \n", " \n", " 1\n", - " Demi Lovato DID a WHAT?! 
- YouTube Admits MASSIVE OPSIE\n", - " 5,358,149.0\n", - " 378,460.0\n", - " 3,950.0\n", + " How to add weather desklet to Linux Mint 19\n", + " 291.0\n", + " 0.0\n", " 0.0\n", - " 38,075.0\n", - " https://www.youtube.com/watch?v=kLM_9gBZIqY\n", - " [SATIRE]\n", - " <a href=\"https://www.youtube.com/watch?v=kLM_9gBZIqY\">XXXXX</a>\n", - " Demi Lovato DID a WH\n", + " 0.0\n", + " 0.0\n", + " https://www.youtube.com/embed/-FPY_e0BdJs\n", + " https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/-FPY_e0BdJs\">XXXXX</a>\n", + " How to add weather d\n", " \n", " \n", " 2\n", - " We broke another WORLD RECORD!\n", - " 8,557,324.0\n", - " 595,577.0\n", - " 7,899.0\n", + " How to easy integrate Google Calendar to Desktop for Linux Mint\n", + " 226.0\n", + " 1.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 53,664.0\n", - " https://www.youtube.com/watch?v=d1tAfXKc7-c\n", - " [SATIRE]\n", - " <a href=\"https://www.youtube.com/watch?v=d1tAfXKc7-c\">XXXXX</a>\n", - " We broke another WOR\n", + " https://www.youtube.com/embed/2evIujisdD0\n", + " https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/2evIujisdD0\">XXXXX</a>\n", + " How to easy integrat\n", " \n", " \n", " 3\n", - " FLOSSING in VR with Green Man. 
~ UNSEEN FOOTAGE ~\n", - " 3,609,152.0\n", - " 218,517.0\n", - " 3,125.0\n", + " Pandas use a list of values to select rows from a column\n", + " 45.0\n", + " 3.0\n", " 0.0\n", - " 17,595.0\n", - " https://www.youtube.com/watch?v=bMLdNrB5hAo\n", - " [SATIRE]\n", - " <a href=\"https://www.youtube.com/watch?v=bMLdNrB5hAo\">XXXXX</a>\n", - " FLOSSING in VR with\n", + " 0.0\n", + " 10.0\n", + " https://www.youtube.com/embed/jlSbo5wmTPQ\n", + " https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/jlSbo5wmTPQ\">XXXXX</a>\n", + " Pandas use a list of\n", " \n", " \n", " 4\n", - " Don't Laugh Challenge, NEW SEASON!!!!!\n", - " 5,888,349.0\n", - " 569,878.0\n", - " 7,822.0\n", + " Pandas count and percentage by value for a column\n", + " 63.0\n", + " 3.0\n", + " 0.0\n", + " 0.0\n", " 0.0\n", - " 29,373.0\n", - " https://www.youtube.com/watch?v=Zgm_iM3f_ME\n", - " [SATIRE]\n", - " <a href=\"https://www.youtube.com/watch?v=Zgm_iM3f_ME\">XXXXX</a>\n", - " Don't Laugh Challeng\n", + " https://www.youtube.com/embed/P5pxJkv71BU\n", + " https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg\n", + " <a href=\"https://www.youtube.com/embed/P5pxJkv71BU\">XXXXX</a>\n", + " Pandas count and per\n", " \n", " \n", "\n", "" ], "text/plain": [ - " title Views \\\n", - "0 YOU HAD ONE JOB! - with editor Brad1 5,292,299.0 \n", - "1 Demi Lovato DID a WHAT?! - YouTube Admits MASSIVE OPSIE 5,358,149.0 \n", - "2 We broke another WORLD RECORD! 8,557,324.0 \n", - "3 FLOSSING in VR with Green Man. ~ UNSEEN FOOTAGE ~ 3,609,152.0 \n", - "4 Don't Laugh Challenge, NEW SEASON!!!!! 
5,888,349.0 \n", + " title Views \\\n", + "0 PyCharm/IntelliJ fast and auto change of the color theme 41.0 \n", + "1 How to add weather desklet to Linux Mint 19 291.0 \n", + "2 How to easy integrate Google Calendar to Desktop for Linux Mint 226.0 \n", + "3 Pandas use a list of values to select rows from a column 45.0 \n", + "4 Pandas count and percentage by value for a column 63.0 \n", "\n", - " Like Dislike Favorite Comment \\\n", - "0 385,260.0 4,080.0 0.0 29,859.0 \n", - "1 378,460.0 3,950.0 0.0 38,075.0 \n", - "2 595,577.0 7,899.0 0.0 53,664.0 \n", - "3 218,517.0 3,125.0 0.0 17,595.0 \n", - "4 569,878.0 7,822.0 0.0 29,373.0 \n", + " Like Dislike Favorite Comment \\\n", + "0 0.0 0.0 0.0 2.0 \n", + "1 0.0 0.0 0.0 0.0 \n", + "2 1.0 0.0 0.0 0.0 \n", + "3 3.0 0.0 0.0 10.0 \n", + "4 3.0 0.0 0.0 0.0 \n", "\n", - " videoID \\\n", - "0 https://www.youtube.com/watch?v=B67OBHNCopk \n", - "1 https://www.youtube.com/watch?v=kLM_9gBZIqY \n", - "2 https://www.youtube.com/watch?v=d1tAfXKc7-c \n", - "3 https://www.youtube.com/watch?v=bMLdNrB5hAo \n", - "4 https://www.youtube.com/watch?v=Zgm_iM3f_ME \n", + " videoID \\\n", + "0 https://www.youtube.com/embed/SsX9Fl958W0 \n", + "1 https://www.youtube.com/embed/-FPY_e0BdJs \n", + "2 https://www.youtube.com/embed/2evIujisdD0 \n", + "3 https://www.youtube.com/embed/jlSbo5wmTPQ \n", + "4 https://www.youtube.com/embed/P5pxJkv71BU \n", "\n", - " tags \\\n", - "0 [SATIRE, reddit, you had one job, onejob] \n", - "1 [SATIRE] \n", - "2 [SATIRE] \n", - "3 [SATIRE] \n", - "4 [SATIRE] \n", + " tags \\\n", + "0 https://i.ytimg.com/vi/SsX9Fl958W0/hqdefault.jpg \n", + "1 https://i.ytimg.com/vi/-FPY_e0BdJs/hqdefault.jpg \n", + "2 https://i.ytimg.com/vi/2evIujisdD0/hqdefault.jpg \n", + "3 https://i.ytimg.com/vi/jlSbo5wmTPQ/hqdefault.jpg \n", + "4 https://i.ytimg.com/vi/P5pxJkv71BU/hqdefault.jpg \n", "\n", - " nameurl \\\n", - "0 XXXXX \n", - "1 XXXXX \n", - "2 XXXXX \n", - "3 XXXXX \n", - "4 XXXXX \n", + " nameurl \\\n", + "0 XXXXX \n", + "1 XXXXX 
\n", + "2 XXXXX \n", + "3 XXXXX \n", + "4 XXXXX \n", "\n", " title_short \n", - "0 YOU HAD ONE JOB! - w \n", - "1 Demi Lovato DID a WH \n", - "2 We broke another WOR \n", - "3 FLOSSING in VR with \n", - "4 Don't Laugh Challeng " + "0 PyCharm/IntelliJ fas \n", + "1 How to add weather d \n", + "2 How to easy integrat \n", + "3 Pandas use a list of \n", + "4 Pandas count and per " ] }, - "execution_count": 11, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -5577,7 +1653,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 18, "metadata": {}, "outputs": [], "source": [ @@ -5586,22 +1662,22 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "" ] }, - "execution_count": 13, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEICAYAAABYoZ8gAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAEg5JREFUeJzt3X+w3XWd3/HnCxLNYiJqcscpBgwrO6tZQiJegYUNkOIAamvKLKtkgK4Bhtku9de0WWx1klb+2Z3aXXbcAqaYpu46BGWpQxcDYbpOSY1YbgKTROKG7ZLVS2hzN/HHFkUSffePe2IvIfdHck9ycvN5PmYY7/l8vud734cZn/nyPefmpqqQJLXjlF4PIEk6vgy/JDXG8EtSYwy/JDXG8EtSYwy/JDXmhA1/kjVJ9iTZPoFjz0ry9SRPJdma5H3HY0ZJmopO2PADa4GrJ3jsp4EvV9U7geuAu47VUJI01Z2w4a+qx4F9I9eSvC3JI0k2J9mY5O0HDwde3/n6dGD3cRxVkqaUab0e4AitBn6nqp5NciHDV/b/EPg3wIYkHwFeB7yndyNK0oltyoQ/yUzgYuArSQ4uv7bzv8uAtVX175P8OvCnSc6tqp/3YFRJOqFNmfAzfFvqB1W16DB7N9N5P6CqvplkBjAH2HMc55OkKeGEvcd/qKr6EfBckt8CyLCFne3vAld01t8BzACGejKoJJ3gcqL+7ZxJ7gMuZ/jK/f8Aq4C/BO4G/gEwHVhXVZ9JMh/4j8BMht/o/b2q2tCLuSXpRHfChl+SdGxMmVs9kqTuOCHf3J0zZ07Nmzev12NI0pSxefPmv6uqvokce0KGf968eQwMDPR6DEmaMpL87USP9VaPJDXG8EtSYwy/JDXmhLzHL0mj2b9/P4ODg7z00ku9HqUnZsyYwdy5c5k+ffpRn8PwS5pSBgcHmTVrFvPmzWPE39vVhKpi7969DA4OcvbZZx/1ebzVI2lKeemll5g9e3Zz0QdIwuzZsyf9XzuGX9KU02L0D+rGazf8ktQYwy9JR2DJkiU8+uijr1i78847
Wb58Oddee22Ppjoyhl+SjsCyZctYt27dK9bWrVvH8uXLeeCBB3o01ZEx/JJ0BK699loefvhhXn75ZQB27drF7t27OfPMMzn33HMB+NnPfsaKFSt497vfzXnnncfnP/95AG677TYeeughAK655hpuuukmANasWcOnPvUpXnzxRd7//vezcOFCzj33XO6///5j8hr8OKekKevf/tdv88zuH3X1nPPPeD2r/vGvjbr/pje9iQsuuID169ezdOlS1q1bxwc/+MFXvOn6hS98gdNPP50nn3ySn/70p1xyySVceeWVLF68mI0bN/KBD3yA559/nhdeeAGAjRs3ct111/HII49wxhln8PDDDwPwwx/+sKuv7SCv+CXpCI283bNu3TqWLVv2iv0NGzbwxS9+kUWLFnHhhReyd+9enn322V+E/5lnnmH+/Pm8+c1v5oUXXuCb3/wmF198MQsWLOCxxx7j9ttvZ+PGjZx++unHZH6v+CVNWWNdmR9LS5cu5ROf+ARbtmzhxz/+Me9617vYtWvXL/aris997nNcddVVr3ruD37wAx555BEuvfRS9u3bx5e//GVmzpzJrFmzmDVrFlu2bOFrX/san/70p7niiitYuXJl1+f3il+SjtDMmTNZsmQJN91006uu9gGuuuoq7r77bvbv3w/Azp07efHFFwG46KKLuPPOO7n00ktZvHgxn/3sZ1m8eDEAu3fv5rTTTuOGG25gxYoVbNmy5ZjM7xW/JB2FZcuWcc0117zqEz4At9xyC7t27eL888+nqujr6+OrX/0qAIsXL2bDhg2cc845vPWtb2Xfvn2/CP+2bdtYsWIFp5xyCtOnT+fuu+8+JrOfkL9zt7+/v/xFLJIOZ8eOHbzjHe/o9Rg9dbh/B0k2V1X/RJ7vrR5Jaozhl6TGGH5JU86JeIv6eOnGazf8kqaUGTNmsHfv3ibjf/Dv458xY8akzuOneiRNKXPnzmVwcJChoaFej9ITB38D12QYfklTyvTp0yf126fkrR5Jao7hl6TGTCj8SdYk2ZNk+yj7S5NsTfJ0koEkvzFi77eTPNv557e7Nbgk6ehM9Ip/LXD1GPv/DVhYVYuAm4B7AZK8CVgFXAhcAKxK8sajnlaSNGkTCn9VPQ7sG2P//9b//2zV64CDX18FPFZV+6rq+8BjjP0HiCTpGOvaPf4k1yT5DvAww1f9AG8BvjfisMHO2uGef2vnNtFAqx/TkqTjoWvhr6r/UlVvB/4JcMdRPH91VfVXVX9fX1+3xpIkHaLrn+rp3Bb65SRzgOeBM0dsz+2sSZJ6pCvhT3JOOr9wMsn5wGuBvcCjwJVJ3th5U/fKzpokqUcm9JO7Se4DLgfmJBlk+JM60wGq6h7gN4F/mmQ/8BPgQ503e/cluQN4snOqz1TVqG8SS5KOPX8RiySdBPxFLJKkURl+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWqM4Zekxhh+SWrMuOFPsibJniTbR9m/PsnWJNuSbEqycMTeJ5J8O8n2JPclmdHN4SVJR24iV/xrgavH2H8OuKyqFgB3AKsBkrwF+CjQX1XnAqcC101qWknSpE0b74CqejzJvDH2N414+AQw95Dz/1KS/cBpwO6jG1OS1C3dvsd/M7AeoKqeBz4LfBd4AfhhVW3o8veTJB2hroU/yRKGw3975/EbgaXA2cAZwOuS3DDG829NMpBkYGhoqFtjSZIO0ZXwJzkPuBdYWlV7O8vvAZ6rqqGq2g88CFw82jmqanVV9VdVf19fXzfGkiQdxqTDn+QshqN+Y1XtHLH1XeCiJKclCXAFsGOy30+SNDnjvrmb5D7gcmBOkkFgFTAdoKruAVYCs4G7hvvOgc6V+7eSPABsAQ4AT9H5xI8kqXdSVb2e4VX6+/trYGCg12NI0pSRZHNV9U/kWH9y
V5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaY/glqTGGX5IaM274k6xJsifJ9lH2r0+yNcm2JJuSLByx94YkDyT5TpIdSX69m8NLko7cRK741wJXj7H/HHBZVS0A7gBWj9j7Y+CRqno7sBDYcZRzSpK6ZNp4B1TV40nmjbG/acTDJ4C5AElOBy4FPtw57mXg5aMfVZLUDd2+x38zsL7z9dnAEPCfkjyV5N4krxvtiUluTTKQZGBoaKjLY0mSDupa+JMsYTj8t3eWpgHnA3dX1TuBF4FPjvb8qlpdVf1V1d/X19etsSRJh+hK+JOcB9wLLK2qvZ3lQWCwqr7VefwAw38QSJJ6aNLhT3IW8CBwY1XtPLheVf8b+F6SX+0sXQE8M9nvJ0manHHf3E1yH3A5MCfJILAKmA5QVfcAK4HZwF1JAA5UVX/n6R8BvpTkNcDfAMu7/QIkSUdmIp/qWTbO/i3ALaPsPQ30H25PktQb/uSuJDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDXG8EtSYwy/JDVm3PAnWZNkT5Lto+xfn2Rrkm1JNiVZeMj+qUmeSvIX3RpaknT0JnLFvxa4eoz954DLqmoBcAew+pD9jwE7jmo6SVLXjRv+qnoc2DfG/qaq+n7n4RPA3IN7SeYC7wfuneSckqQu6fY9/puB9SMe3wn8HvDz8Z6Y5NYkA0kGhoaGujyWJOmgroU/yRKGw3975/E/AvZU1eaJPL+qVldVf1X19/X1dWssSdIhpnXjJEnOY/h2znuram9n+RLgA0neB8wAXp/kz6rqhm58T0nS0Zn0FX+Ss4AHgRuraufB9ar6V1U1t6rmAdcBf2n0Jan3xr3iT3IfcDkwJ8kgsAqYDlBV9wArgdnAXUkADlRV/7EaWJI0OamqXs/wKv39/TUwMNDrMSRpykiyeaIX3f7kriQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmMMvyQ1xvBLUmPGDX+SNUn2JNk+yv71SbYm2ZZkU5KFnfUzk3w9yTNJvp3kY90eXpJ05CZyxb8WuHqM/eeAy6pqAXAHsLqzfgD4F1U1H7gIuC3J/EnMKknqgnHDX1WPA/vG2N9UVd/vPHwCmNtZf6GqtnS+/ntgB/CWSU8sSZqUbt/jvxlYf+hiknnAO4Fvdfn7SZKO0LRunSjJEobD/xuHrM8E/hz4eFX9aIzn3wrcCnDWWWd1ayxJ0iG6csWf5DzgXmBpVe0dsT6d4eh/qaoeHOscVbW6qvqrqr+vr68bY0mSDmPS4U9yFvAgcGNV7RyxHuALwI6q+sPJfh9JUneMe6snyX3A5cCcJIPAKmA6QFXdA6wEZgN3DbeeA1XVD1wC3AhsS/J053T/uqq+1u0XIUmauHHDX1XLxtm/BbjlMOv/A8jRjyZJOhb8yV1Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jaozhl6TGGH5Jasy44U+yJsmeJNtH2b8+ydYk25JsSrJwxN7VSf4qyV8n+WQ3B5ckHZ2JXPGvBa4eY/854LKqWgDcAawGSHIq8B+A9wLzgWVJ5k9qWknSpI0b/qp6HNg3xv6mqvp+5+ETwNzO1xcAf11Vf1NVLwPrgKWTnFeSNEndvsd/M7C+8/VbgO+N2BvsrB1WkluT
DCQZGBoa6vJYkqSDuhb+JEsYDv/tR/P8qlpdVf1V1d/X19etsSRJh5jWjZMkOQ+4F3hvVe3tLD8PnDnisLmdNUlSD036ij/JWcCDwI1VtXPE1pPAryQ5O8lrgOuAhyb7/SRJkzPuFX+S+4DLgTlJBoFVwHSAqroHWAnMBu5KAnCgc8vmQJJ/DjwKnAqsqapvH5NXIUmasFRVr2d4lf7+/hoYGOj1GJI0ZSTZXFX9EznWn9yVpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqjOGXpMYYfklqTKqq1zO8SpIh4G97PccRmgP8Xa+HOM58zW3wNU8Nb62qvokceEKGfypKMlBV/b2e43jyNbfB13zy8VaPJDXG8EtSYwx/96zu9QA94Gtug6/5JOM9fklqjFf8ktQYwy9JjTH8ktQYwy9JjTH8ktQYwy9JjTH8ktQYw6+TRpI3JPndztdnJHmg8/WiJO8bcdyHk/xJl77n5Un+YpLn+HCSM7oxjzQRhl8nkzcAvwtQVbur6trO+iLgfaM+q4eSnAp8GDD8Om4Mv04mvw+8LcnTSb6SZHuS1wCfAT7UWf/QyCck6Uvy50me7PxzyWgnT3JZ5xxPJ3kqyazO1swkDyT5TpIvJUnn+Cs6x21LsibJazvru5L8QZItwDKgH/hS57y/dAz+vUivYPh1Mvkk8L+qahGwAqCqXgZWAvdX1aKquv+Q5/wx8EdV9W7gN4F7xzj/vwRu65x/MfCTzvo7gY8D84FfBi5JMgNYC3yoqhYA04B/NuJce6vq/Kr6M2AAuL4z30+QjjHDr9a9B/iTJE8DDwGvTzJzlGO/Afxhko8Cb6iqA531/1lVg1X1c+BpYB7wq8BzVbWzc8x/Bi4dca5D/wCSjptpvR5A6rFTgIuq6qXxDqyq30/yMMPvF3wjyVWdrZ+OOOxnTOz/Vy8e8aRSl3jFr5PJ3wOzjmAdYAPwkYMPkiwa7eRJ3lZV26rqD4AngbePMctfAfOSnNN5fCPw349wbumYMPw6aVTVXoavxLcD/27E1teB+Yd7cxf4KNCfZGuSZ4DfGeNbfLzzhvFWYD+wfoxZXgKWA19Jsg34OXDPKIevBe7xzV0dL/59/JLUGK/4JakxvrkrHSLJcuBjhyx/o6pu68U8Urd5q0eSGuOtHklqjOGXpMYYfklqjOGXpMb8P+PZ1UPntR/oAAAAAElFTkSuQmCC\n", + "image/png": 
"[base64-encoded PNG plot output omitted]\n", "text/plain": [ "
" ] @@ -5618,7 +1694,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 20, "metadata": {}, "outputs": [], "source": [ @@ -5627,12 +1703,12 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 21, "metadata": {}, "outputs": [ { "data": { - "image/png": "\n", + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEPCAYAAACp/QjLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAFtNJREFUeJzt3X2QHXW95/H3dzJgRCAJmLCQoUjKGyQIdxGnAEs3UnAJEHIXMMrCjkuASOq6oCArgheRgEaEuy5yhQsCiUnQJTwYTAwRluVJLrU8DIIhErKJgszkInniQaCCwH73j+mMEzqBOH0yPUner6qp6f72r/t8TxXhM939O30iM5EkqaemuhuQJPU/hoMkqcRwkCSVGA6SpBLDQZJUYjhIkkoMB0lSieEgSSoxHCRJJc11N9BbH/7wh3PEiBF1tyFJW4zHH398VWYO3ZSxW2w4jBgxgvb29rrbkKQtRkT8YVPHellJklRiOEiSSgwHSVLJFnvPQZKqeuutt+js7GTt2rV1t9JQAwcOpKWlhe22267XxzAcJG2zOjs72WmnnRgxYgQRUXc7DZGZrF69ms7OTkaOHNnr43hZSdI2a+3atey6665bTTAARAS77rpr5bMhw0HSNm1rCoZ1GvGeDAdJUon3HMSI8++ouwUAnvveMXW3oG1co/8tbOp/01deeSXXX389mcnpp5/O2Wefza233sqUKVNYvHgxjz76KK2trQ3t7f145iBJNVq0aBHXX389jz76KL/5zW+YP38+y5YtY7/99mPOnDmMGTOmlr4MB0mq0eLFizn44IPZYYcdaG5u5jOf+Qxz5sxh9OjRfPSjH62tL8NBkmq033778eCDD7J69WreeOMNFixYQEdHR91tec9Bkuo0evRozjvvPMaOHcuHPvQhDjjgAAYMGFB3W545SFLdJk2axOOPP86vfvUrhgwZwt577113S545SFLdVqxYwbBhw3j++eeZM2cODz/8cN0tGQ6StE5d06knTJjA6tWr2W677bj66qsZPHgwt99+O1/+8pdZuXIlxxxzDAcccAB33XVXn/VkOEhSzR588MFS7fjjj+f444+voZsu3nOQJJUYDpKkEsNBklRiOEiSSgwHSVKJ4SBJKnEqqyStM2VQg4/3yiYNO+2005g/fz7Dhg1j0aJFAJx77rn84he/YPvtt+cjH/kIP/7xjxk8eHD3Ps8//zz77rsvU6ZM4Wtf+1pj+8YzB0mq3SmnnMKdd965Xu2II45g0aJFLFy4kL333ptLL710ve3nnHMORx999GbryXCQpJqNGTOGXXbZZb3a2LFjaW7uurhzyCGH0NnZ2b3t5z//OSNHjuRjH/vYZuvJcJCkfm769OndZwmvvfYal112GRdddNFmfU3DQZL6salTp9Lc3ExbWxsAU6ZM4atf/So77rjjZn1db0hLUj81Y8YM5s+fzz333ENEAPDII49w22238fWvf52XX36ZpqYmBg4cyJlnntnQ1zYcJKkfuvPOO7n88st54IEH2GGHHbrrPR/SN2XKFHbccceGBwNsQjhExHRgPLAiM/crarsANwMjgOeAEzLzpeiKtiuBccAbwCmZ+etin4nAN4vDficzZxb1TwAzgA8CC4CzMjMb9P4kadNt4tTTRjvppJO4//77W
bVqFS0tLVx88cVceumlvPnmmxxxxBFA103pa6+9ts962pQzhxnAVcCsHrXzgXsy83sRcX6xfh5wNDCq+DkYuAY4uAiTi4BWIIHHI2JeZr5UjDkdeISucDgK+GX1tyZJW4abbrqpVJs0adL77jdlypTN0E2X970hnZm/Ata8q3wsMLNYngkc16M+K7s8DAyOiN2BI4G7M3NNEQh3A0cV23bOzIeLs4VZPY4lSapJb2cr7ZaZLxTLfwR2K5aHAx09xnUWtfeqd26gvkERMTki2iOifeXKlb1sXZL0fipPZS3+4u+TewSZeV1mtmZm69ChQ/viJSVpm9TbcHixuCRE8XtFUV8O7NljXEtRe696ywbqkqQa9TYc5gETi+WJwNwe9ZOjyyHAK8Xlp7uAsRExJCKGAGOBu4ptr0bEIcVMp5N7HEuSVJNNmcp6E3Ao8OGI6KRr1tH3gFsiYhLwB+CEYvgCuqaxLqNrKuupAJm5JiK+DTxWjLskM9fd5P6v/GUq6y9xppIk1e59wyEzT9rIpsM3MDaBMzZynOnA9A3U24H93q8PSdrc9p+5f0OP99TEp953TEdHByeffDIvvvgiEcHkyZM566yzuPDCC5k7dy5NTU0MGzaMGTNmsMceezS0v/fis5UkqUbNzc18//vf5+mnn+bhhx/m6quv5umnn+bcc89l4cKFPPnkk4wfP55LLrmkT/syHCSpRrvvvjsHHnggADvttBOjR49m+fLl7Lzzzt1jXn/99e5nK/UVn60kSf3Ec889xxNPPMHBBx8MwAUXXMCsWbMYNGgQ9913X5/24pmDJPUDr732GhMmTOAHP/hB91nD1KlT6ejooK2tjauuuqpP+zEcJKlmb731FhMmTKCtrY3Pfvazpe1tbW387Gc/69OeDAdJqlFmMmnSJEaPHs0555zTXV+6dGn38ty5c9lnn336tC/vOUhSYVOmnjbaQw89xI033sj+++/PAQccAMB3v/tdpk2bxpIlS2hqamKvvfbq08d1g+EgSbX69Kc/zYa+wmbcuHE1dPMXXlaSJJUYDpKkEsNBklRiOEiSSgwHSVKJ4SBJKnEqqyQVFu8zuqHHG/3M4k0e+84779Da2srw4cOZP38+mck3v/lNbr31VgYMGMCXvvQlvvKVrzS0v/diOEhSP3DllVcyevRoXn31VQBmzJhBR0cHzzzzDE1NTaxYseJ9jtBYXlaSpJp1dnZyxx138MUvfrG7ds011/Ctb32Lpqau/00PGzasT3syHCSpZmeffTaXX355dxAA/O53v+Pmm2+mtbWVo48+er1nLfUFw0GSajR//nyGDRvGJz7xifXqb775JgMHDqS9vZ3TTz+d0047rU/78p6DJNXooYceYt68eSxYsIC1a9fy6quv8oUvfIGWlpbux3cff/zxnHrqqX3al2cOklSjSy+9lM7OTp577jlmz57NYYcdxk9+8hOOO+647m9/e+CBB9h77737tC/PHCSp8NdMPd3czj//fNra2rjiiivYcccdueGGG/r09Q0HSeonDj30UA499FAABg8ezB133FFbL15WkiSVGA6SpBLDQZJUYjhIkkoqhUNEfDUifhsRiyLipogYGBEjI+KRiFgWETdHxPbF2A8U68uK7SN6HOcbRX1JRBxZ7S1JkqrqdThExHDgK0BrZu4HDABOBC4DrsjMvwFeAiYVu0wCXirqVxTjiIh9i/0+BhwF/EtEDOhtX5Kk6qpOZW0GPhgRbwE7AC8AhwH/udg+E5gCXAMcWywD3AZcFRFR1Gdn5pvAsxGxDDgI+D8Ve5Okv8rV/3BvQ493xrWHbdK40047rfsxGosWLQLgwgsvZO7cuTQ1NTFs2DBmzJjBHnvs0b3PY489xic/+Ulmz57N5z73uYb2DRXOHDJzOfDfgefpCoVXgMeBlzPz7WJYJzC8WB4OdBT7vl2M37VnfQP7SNJW75RTTuHOO+9cr3buueeycOFCnnzyScaPH88ll1zSve2dd
97hvPPOY+zYsZutpyqXlYbQ9Vf/SGAP4EN0XRbabCJickS0R0T7ypUrN+dLSVKfGTNmDLvssst6tZ133rl7+fXXX6frQkuXH/7wh0yYMGGzPsa7ymWlvwOezcyVABExB/gUMDgimouzgxZgeTF+ObAn0BkRzcAgYHWP+jo991lPZl4HXAfQ2tqaFXqXpH7vggsuYNasWQwaNKj7OUvLly/n9ttv57777uOxxx7bbK9dZbbS88AhEbFDce/gcOBp4D5g3QWwicDcYnlesU6x/d7MzKJ+YjGbaSQwCni0Ql+StFWYOnUqHR0dtLW1cdVVVwFd3/1w2WWXrffdD5tDr88cMvORiLgN+DXwNvAEXX/V3wHMjojvFLVpxS7TgBuLG85r6JqhRGb+NiJuoStY3gbOyMx3etuXJG1t2traGDduHBdffDHt7e2ceOKJAKxatYoFCxbQ3NzMcccd19DXrDRbKTMvAi56V/n3dM02evfYtcDnN3KcqcDUKr1I0tZk6dKljBo1CoC5c+eyzz77APDss892jznllFMYP358w4MBfCqrJHXb1KmnjXbSSSdx//33s2rVKlpaWrj44otZsGABS5Ysoampib322otrr722T3syHCSpZjfddFOpNmnSpA2MXN+MGTM2QzddfLaSJKnEcJAklRgOkrZpXTPqty6NeE+Gg6Rt1sCBA1m9evVWFRCZyerVqxk4cGCl43hDWtI2q6Wlhc7OTra2x/EMHDiQlpaWSscwHCRts7bbbjtGjhxZdxv9kpeVJEklhoMkqcRwkCSVGA6SpBLDQZJUYjhIkkoMB0lSieEgSSoxHCRJJYaDJKnEcJAklRgOkqQSw0GSVGI4SJJKDAdJUonhIEkqMRwkSSWGgySpxHCQJJX4HdLqP6YMqrsDmPJK3R1I/UKlM4eIGBwRt0XEMxGxOCI+GRG7RMTdEbG0+D2kGBsR8c8RsSwiFkbEgT2OM7EYvzQiJlZ9U5KkaqpeVroSuDMz9wH+PbAYOB+4JzNHAfcU6wBHA6OKn8nANQARsQtwEXAwcBBw0bpAkSTVo9fhEBGDgDHANIDM/HNmvgwcC8wshs0EjiuWjwVmZZeHgcERsTtwJHB3Zq7JzJeAu4GjetuXJKm6KmcOI4GVwI8j4omIuCEiPgTslpkvFGP+COxWLA8HOnrs31nUNlYviYjJEdEeEe0rV66s0Lok6b1UCYdm4EDgmsz8OPA6f7mEBEBmJpAVXmM9mXldZrZmZuvQoUMbdVhJ0rtUCYdOoDMzHynWb6MrLF4sLhdR/F5RbF8O7Nlj/5aitrG6JKkmvQ6HzPwj0BERHy1KhwNPA/OAdTOOJgJzi+V5wMnFrKVDgFeKy093AWMjYkhxI3psUZMk1aTq5xy+DPw0IrYHfg+cSlfg3BIRk4A/ACcUYxcA44BlwBvFWDJzTUR8G3isGHdJZq6p2JckqYJK4ZCZTwKtG9h0+AbGJnDGRo4zHZhepRdJUuP4CWmph/1n7l93CwA8NfGpulvQNs5nK0mSSgwHSVKJ4SBJKjEcJEkl3pCW+qHF+4yuuwVGP7O47hZUI88cJEklhoMkqcRwkCSVGA6SpBLDQZJUYjhIkkoMB0lSieEgSSoxHCRJJYaDJKnEcJAklRgOkqQSw0GSVGI4SJJKDAdJUonhIEkq8ct+JG3Q1f9wb90tAHDGtYfV3cI2yTMHSVKJ4SBJKjEcJEklhoMkqaRyOETEgIh4IiLmF+sjI+KRiFgWETdHxPZF/QPF+rJi+4gex/hGUV8SEUdW7UmSVE0jzhzOAhb3WL8MuCIz/wZ4CZhU1CcBLxX1K4pxRMS+wInAx4CjgH+JiAEN6EuS1EuVwiEiWoBjgBuK9QAOA24rhswEjiuWjy3WKbYfXow/FpidmW9m5rPAMuCgKn1JkqqpeubwA+DrwP8r1ncFXs7Mt4v1TmB4sTwc6AAotr9SjO+ub2AfSVINeh0OETEeWJGZjzewn/d7zckR0R4R7
StXruyrl5WkbU6VM4dPAf8xIp4DZtN1OelKYHBErPvkdQuwvFheDuwJUGwfBKzuWd/APuvJzOsyszUzW4cOHVqhdUnSe+l1OGTmNzKzJTNH0HVD+d7MbAPuAz5XDJsIzC2W5xXrFNvvzcws6icWs5lGAqOAR3vblySpus3xbKXzgNkR8R3gCWBaUZ8G3BgRy4A1dAUKmfnbiLgFeBp4GzgjM9/ZDH1JkjZRQ8IhM+8H7i+Wf88GZhtl5lrg8xvZfyowtRG9SJKq8xPSkqQSw0GSVGI4SJJKDAdJUonhIEkqMRwkSSWGgySpxHCQJJUYDpKkEsNBklRiOEiSSgwHSVKJ4SBJKjEcJEklhoMkqcRwkCSVGA6SpBLDQZJUYjhIkkoMB0lSieEgSSoxHCRJJYaDJKnEcJAklRgOkqQSw0GSVGI4SJJKDAdJUkmvwyEi9oyI+yLi6Yj4bUScVdR3iYi7I2Jp8XtIUY+I+OeIWBYRCyPiwB7HmliMXxoRE6u/LUlSFVXOHN4G/ltm7gscApwREfsC5wP3ZOYo4J5iHeBoYFTxMxm4BrrCBLgIOBg4CLhoXaBIkurR63DIzBcy89fF8p+AxcBw4FhgZjFsJnBcsXwsMCu7PAwMjojdgSOBuzNzTWa+BNwNHNXbviRJ1TXknkNEjAA+DjwC7JaZLxSb/gjsViwPBzp67NZZ1DZW39DrTI6I9ohoX7lyZSNalyRtQOVwiIgdgZ8BZ2fmqz23ZWYCWfU1ehzvusxszczWoUOHNuqwkqR3qRQOEbEdXcHw08ycU5RfLC4XUfxeUdSXA3v22L2lqG2sLkmqSZXZSgFMAxZn5v/osWkesG7G0URgbo/6ycWspUOAV4rLT3cBYyNiSHEjemxRkyTVpLnCvp8C/gvwVEQ8WdT+EfgecEtETAL+AJxQbFsAjAOWAW8ApwJk5pqI+DbwWDHuksxcU6EvSVJFvQ6HzPxXIDay+fANjE/gjI0cazowvbe9SJIay09IS5JKDAdJUonhIEkqMRwkSSWGgySpxHCQJJUYDpKkEsNBklRiOEiSSgwHSVKJ4SBJKjEcJEklhoMkqcRwkCSVGA6SpBLDQZJUYjhIkkoMB0lSieEgSSoxHCRJJYaDJKnEcJAklRgOkqQSw0GSVGI4SJJKDAdJUonhIEkq6TfhEBFHRcSSiFgWEefX3Y8kbcv6RThExADgauBoYF/gpIjYt96uJGnb1S/CATgIWJaZv8/MPwOzgWNr7kmStln9JRyGAx091juLmiSpBs11N/DXiIjJwORi9bWIWFJnP2qsqLsBABY14iAfBlZVOUC/uKa65PC6OwDgzB/V3cFWZa9NHdhfwmE5sGeP9Zaitp7MvA64rq+aknojItozs7XuPqQq+stlpceAURExMiK2B04E5tXckyRts/rFmUNmvh0RZwJ3AQOA6Zn525rbkqRtVmRm3T1IW5WImFxcApW2WIaDJKmkv9xzkCT1I4aDJKnEcJAklRgOUkUR8fmI2KlY/mZEzImIA+vuS6rCcJCquzAz/xQRnwb+DpgGXFNzT1IlhoNU3TvF72OA6zLzDmD7GvuRKjMcpOqWR8SPgP8ELIiID+C/LW3h/JyDVFFE7AAcBTyVmUsjYndg/8z8XzW3JvWaf91I1X0D+BPwbwCZ+YLBoC2d4SBV93vgJKA9Ih6NiO9HhF9WpS2al5WkBomIfwecAHwNGJKZO9XcktRrhoNUUUTcQNf387wIPAj8K/DrzHy71sakCrysJFW3K12Pmn8ZWAOsMhi0pfPMQWqQiBgNHAl8FRiQmS01tyT1Wr/4sh9pSxYR44H/AIwBBgP30nV5SdpieeYgVRQRV9EVBg9m5r/V3Y/UCIaD1AARsRcwKjP/d0R8EGjOzD/V3ZfUW96QliqKiNOB24AfFaUW4Of1dSRVZzhI1Z0BfAp4FSAzlwLDau1IqshwkKp7MzP/vG4lIpoBr9dqi2Y4SNU9EBH/CHwwIo4AbgV+U
XNPUiXekJYqiogmYBIwFgjgLuCG9B+XtmCGgySpxA/BSb0UEbdk5gkR8RQbuMeQmX9bQ1tSQxgOUu+9Vnxv9N/jDWhtZQwHqfd+A/wTsDtwC3BTZj5Rb0tSY3jPQaqo+HT0icXPB4Gb6AqK/1trY1IFhoPUQBHxcWA68LeZOaDufqTe8nMOUkUR0RwRfx8RPwV+CSwBPltzW1IlnjlIvVR84O0kYBzwKDAbmJuZr9famNQAhoPUSxFxL/A/gZ9l5kt19yM1kuEgSSrxnoMkqcRwkCSVGA6SpBLDQZJUYjhIkkr+PwIDUkkMKEJyAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] @@ -5652,12 +1728,12 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 22, "metadata": {}, "outputs": [ { "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEMCAYAAAA/Jfb8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAFTRJREFUeJzt3X2QXfV93/H3ByQgDjK2pY1rkIwYG8doMMawBmJXMgR7ENCi4BAbTUnaAmbaGjfjZFQrYwbbtJ3xA2lJXSAWhVDcCTLGLtEU8dQa15rUdrUI8yQClkExK3BYS/gJSnjwt3/cK7TIK+2VdLVHe+77NbOz95zz23u/Otr72d/9/c5DqgpJUrvs13QBkqT+M9wlqYUMd0lqIcNdklrIcJekFjLcJamFGg33JNcleTrJgz20fXOSu5Pcm+T+JGdMRY2SNB013XO/HljcY9tLgJuq6l3AucBVe6soSZruGg33qvoWsGX8uiRvSXJ7knuSrEny9q3Ngdd2Hx8CPDmFpUrStDKj6QImsAL4F1X1/SQn0umh/zbwaeDOJB8Dfh14f3MlStK+bZ8K9yQHA+8Bvppk6+oDu9+XAtdX1Z8m+S3gy0mOrqpfNlCqJO3T9qlwpzNM9JOqOnaCbRfQHZ+vqm8nOQiYAzw9hfVJ0rTQ9ITqq1TVz4DHk/weQDre2d38Q+DU7vqjgIOAsUYKlaR9XJq8KmSSG4GT6fTA/w74FPAN4GrgTcBMYGVVXZZkAXANcDCdydV/U1V3NlG3JO3rGg13SdLesU8Ny0iS+qOxCdU5c+bU/Pnzm3p5SZqW7rnnnh9X1dBk7RoL9/nz5zMyMtLUy0vStJTkb3tp57CMJLWQ4S5JLWS4S1IL7WtnqEoSAC+++CKjo6M8//zzTZfSiIMOOoi5c+cyc+bM3fp5w13SPml0dJRZs2Yxf/58xl1raiBUFZs3b2Z0dJQjjjhit57DYRlJ+6Tnn3+e2bNnD1ywAyRh9uzZe/SpxXCXtM8axGDfak//7Ya7JLWQY+6SpoX5y2/t6/Nt/OyZO91+yimnsHz5ck477bRX1l1xxRXcd999/PznP+fmm2/uaz39Nq3Dvd//2btjsl8QSdPT0qVLWbly5avCfeXKlXz+859n0aJFDVbWG4dlJGkC55xzDrfeeisvvPACABs3buTJJ59k3rx5HH300QC8/PLLLFu2jHe/+90cc8wxfOlLXwLgox/9KKtWrQLg7LPP5vzzzwfguuuu45Of/CTPPvssZ555Ju985zs5+uij+cpXvtL3+g13SZrAG97wBk444QRuu+02oNNr/9CHPvSqic5rr72WQw45hLVr17J27VquueYaHn/8cRYuXMiaNWsA2LRpE+vXrwdgzZo1LFq0iNtvv51DDz2U++67jwcffJDFixf3vX7DXZJ2YOvQDHTCfenSpa/afuedd3LDDTdw7LHHcuKJJ7J582a+//3vvxLu69evZ8GCBbzxjW/kqaee4tvf/jbvec97eMc73sFdd93FJz7xCdasWcMhhxzS99qn9Zi7JO1NS5Ys4eMf/zjr1q3jueee4/jjj2fjxo2vbK8qvvjFL75qXH6rn/zkJ9x+++0sWrSILVu2cNNNN3HwwQcza9YsZs2axbp161i9ejWXXHIJp556Kpdeemlfa5+0557kuiRPJ3lwB9v/SZL7kzyQ5P+Mu+epJE1rBx98MKeccgrnn3/+r/TaAU477TSuvvpqXnzxRQAeffRRnn32WQBOOukkrrjiChYtWsTChQu5/PLLWbhwIQBPPvkkr3nNazjvvPNYtmwZ69at63vtvfTcrwf
+M3DDDrY/Dryvqp5JcjqwAjixP+VJUkdTR6YtXbqUs88++5XhmfEuvPBCNm7cyHHHHUdVMTQ0xC233ALAwoULufPOO3nrW9/K4YcfzpYtW14J9wceeIBly5ax3377MXPmTK6++uq+193TPVSTzAf+R1UdPUm71wMPVtVhkz3n8PBw7enNOjwUUmqvhx9+mKOOOqrpMho10T5Ick9VDU/2s/2eUL0AuG1HG5NclGQkycjY2FifX1qStFXfwj3JKXTC/RM7alNVK6pquKqGh4YmvQWgJGk39eVomSTHAP8FOL2qNvfjOSWpqgb24mG9DJnvzB733JO8Gfg68PtV9eiePp8kQedmFZs3b97jkJuOtl7P/aCDDtrt55i0557kRuBkYE6SUeBTwMxuAX8OXArMBq7q/oV9qZfBfknamblz5zI6Osqgzs9tvRPT7po03KvqVw/ufPX2C4ELd7sCSZrAzJkzd/suRPLyA5LUSoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgtNGu5JrkvydJIHd7A9Sf5Tkg1J7k9yXP/LlCTtil567tcDi3ey/XTgyO7XRcDVe16WJGlPTBruVfUtYMtOmiwBbqiO7wCvS/KmfhUoSdp1/RhzPwx4YtzyaHfdr0hyUZKRJCNjY2N9eGlJ0kSmdEK1qlZU1XBVDQ8NDU3lS0vSQOlHuG8C5o1bnttdJ0lqSD/CfRXwB92jZk4CflpVT/XheSVJu2nGZA2S3AicDMxJMgp8CpgJUFV/DqwGzgA2AM8B/3xvFStJ6s2k4V5VSyfZXsBH+1aRJGmPeYaqJLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQj2Fe5LFSR5JsiHJ8gm2vznJ3UnuTXJ/kjP6X6okqVeThnuS/YErgdOBBcDSJAu2a3YJcFNVvQs4F7iq34VKknrXS8/9BGBDVT1WVS8AK4El27Up4LXdx4cAT/avREnSruol3A8Dnhi3PNpdN96ngfOSjAKrgY9N9ERJLkoykmRkbGxsN8qVJPWiXxOqS4Hrq2oucAbw5SS/8txVtaKqhqtqeGhoqE8vLUnaXi/hvgmYN255bnfdeBcANwFU1beBg4A5/ShQkrTregn3tcCRSY5IcgCdCdNV27X5IXAqQJKj6IS74y6S1JBJw72qXgIuBu4AHqZzVMxDSS5Lcla32R8DH0lyH3Aj8M+qqvZW0ZKknZvRS6OqWk1nonT8ukvHPV4PvLe/pUmSdpdnqEpSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS3UU7gnWZzkkSQbkizfQZsPJVmf5KEkf9nfMiVJu2LGZA2S7A9cCXwAGAXWJllVVevHtTkS+BPgvVX1TJLf2FsFa2Lzl9/adAls/OyZTZcgqauXnvsJwIaqeqyqXgBWAku2a/MR4Mqqegagqp7ub5mSpF3RS7gfBjwxbnm0u268twFvS/LXSb6TZPFET5TkoiQjSUbGxsZ2r2JJ0qT6NaE6AzgSOBlYClyT5HXbN6qqFVU1XFXDQ0NDfXppSdL2egn3TcC8cctzu+vGGwVWVdWLVfU48CidsJckNaCXcF8LHJnkiCQHAOcCq7ZrcwudXjtJ5tAZpnmsj3VKknbBpOFeVS8BFwN3AA8DN1XVQ0kuS3JWt9kdwOYk64G
7gWVVtXlvFS1J2rlJD4UEqKrVwOrt1l067nEBf9T9kiQ1zDNUJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaqGeLvkrTSfzl9/adAls/OyZTZegAWfPXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklqop3BPsjjJI0k2JFm+k3a/m6SSDPevREnSrpo03JPsD1wJnA4sAJYmWTBBu1nAHwLf7XeRkqRd00vP/QRgQ1U9VlUvACuBJRO0+7fA54Dn+1ifJGk39BLuhwFPjFse7a57RZLjgHlVtdMrNiW5KMlIkpGxsbFdLlaS1Js9nlBNsh/wH4A/nqxtVa2oquGqGh4aGtrTl5Yk7UAv4b4JmDdueW533VazgKOBbybZCJwErHJSVZKa00u4rwWOTHJEkgOAc4FVWzdW1U+rak5Vza+q+cB3gLOqamSvVCxJmtSk4V5VLwEXA3cADwM3VdVDSS5LctbeLlCStOt6uhNTVa0GVm+37tIdtD15z8uS1A/elWpweZs9SQNh0P7QefkBSWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphXoK9ySLkzySZEOS5RNs/6Mk65Pcn+R/JTm8/6VKkno1abgn2R+4EjgdWAAsTbJgu2b3AsNVdQxwM/D5fhcqSepdLz33E4ANVfVYVb0ArASWjG9QVXdX1XPdxe8Ac/tbpiRpV/QS7ocBT4xbHu2u25ELgNsm2pDkoiQjSUbGxsZ6r1KStEv6OqGa5DxgGPjCRNurakVVDVfV8NDQUD9fWpI0zowe2mwC5o1bnttd9ypJ3g98EnhfVf19f8qTJO2OXnrua4EjkxyR5ADgXGDV+AZJ3gV8CTirqp7uf5mSpF0xabhX1UvAxcAdwMPATVX1UJLLkpzVbfYF4GDgq0m+l2TVDp5OkjQFehmWoapWA6u3W3fpuMfv73NdkqQ94BmqktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSC/UU7kkWJ3kkyYYkyyfYfmCSr3S3fzfJ/H4XKknq3aThnmR/4ErgdGABsDTJgu2aXQA8U1VvBf4j8Ll+FypJ6l0vPfcTgA1V9VhVvQCsBJZs12YJ8F+7j28GTk2S/pUpSdoVqaqdN0jOARZX1YXd5d8HTqyqi8e1ebDbZrS7/INumx9v91wXARd1F38TeKRf/5A9MAf48aStBoP7Yhv3xTbui232hX1xeFUNTdZoxlRUslVVrQBWTOVrTibJSFUNN13HvsB9sY37Yhv3xTbTaV/0MiyzCZg3bnlud92EbZLMAA4BNvejQEnSrusl3NcCRyY5IskBwLnAqu3arAL+affxOcA3arLxHknSXjPpsExVvZTkYuAOYH/guqp6KMllwEhVrQKuBb6cZAOwhc4fgOlinxomapj7Yhv3xTbui22mzb6YdEJVkjT9eIaqJLWQ4S5JLWS4S1ILGe6S1EJTehJTk5KsA74O3FhVP2i6nqYl2Q+gqn7ZPcT1aGBjVW1ptrKpleQ1wMVAAV+kc6TXB4G/AS6rql80WF5jkrweeLmqftZ0LU1KMkznHJ6XgUer6m8aLqlng9Rzfz3wOuDuJP83yceTHNp0UU1I8jvAU8CmJEuANcAXgPuT/ONGi5t61wNvBI4AbgWG6eyLAFc3V9bUS3JokhuS/JTOKfYPJvlhkk8nmdl0fVMpyfuSjACfBa6jc9mUa5N8M8m
8nf/0PqKqBuILWDfu8ULgKuBHwN3ARU3XN8X74l7gH9AJtJ8Bv9ldfzidcxcar3EK98X3ut/T/X3IuOX7m65vivfFN4CTu48/SOcKr78O/DtgRdP1TfG+uBcY6j4+Avjv3ccfAO5sur5evgap5/7KVSqrak1V/SvgMDqXJ/6txqpqSFX9qKoeB35YVY901/0tg/Vp7hXVeeeu7n7fujxoJ4HMrqpvAlTV14FFVfVsVV0CLGq0sqm3f1WNdR//kE7Hh6q6i05u7PMGZsydCa5AWVUvA7d3vwZKkv2q6pfA+ePW7Q8c0FxVjRhJcnBV/aKqxu+LtwA/b7CuJowlOY/Op9kPAhsBupfvHrQ/+iNJrqXzaeYs4JvwyhzN/g3W1bOBOUM1yb+m89HqiaZraVqSdwMPVNXz262fD/zDqvpvTdTVlCQn0Omsr+3eiGYxnc7AKz35QZDkzcDldG7K8z1gWVU9lWQ2neGarzVa4BTqzjF8hM6+uI/OZVdeTvJrwG90P+Xu0wYp3H8KPAv8ALgR+Oq4j10DL8nsqhq4K3km+RSdu4zNAO4CTqTTc/0AcEdV/fsGy5N22yCF+73A8cD7gQ/T+ah1D52g/3pVDcxH8CSfBS6vqh93D/W6CfglMBP4g6r6340WOIWSPAAcCxxIZ0J1blX9rNtD+25VHdNogVOoe7nuC4DfYdu48ibgr4Brq+rFpmqbakleC/wJnUuc31ZVfzlu21XdObt92iCNo1VV/bKq7qyqC4BD6Rwxsxh4rNnSptyZte0uWV8APlyd+99+APjT5spqxEtV9XJVPQf8oLrHdVfV/6PzB2+QfJnOH7rPAGd0vz4DvBMYqKE64C/oHITxNeDcJF9LcmB320nNldW7QZpQfdU9Xbu9kFXAqu4kySCZkWRGVb0E/FpVrQWoqkfH/QIPiheSvKYb7sdvXZnkEAYv3I+vqrdtt24U+E6SR5soqEFvqarf7T6+JckngW8kOavJonbFIPXcP7yjDd039iC5Clid5LeB25P8Wfekjc/QmUgbJIu2/v93jx7aaibbbkAzKLYk+b2tZy9D56iqJB8GnmmwriYcOH4/dOdergG+BcxurKpdMDBj7nq1JCcD/xJ4G51PcE8AtwB/MUhjq9qme7TU54BTgJ90V7+OzgTz8u55EQMhyefpnKz0P7dbvxj4YlUd2UxlvTPcB1SSt9OZNPtujbt+SpLFVTVwx/2rI8mJdE7e+gHwdjon+K2vqtWNFtaAnbxHTq+q25qrrDeG+wDqHvP/UeBhOhNof1hVf9Xdtq6qjmuyPjVjgsNCT6Bz8s7AHRaa5GN0Lig3bd8jgzShqm0+Qmfy7Bfdj+I3J5lfVX/GdhPPGijnMPFhoZcD3wUGJtzpXChsWr9HDPfBtN/Wj5lVtbE7/n5zksOZJr+42ite6l6S47kkrzosNMmgHTk07d8jg3S0jLb5uyTHbl3o/hL/I2AO8I7GqlLTXhh3WPCgHxY67d8jjrkPoCRz6fTSfjTBtvdW1V83UJYaluTAqvr7CdbPAd5UVQ80UFYj2vAeMdwlqYUclpGkFjLcJamFDHdJaiHDXZJa6P8DEVM3h9QH/TcAAAAASUVORK5CYII=\n", + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYcAAAEFCAYAAAAIZiutAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAFQVJREFUeJzt3X+Q3XV97/HnGwikkJQfYZtKkpL0kmlFVMQ1pDrJgOklQbwNOMiQqZIBbDoVrbV3cg0Xx1QsM8hYQanmihIFxxqY+INM+ZnyQ9Op0oQgAomajESyIUqaBbRQCsH3/eN8Qg75bMiyZ7PfTfb5mNnZ7/fz+Xy/+97DHl75fr6fc05kJpIktTuo6QIkScOP4SBJqhgOkqSK4SBJqhgOkqSK4SBJqhgOkqSK4SBJqhgOkqTKIU0XMFDHHntsTp48uekyJGm/8cADD/xHZnb1Z+x+Gw6TJ09mzZo1TZchSfuNiPhFf8c6rSRJqhgOkqSK4SBJquy39xwkaU9efPFFenp6eP7555supRGjR49m4sSJjBo1asDnMBwkHXB6enoYO3YskydPJiKaLmdIZSbbt2+np6eHKVOmDPg8TitJOuA8//zzjBs3bsQFA0BEMG7cuI6vmgwHSQekkRgMOw3G7244SJIq3nOQdMCbvOjWQT3fpivPetX+008/nUWLFjF79uyX26655hoeeughfvOb37B8+fJBrWdfGLHhMNh/LAO1tz8ySfufefPmsWzZsleEw7Jly7jqqquYOXNmg5X1n9NKkjTIzj33XG699VZeeOEFADZt2sQTTzzBpEmTOOmkkwB46aWXWLhwIW9729t405vexJe+9CUALrnkElasWAHAOeecw0UXXQTA0qVLueyyy3j22Wc566yzePOb38xJJ53ETTfdtE9+B8NBkgbZMcccw7Rp07j99tuB1lXDeeed94obxddffz1HHnkkq1evZvXq1Xz5y1/mscceY8aMGaxatQqALVu2sG7dOgBWrVrFzJkzueOOOzjuuON46KGHeOSRR5gzZ84++R0MB0naB3ZOLUErHObNm/eK/rvuuosbb7yRk08+mVNPPZXt27ezYcOGl8Nh3bp1nHjiiYwfP56tW7fygx/8gLe//e288Y1vZOXKlXzsYx9j1apVHHnkkfuk/hF7z0GS9qW5c+fy0Y9+lLVr1/Lcc8/x1re+lU2bNr3cn5lce+21r7gvsdPTTz/NHXfcwcyZM+nt7eXmm29mzJgxjB07lrFjx7J27Vpuu+02Pv7xjzNr1iw+8YlPDHr9XjlI0j4wZswYTj/9dC666KLqqgFg9uzZLFmyhBdffBGAn/3sZzz77LMATJ8+nWuuuYaZM2cyY8YMPvOZzzBjxgwAnnjiCQ4//HDe9773sXDhQtauXbtP6vfKQdIBr6lVgfPmzeOcc855eXqp3Qc+8AE2bdrEKaecQmbS1dXFd7/7XQBmzJjBXXfdxQknnMDxxx9Pb2/vy+Hw8MMPs3DhQg466CBGjRrFkiVL9kntkZn75MT7Wnd3d3byYT8uZZUOXOvXr+f1r39902U0qq/HICIeyMzu/hzvtJIkqWI4SJIqhoOkA9L+OmU+GAbjdzccJB1wRo8ezfbt20dkQOz8PIfRo0d3dB5XK0k64EycOJGenh62bdvWdCmN2PlJcJ0wHCQdcEaNGtXRp6DJaSVJUh8MB0lSxXCQJFUMB0lSxXCQJFUMB0lSxXCQJFX2Gg4RsTQinoyIR9rajomIlRGxoXw/urRHRHw+IjZGxI8j4pS2Y+aX8RsiYn5b+1sj4uFyzOej/XP0JEmN6M+Vw9eA3T+kdBFwd2ZOBe4u+wBnAlPL1wJgCbTCBFgMnApMAxbvDJQy5i/ajts3H4gqSeq3vYZDZn4f6N2teS5wQ9m+ATi7rf3GbPkhcFREvA6YDazMzN7MfApYCcwpfb+bmT/M1pug3Nh2LklSQwZ6z2F8Zm4t278Expf
tCcDmtnE9pe3V2nv6aJckNajjG9LlX/xD8taHEbEgItZExJqR+oZakjQUBhoOvypTQpTvT5b2LcCktnETS9urtU/so71PmXldZnZnZndXV9cAS5ck7c1Aw2EFsHPF0Xzglrb2C8qqpenAM2X66U7gjIg4utyIPgO4s/T9OiKml1VKF7SdS5LUkL2+ZXdEfBM4DTg2InporTq6Erg5Ii4GfgGcV4bfBrwL2Ag8B1wIkJm9EfEpYHUZd3lm7rzJ/UFaK6J+B7i9fEmSGrTXcMjMeXvomtXH2AQu2cN5lgJL+2hfA5y0tzokSUPHV0hLkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiqGgySpYjhIkiodhUNEfDQiHo2IRyLimxExOiKmRMT9EbExIm6KiEPL2MPK/sbSP7ntPJeW9p9GxOzOfiVJUqcGHA4RMQH4a6A7M08CDgbOBz4NXJ2ZJwBPAReXQy4GnirtV5dxRMSJ5bg3AHOAL0bEwQOtS5LUuU6nlQ4BficiDgEOB7YC7wSWl/4bgLPL9tyyT+mfFRFR2pdl5n9n5mPARmBah3VJkjow4HDIzC3AZ4DHaYXCM8ADwNOZuaMM6wEmlO0JwOZy7I4yflx7ex/HSJIa0Mm00tG0/tU/BTgOOILWtNA+ExELImJNRKzZtm3bvvxRkjSidTKt9KfAY5m5LTNfBL4NvAM4qkwzAUwEtpTtLcAkgNJ/JLC9vb2PY14hM6/LzO7M7O7q6uqgdEnSq+kkHB4HpkfE4eXewSxgHXAvcG4ZMx+4pWyvKPuU/nsyM0v7+WU10xRgKvDvHdQlSerQIXsf0rfMvD8ilgNrgR3Ag8B1wK3Asoj4+9J2fTnkeuDrEbER6KW1QonMfDQibqYVLDuASzLzpYHWJUnq3IDDASAzFwOLd2v+OX2sNsrM54H37uE8VwBXdFKLJGnw+AppSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLFcJAkVQwHSVLlkKYLUPMmL7q16RIA2HTlWU2XIKno6MohIo6KiOUR8ZOIWB8RfxIRx0TEyojYUL4fXcZGRHw+IjZGxI8j4pS288wv4zdExPxOfylJUmc6nVb6HHBHZv4x8GZgPbAIuDszpwJ3l32AM4Gp5WsBsAQgIo4BFgOnAtOAxTsDRZLUjAGHQ0QcCcwErgfIzBcy82lgLnBDGXYDcHbZngvcmC0/BI6KiNcBs4GVmdmbmU8BK4E5A61LktS5Tq4cpgDbgK9GxIMR8ZWIOAIYn5lby5hfAuPL9gRgc9vxPaVtT+2ViFgQEWsiYs22bds6KF2S9Go6CYdDgFOAJZn5FuBZdk0hAZCZCWQHP+MVMvO6zOzOzO6urq7BOq0kaTedhEMP0JOZ95f95bTC4ldluojy/cnSvwWY1Hb8xNK2p3ZJUkMGHA6Z+Utgc0T8UWmaBawDVgA7VxzNB24p2yuAC8qqpenAM2X66U7gjIg4utyIPqO0SZIa0unrHD4MfCMiDgV+DlxIK3BujoiLgV8A55WxtwHvAjYCz5WxZGZvRHwKWF3GXZ6ZvR3WJUnqQEfhkJk/Arr76JrVx9gELtnDeZYCSzupRZI0eHyFtNTGV4tLLb63kiSpYjhIkiqGgySpYjhIkirekJbUJ2/Oj2xeOUiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKliOEiSKoaDJKnih/1I0l6MxA8+8spBklQxHCRJFcN
BklQxHCRJlY7DISIOjogHI+Kfy/6UiLg/IjZGxE0RcWhpP6zsbyz9k9vOcWlp/2lEzO60JklSZwbjyuEjwPq2/U8DV2fmCcBTwMWl/WLgqdJ+dRlHRJwInA+8AZgDfDEiDh6EuiRJA9RROETEROAs4CtlP4B3AsvLkBuAs8v23LJP6Z9Vxs8FlmXmf2fmY8BGYFondUmSOtPplcM1wP8Bflv2xwFPZ+aOst8DTCjbE4DNAKX/mTL+5fY+jpEkNWDA4RAR7waezMwHBrGevf3MBRGxJiLWbNu2bah+rCSNOJ1cObwD+LOI2AQsozWd9DngqIjY+crricCWsr0FmARQ+o8Etre393HMK2TmdZnZnZndXV1dHZQuSXo1Aw6HzLw0Mydm5mRaN5Tvycw/B+4Fzi3D5gO3lO0VZZ/Sf09mZmk/v6xmmgJMBf59oHVJkjq3L95b6WPAsoj4e+BB4PrSfj3w9YjYCPTSChQy89GIuBlYB+wALsnMl/ZBXZKkfhqUcMjM+4D7yvbP6WO1UWY+D7x3D8dfAVwxGLVIkjrnK6QlSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUMRwkSRXDQZJUGXA4RMSkiLg3ItZFxKMR8ZHSfkxErIyIDeX70aU9IuLzEbExIn4cEae0nWt+Gb8hIuZ3/mtJkjrRyZXDDuB/Z+aJwHTgkog4EVgE3J2ZU4G7yz7AmcDU8rUAWAKtMAEWA6cC04DFOwNFktSMAYdDZm7NzLVl+zfAemACMBe4oQy7ATi7bM8FbsyWHwJHRcTrgNnAyszszcyngJXAnIHWJUnq3KDcc4iIycBbgPuB8Zm5tXT9EhhfticAm9sO6ylte2rv6+csiIg1EbFm27Ztg1G6JKkPHYdDRIwBvgX8TWb+ur0vMxPITn9G2/muy8zuzOzu6uoarNNKknbTUThExChawfCNzPx2af5VmS6ifH+ytG8BJrUdPrG07aldktSQTlYrBXA9sD4zP9vWtQLYueJoPnBLW/sFZdXSdOCZMv10J3BGRBxdbkSfUdokSQ05pINj3wG8H3g4In5U2v4vcCVwc0RcDPwCOK/03Qa8C9gIPAdcCJCZvRHxKWB1GXd5ZvZ2UJckqUMDDofM/Fcg9tA9q4/xCVyyh3MtBZYOtBZJ0uDyFdKSpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpIrhIEmqGA6SpMqwCYeImBMRP42IjRGxqOl6JGkkGxbhEBEHA18AzgROBOZFxInNViVJI9ewCAdgGrAxM3+emS8Ay4C5DdckSSPWcAmHCcDmtv2e0iZJakBkZtM1EBHnAnMy8wNl//3AqZn5od3GLQAWlN0/An46pIXWjgX+o+Eahgsfi118LHbxsdhlODwWx2dmV38GHrKvK+mnLcCktv2Jpe0VMvM64LqhKmpvImJNZnY3Xcdw4GOxi4/FLj4Wu+xvj8VwmVZaDUyNiCkRcShwPrCi4ZokacQaFlcOmbkjIj4E3AkcDCzNzEcbLkuSRqxhEQ4AmXkbcFvTdbxGw2aKaxjwsdjFx2IXH4td9qvHYljckJYkDS/D5Z6DJGkYMRwkSRXDQZJUMRwkSRXDYRBExO1N19C0iPhg0zU0ISJ+PyKWRMQXImJcRPxdRDwcETdHxOuarm+oRMSHIuLYsn1CRHw/Ip6OiPsj4o1N1zccRMSfNV3DazFslrIOdxFxyp66gJOHspamRcTf7t4EXBoRowEy87NDX1VjvgbcChwB3At8A3gXcDbw/xg
5byD5V5n5j2X7c8DVmfmdiDiN1uPwjsYqa0BEvGf3JuALEXEIQGZ+e+irem0Mh/5bDXyP1n/k3R01xLU07ZO0XpPyKLsej4OBsY1V1JzxmXkttK6eMvPTpf3aiLi4wbqGWvv/S34vM78DkJn3RcRI/Lu4idaLep9k13PkCOB/AQkYDgeQ9cBfZuaG3TsiYnMf4w9kbwD+gdYf+ycz87mImJ+Zn2y4ria0T83euFvfwUNZSMOWR8TXgMuB70TE3wDfAd4JPN5kYQ15O3AlsDozlwBExGmZeWGzZfWf9xz67+/Y8+P14SGso3GZ+Xhmvhf4N2BleVfdkeqWiBgDkJkf39kYESfQ/LsGD5nMvIzWlfU3gb8FPgXcDkwF/rzB0hqRmauB/wkcGhH3RsQ0WlcM+w1fIf0aRMQfAu+h9Q6yLwE/A/4pM3/daGENiogjaAXnqZk5s+FyGhERf0zr80fuz8z/bGufk5l3NFdZsyLi65n5/qbraFpEHAdcA3Rn5h82XU9/GQ79FBF/Dbwb+D6tG44PAk8D5wAfzMz7mqtOTYmIDwMfojXteDLwkcy8pfStzcw9LWQ4oEREX++i/E7gHoDM3K9W6shw6LeIeBg4OTNfiojDgdsy87SI+APglsx8S8MlDpmI+F3gUlqfu3F7Zv5TW98XM3PELGstfxd/kpn/GRGTgeXA1zPzcxHx4Ej5u4iItcA64Cu0pk+C1hTT+QCZ+b3mqht6EfH7wGLgt8AnaE09vwf4Ca1/QGxtsLx+8Z7Da7PzBv5hwM555seBUY1V1Iyv0nryfws4PyK+FRGHlb7pzZXViIN2TiVl5ibgNODMiPgsfa9sO1B1Aw8AlwHPlCvp/8rM7420YCi+RissN9Na4vxfwFnAKlpLe4c9w6H/vgKsjogvAz8AvgAQEV1Ab5OFNeB/ZOaizPxumS5YC9wTEeOaLqwBv4qIl1/nUoLi3bQ+EnLEvPgrM3+bmVcDFwKXRcQ/MrJXQ47PzGsz80rgqMz8dGZuLsuej2+6uP4Yyf/xXpMyTfAvwOuBf8jMn5T2bcBIuxF7WEQclJm/BcjMKyJiC637MWOaLW3IXQDsaG/IzB3ABRHxpWZKak5m9gDvjYizgBG7UIMDYImz9xz0mkXEVcBdmfkvu7XPAa7NzKnNVCYNDxFxOXBV++q10n4CcGVmDvvl34aDBlVEXJiZX226Dmm42l+eI4aDBlVEPJ6Zf9B0HdJwtb88R7znoNcsIn68py5g/FDWIg1HB8JzxHDQQIwHZgNP7dYetN5SQxrp9vvniOGggfhnYExm/mj3joi4b+jLkYad/f454j0HSVLFF8FJkiqGgySpYjhIkiqGgySpYjhIkir/H+DWFFYWxutlAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] @@ -5675,6 +1751,13 @@ "plt.show()" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "code", "execution_count": null, @@ -5699,7 +1782,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.9" } }, "nbformat": 4, diff --git a/scripts/__init__.py b/scripts/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/test.py b/test.py index e31890f..2a105ea 100644 --- a/test.py +++ b/test.py @@ -1,5 +1,5 @@ import urllib.parse -f = 'Pandas count values in a column of type list' +f = '20. Pandas - value_counts - multiple columns, all columns and bad data' ff = urllib.parse.quote_plus(f) print(ff.replace('+', '_')) \ No newline at end of file From afaf463a970cb2dea4d3e82f68867b2189f7ab93 Mon Sep 17 00:00:00 2001 From: softhints Date: Mon, 16 Mar 2020 08:03:30 +0200 Subject: [PATCH 54/76] Think_Python_Chapter_8__Strings --- .../Books/Think Python/strings_in_python.png | Bin 0 -> 34991 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 notebooks/Books/Think Python/strings_in_python.png diff --git a/notebooks/Books/Think Python/strings_in_python.png b/notebooks/Books/Think Python/strings_in_python.png new file mode 100644 index 0000000000000000000000000000000000000000..4f072ff5e3572c10ba7107ddf6a897d141309d7b GIT binary patch literal 34991 zcmeFZbyQY;7cIIm5Ghealx{>Mq#FSN>Fy5c5Rgul4(TrG21#iQ5Rj7YE=5AR;Vi!U zoion4W86RQKX=?`yzlTCz~0aP)mn4SIoA^;FDs6PPJoUe2$sal7m5gSO$9GkBgba~*A*Ae*vOevsj;k?>{EPOT5lVWee;wr} z9?H!dc!W;t!8*!M$JE;@8a6dYYE;+hgPT5A8+@NZ!IKxdiFYIW@PTmDR(Mx$G80u` zl(8^Z%c|#$sgvXiRj9*CZ;E@s*VSDVa$lUn%Y-+#d~k%(B>eA9qmbQpIX~I&jHX4= z|2Oxww6ye10Hwian`!hrcWxnqf`at)^!48}+ILs`n6zsVykV-7A3HYWG~j$Zev ztE*ecsS*z-iL_l^UZzftIXKz>JKf-{mR?p;Vr^#jdtqT9M+zSg5AV?nVb}dNyyLsY z#l_4y{CcdAwqX6qS@HXJs`sG=4OwDk=_)jO1QK_n5@T#a(+YCidg( zcsDCE6PuVv`LH-HE>7%&@7dWIkJo)+ETY^hp5vWmDS3IEL~JxP_0Kk`FvvM{4*|g^ 
zPuAwXzVGbp9KbMY4ZKS!78)_LI%xR^CM%vX=hdiAgHw~|DaU>~9|;o$g(2L9({|>D z52g4A7gtxAB(|WqF&yDxgEpg7B7V23#zrEfv8m~^?JQ$VAL8>>pNpG&W2}4;9_`JI zK&qflW2d#DKNHoKO>X-X4jeX9FKu6StS73> z84ma1%7lajxL8F+MdUv(LPEmZ7x#y$X=r}5wZSe(Oh|aMIi&%=TWHWPSNA^KZZFaL zjLdCJetz`m5e*H^z1c2j6;)L+F)e149Bj-HvB$-=^-tgI|8 zBa@n%>Yd-aJo(uco({GlJP15p7y*0Xch2W8(U0=6>zy_vy?z+Ae;D?nSA4DaK2I(^ zianj*U2LU0jX$|>m6X>>`qNThdwW2!)820d7Sq#%4dK&@PoEIP$jC?_{St`{>@iZ;RVCHq#as7Efw(V`GQmd|Hm}Cv)1OVi8d7@hNI)xgV^LCnhFRaF*B9 zpd!S?#4twqtFP~GRjNPz`rh>HXgj+ZpPc(GC&^#mdP&c-qw=c5{l8@`ABcGm7Td#z zh%UlI&vxL1!*17Za*J(WRA8;s-h6xg7VJK5BH4YJfq?;S9UXYqFD8|pM6loj>E5fo z#HOaEiv45*KV#{|KM)-sY-m}0sITWqeEj%vt;=@i+~ofof7G&IIx zuhPn`pcR8oo3COQ_{?ctwl@X9DQM9KmY9}Eo~kgrQ7EjA;G~;n^WT%A^~uOhCa2!Eu4P;nxFN)T(++pl4gu| zU+Rd2SfChiOL#AfH*dri=8Q&=GNf&y!P%C9@CK|BtbruzRn{MRo zwY9Yg?~|hWV_%jBJ^dtgNaU>hFh`r3b;j!FkKx!2trCYN@uw z^!!=I^tNTcr2FLs|31^>XU`(L*ZZ?>CF0`ZQrgq6m5t}gk+RTIQ&Y3B*#8K9WPB0F zpym7F21&U1NVcZ=!NCDCx4!PNIaN0_G&D6e1)+X_b2|AcJ1eVE<+$)^Sa5LWCL` zaqUYIyQSCfRneDj%sImny1HqRk#`-PEa3p9r_V-Gio=9HPYrFYd-LbsxJ6qq-oi+R z?7OA!(NR$^N)M)EP?4WMf6mX(3rpEf)e6N;jEx1JN<-vYrQEYK92U!`PV#-YcJx`1if|5 zw8X{5E6hfUHH1SXB_(;BH=o+jNth#F49mZL`{o-r)8HJ=CJmc{hSAE`7*k18=W4$* z7u`-XzJ*0_|K2@{O^M;~+$!(ObG0dpY67#vd>_nu$n1!J%M7cbxZ9g#7X zOy+i=Nt9#PTg&I5W1ynSKJkZm!6t%*iP`M#N1pE=^TVW)(`xKF|4Xe3F0R@|G6?ib)?n$nLM)*PBDxfOLFgRnX%oqT3e60k3_m z#f62JMVd-Vtb&4iukXeVtaK+*vOIZ047*-MdkL-co`6&ly_(u6gnpPa2rFNY;mE`@ z>Ll?vTI{a$I77g1Z%Rl?!hXp7kujawb;mezw=?W1@={tl+axX~Cc?_Xz@Wd${gCwJ zIX^!?3;AZ1COj1NDxUNC)hwuCCJ{xewvthnL~7Jw6J(@9yb&|L)z9 zhX67*KF**~;j0NxoXAJs)z>HJc~bRDoS4_CI6M0Wk}I947@s6ZN=mx+A)PGJTO6tN zI_F7D;B&3$NIT--;8=HT=&TEhGkfoIs)EhVCDMQbQXlZG`O16>a6c$<-8By9t*41$*;t>6!=isQRsJMq{ zYiqa0`YSot`?|QhM+dp?u7nYB@4!)SYikoEcC}X5&`?;W6@KB>Fd-!)bG|Bg3!A9c zKFg{&-gBAN5R3l>&ZT<+Py!NI;h+slh{K9a3&I2gSt3aYA3 zZ{DmA=JkDfd!5&5{qgpEvfoZHS)pFi+~*>GIPO2f2r;~p%FD~6qX$13e1{n*EVH$- z5dgS{fzQb2dBVoQ5ogiL=Y7G)!$W|aK+Y`Is7w#^rFg^ooSB1#d{3r;NIG^`S)Acs 
zVI~zDJNr$`!<#{TE;yNYYX-94wHd+&vz}?>rlIj)OOTG`y71+}o;k|W@Wl+ehP>k+ zupk`(+=orX^;~Ddfh%RTFTG5=ZeV=8P9~WnJG$bA#1`Hy8ylOTYlcQf-@ku{Nn&D2 zzpH$lsV@?ibZ~fBX7p0<~(8YJ}-YMnDqWB~xYQPk~BDxb>`jVMK!CdZjsEO|yrL?)CIKM+-e*RG% z#m(`oHL9Du`nQ9^!!I{i*=%P#AfE%M>FwyK^}cM*%cDj9xio(H^86xv54TMGAttJM zu!1OsvO!>adb;wh>$pKr+Rx99D{CF|MTIG&$dtGAC8K%1k_YBXnIIq2RusH5}Wxw*NzvPoKcsi60B_Pm*HSzSjLn;t~|J%&MVgci&f@c6-i7bs=+WOYk<^12L%NU>+!0os$P{3Vm2vUc7|Bh zS!lx;>aBXTaCP>e;%aYdQ&W?Uj!rU%bzx3UEC8^S6ru+Y0E3X*n_B^Tuo`FiYMsnx zZfI*u3V0b8cd*j5Z>GszDK!QFWN28J`H{diD=RBV=XEaIhS&kDfHL6shik)95S7Wv z=a04*|NQwgm@9+J{bI`>ix8qcjOUrzFETjRiZ}Gs)sNt$$I`2r7#kyB@)fe~H{8o` zSnd7r;R94tR61Nq9M+O<&Vftw^BDyN8j6bc0Eqx;$foir30Z>K5`g(}*NhbukFL&VC+L&*K` zRzk8!mpCmxUYrGrJN2hfS$wr`06=OpU9Unj>m$o*(*04>gx!>em^c-d_$DfP7nv;nY_ zTz8hn%8lCgd24EFbgL~?w6#+@w(;E#H;M7^*eyq%!fXO;I=_^6w1fO^{Gqo085*o| zh30;mw6H7}=eSLjv>y5=lhwX(nh@$d+ zcME<{-^IaUc3M|24CeqOxC2{b93l%T3|z!ij-8!7@-E9mOqLdZ)b!k38jaO}d)N2f z(c^Z{@}C*~i13MxiD||2mw)h79fyQOz-FqJ!+PQ&0fD0Uwd!|;g-;TdMC85_c!kY> z4-gjP?Xc1n&m>BczI2W%?5!s>l)e%$Lm3>g$9%s!PV-=3aBy*P zv9h89js+AkSXfva>+5E&v*lL40Re~W|_~ETK0tDua82P7pF|q^U9rlec_>@-5nkBs;V5UtPb|}h2-KC%3E_` z;z7^Y*{$WL-C$M4Xj-uj%nS@N#@NZ&vwjrj>dKJa+1uOe@Iw(SMGh`ihlXcYZ>XUp|*pJ38m@= z1SdX{JIIm81pws-44P~%|MXc*mw(14^!YY=UwU;XJj@6`-MsFy&9 znQaj)I9Ytn1tqky0F0Z*`@-Y(YvvVJNnjpybxC6TAOSjDoVj+J0+^~9_~rbXE2Xx! zwz&8yi*{f}M#lU1@2{!ZUsQ#LVk(ZC!E8f$YH2Cu`~r=gD`5b*A3C(g?LZvimqPlI zB535m5BS zYAXYekW(q@khYJ{eBNbBT6;EoQBGXz=veqo_&$lp8YP=nZJ^rn_O}=sriYmN18#%~ zo!Q(7>NLu)TIT_t98Sb7{COSPbLA%Cyi&zDr7&8C+iHN)1Ym%dl9ZR>u+ck% zPV5%%!tGBO|GTRXq(sn$)wCHo=~-zhDH{|h`6-P=R8;WlDWf6*`a%ji7_T%<7xZR) z@?_xGFJ=}Ni_y|)C>^Gzr~g3Nb5F_~BKRE=J_HdF`3{WN^)EmW$fgUVMnuS~tB*mR zfP1*Px(@R#0O05c){Byoa(8zZDn(#AP7k-rG8K!y$MmhXcpUGHlxWe>(LrPamJBGZ zt=X2A$jEB@N?uj_O;0LxP*kG<6Tqa*Wln)5R^e!;-XWX$(CAM@NGS0Brk#Q)=gOwvcl+dl?xUU)lBZr=_mAjEstoPCca*zt{b8)6w})^z)*k)I-wl zwRDcO8ea~895>0!OAhc3*4SxH! 
z0BLfz$0D4BU-Of`kg;+0FAprNFg>s1<#53Z93t+p_Qm6)BOo&J3kvu>PhzO_g&?`5 zrKJIxh>5xgEa&X$8e6TyX@9oeE-TB$p;lA4oCj(8!}`mishdip=S`t>sX}PAu~Uaz zxO!r3&7r}bczCBQPih)du#iPstUI@-slxb2$hf)p=g(JdXB+!|{!~y^WMQ)y`dVJ1 z*2h?1&+?>|x>kF$;WKF_h0&nmU}dE}Mo4fK8#_aCazj10MP+X=sCCE)Mtdv@+TT&h)N)(?a=jZ6}dKaK8(n=pS+KmpYTXev)O zHPf;7<;2V7AB&KTAuruTU{yHT*t&arX&BM~ia@D62Q>b$_g~;IWfGX*Dk~5?d%*`Jm zV`DW0e63nKIt(ZKlT!G$0I#f*56jiNs|n*Ij7Cd)uV?DDME{^2)wTHf8MmELmka*^ z-Xo>+?A(H!c|nJ>9lW;#HqyA4S$ggLR>W_r5bY0JvkNKT3=+szG zisiyw)|>THd<_|;l}m%vodsEwhvx)fJ=7O5LtcOHcIH*Vgg&@`ADDCN+2-lq-Vfc4 z&Ra7bk(7XObQ@jl04CZk{uKd9G)w+*1W0;?R(!(`)=JyP*&8gKi+D( zj=FEZweZxh>9TCqzrH@H`K_*|MSbg`V&(T+eHX{4Vqdwb+#?$dmj2GRoU*MS*gaw! zH8h;Ly~6tH74?Gy1RK0# zGhbDgGG$YVbE*afy=PzUBh&l+D*IBK+1VE2YR9)wiuf4znsPzi-jQjh?#y2hzId zG}rKZ^yzNvg~_H0jQr|A5Rw`1i{)Llpsn*S^9)~k)#Q>`1~O8u_nW%>g9`d{OT0ZK z!p15*MQyiil$DBORd!{xF2|Z|`H%ZpX_>sw{z*)H)ovtk+ms}c8rBhM0h(1JM-A2q@={f?|vp$e;SvQLyZ`l znyM))%kSznx&RA?dX+YDtV9)JVq?WaAClYa85kHCe;XZD)8+%(jv5!_i3gg-P}UO> zVIny>Iay+1Nz4Y{fNnz32kr*sSKL#qKail${(WUd^y$h`u{L!)Z>p%oHlH_k>{S=d zvtY@p=1LP2Gus9yB^7q$MCXRSvA4N}cnDmM^xPW2&h=lr1Angf&LrYbyh3@@G*f)n ze)7W(6E&{7s^n+x?~Dbt1@Im`NEdX+LWYKvX-~s)c~g>nHw7hVXk;EykIMl^Nr4Oa zghI__cLgi)Q%#(xr3$vh#Sf#dIEFR`i>X>WX=zLZ6byEYp}_9Yu#%I}nQ|j>JHlZ4 ztULk_*cGr>9@Jsi?PDS!Xb#6iwgHKt0mK<^P+XzNLOzNFU?+2Q*1MF&&9we-;EF}& z=Jbm)3K;EEdrOJKiFhy+xxz;9Nzzg&C^J#W*hNSBj(z1FotMgJgtRy=a39_6Q8e6` zXefPqzoyCMAc~K1V|_wFucc79skoSrk{{M+^5b3Wt;GjB0s_w#jx7;+H8ny3#cyq| zKyR@DMDiOV-}zYpwP^gU zwe`gm0Pb9hRVBPjTpEu{VRm4@m(BoA@<9TH@)`M9QsN26pm3YfwnK>xF?&UQkZTsA zs*=}#XFI4OP~t-Q>_kE=@PB#%M)kx+MBa?VKT;$^5WtBq{}~B-9aXutsUqr)Z=4zq zobH`1kt%9yyPdAQXVO!2(N4>Vd@I5`Nr2r|AL4RXh9*UOV*B{){Vmz88M}x=TefG< z)&|QaGutYuPL(cw&rf9|f($56{{H#%^5DD1(-FY3AfYA1#x8GdZTre`ecZxRIUfHbhZC z6qJK}ByyjaQX4aVVk9R7T`(fQs?2d&B88m^{TSs)RH~;ZtE^>=9^&Dp^muPk4}KTh ziyvy=apAix5PJ1$S#^Zr0TV{VQ2T}NBPBogv}&titFKIh=egmLVxK#h(u+1L*leu$ zjK9QuvUI)?4@?y?HowEdZ@-O(QR#!|e;dscMfr0nT$=k;T9VG$W2 
z1dPeQ`Ahy>P>z8vyj<#xZVMyp$N7~$kgvN5aC~8e?(t(BZaqCcL3VaL1o*MNJ+GBs z5d(vHh;d`DE#lLw|)R@Ah>mob*{(}&9lVUm2EgQEJo}eI!TZidx=W7fB zO56?;hMw1T(#dC{rrjBInml+}SqYJ(&!1i&MtCq0P|(x=1tSNj2*SJI>i}vN82>do`i%%u-w_(`_k9<2J&xrx2vm*7w8Bp zt91~54%Q>vH;bq*t0gTThWf0G-4{Jepc6-W(4goN%OS6&IM znpKba_&gwvNlNae&%6r1h&) za6WqySH*4le*Tm!wIQY+BM$3Z?MrFS&aNG*D&JnKmZI7(7qr-k=}&JKm?Sz|4OF;3 zJ(a3iRrO}wOz_p(Z};%N%cf&zdy*E{^O^BMHL3=^ulLz6jim7e<=;Fx09G}51bRe^Ur4uLwk+5OOF9}l7=@b^GtMu~NTesXoHL{7uP!csYH zNf0wLvlJdjNyw3UdY2$_Q&|Ha&i_nd^Cx|Na`4d6{b2=>pB5o&cQf*VTWj>0Ez@nr zLS(w7ob;%kVmx{@ewz7XZFjj$Qt8~nWp4*L&&w@g%Xwmc@%gE;R~KycSwro6+T?c z63jt)uCGt_A0be_z*AKfbixUjwNV;VN^b z2O+VksppXFVPb)6PiS9+qZ2amgp7=gkr5m5@bv7ISC8?-zz5la^Xb#4;57gm97v(a zHRnt6kr5TYR0)ZKg%ODh+ek;ta^f3!-8*`hGgSt6kl$+-dQEmiBfY_!fj*ta9}b9% z^U7FjXQ_}Q%Vmxh^X7&6dm}#%tZ(h?Vm!hW!@m>1<~&S6S%f~N)SMfcLqM?RdFpmJCX--?IEomli$fzEZT=8(+^&mvWsc zMb5p>-Cr8vLw#dB_Z&yB1q%ry4&&03(?1e*egI#Ki0zWIfPigz-xp)UM4Cxm>es3_D$1qKit6hmN9 z0h0xA)Nm^-qQ`@8*Dao>7VEXReP7)b74W9n_j*T9UsGN5Iod9xs+yh2k2vm!e6V0o zgRL_VKw}BDjg9Z}B=fsKqbD0=b1U$nNNA$s9x%`>sOx_WD>4#t}n5rJbL zlu=x~4ur+lmLOExkau?Ghi9!(kta`{fEJC8hBgM0x{Hg8hX-F!M4RDNN!Qs~08A8s z#$j;b`BYR?0D%%Ir&a%^pd<2EXlSTx3Rk)3sS{G`w2}1rvon_GigAIcRM!f_Qvq+u z-p<~Tpbxm?JqNN!lq~&OYWU+~;_m_e8!LNZVeLD_A4EUHTMjNsRk4yLLOjk-!Zy{7 zGpeed(z7N@6Fv$OSiK->|8S5S7B<-OYGq}KmO3Xw5{Gher$JKNg@&9Z5w+Vo-0;zeH?hlB?4ayzbcF2wFXifQ&aIMobdLp&Q61J^@)jzVWmirl-^#S_5K77 zmu-ipYuVNja_PXmpnSRTY=7@u`aqN5{6uFd#0+6(WBK>*_X`mZmMWvkx1w*0kmou& z8)wVsIt{l48RBm#M+;YCI}@Ymd(90I1phA8ZI=0g`=`IlXn&#&AQ+ z%D`ZxLtgJ)dbcr)cql=w=U*>`tM+n;l-^H|tIb4>h+kXfc&aE~x`8T9z%Hn9Do<$@ zTd|3EAWJ+P)*2bBcPwgXX!!Lv=Gfyx&-+|d@HN zf`fO0DjOQy>jsG<(QN@)p2ZrW**1ZfT8tR*v`0T$S8qCw`wE zzT4vC@b-q+jNMIxb3-wz9ZM%+#iG-=h(>dvej0akx8Gr+wpAj;TAvE zw0B|h>92^Ub`+mB!wvpVLUCb3pKWKN4Gh`XhN%T)%W;4Nhs_zG)@^KJ0sy-a${`QSPOCNXtpRr zk)Zr=?(N`(S4d%>p#8t~$oH2;KNDFT+4g5xf_T05zIB_2bS@5J%#U!R=qs&~XF{H!W^bG85dDk0}O`E&=OKb%C?R%;B5tikA} zuTH~vMZ@v2DvPOB(ERCXd6{TVbaZu9RNhro4D>7U$`BGtQTh}NdOA4lTxd53eJ{Wq 
zcTdTF7VMwv_VPDP83~JLum@T8nhMKR*&?J749*qM%iq0!4~)3v<}-AAJG(@7%eaOH zzL}~BENS!oJJFqd_J$1&0lWUy4f$_FQJzD%?D*82Tyr5YbFDBRKuoq9j#D@BC zBl*yF7VVDV=*W!8)&ObbTu z?Oep~+*_h!aSPFcj6Mo1nPFQnaAL3?-@XY0Wpn-db$9pEbiy^&NPWqo_=lK3y2o+P zO+SRY+)dp(wshs592@$iWGcSKuc&CarC<4ayL+pYW|%eU%xvacW>6wsUEyOU)972= z?~04HiZxn{Bf4RY{4R z#b#-&|7(NZ=S|&B7PD}kx+-2f6V3po#6%D{)b!2F$a2KyopW+>qYpg_3kvET@1ARl z(}C(@4g(P(d80?=|H6Mw$T220wIExyFhg`Qt7lBDfT1Lejjip2|5NL`astllQ=y$_ z7vs!3qHh!xC9MT2gAxmhlE)5zqTl*=R9Si2cb~t#oqDQi_ejam-b75*#NFi@LA5TgHhks|fgT_<;L_ew!D({W} z=STiDny6uBf4lIAV+SUBU&y4^TnL-%JxONII<|Ij3@Bk!?gJ*q!9FsJUWss0b?D0j zG;%kCHW0V|A4J$sd@mUPp-2WF;aQc7D7+*ryNAB(g120C^UJD`5O%miNU%8F{5H8@ zKuwvR?vF*~e`c5;vjh7WR|k(VVs;Xxc`k3HpnFN6q~C^1mjA)U%Q<_gH(Y|{&p(Ks z*}v^Eao{2%xc-=)UI{iVi@9Yr7}+h%%sjmLZ({khFP4v6?WCb=Dyymj(Z5#K5s?@i zjE?$VXpx=?>=Pkvii(*VPtp&(-y`h(upT|sYIoBnq-$dfTV z@Q|&@^!!HuSeu^>e&Dgzz*3|()`Oq_I=GMOuSK9qo9_sV#=f!=o7`8B<;aTa0=YL> za&GtS4+K5rp<7r2s(;nZh{#ShW)lmufd|hA@OiwwNy5`pq)id(w(c`wnlKb*mWN+5 zS$*DVGwL+NeLR`|@dC^RkN*#+!3%zuxP#Sn@3{DQ!bgwJcecX`6oWK>A+#QV4=OY`xUjf*$-a&VTik4NY;0_437i(U_Z+|w zW5oPW(He~cTonl12#E3skCA%EzmOEca4u`+my(hKLg~gtwNx<5HIIukjl%SBoD=iq z<@JgRd*oODh#_4+_s&gWc@aqva!){Ny~@Bk%^n)Rn<@H%gZcw<5{S-SD!u}4`2xYA zfz5;$Aos<#FQfRpV%`0iQ~3v5QZtR;Q$thnr}47}N9QcwxF_Calhs!YVl9zV{%2>- z`tL$msbet*Ys2(W`9#FTqARRIQ3UZ%Sy@?WXjZ`U2`u-_%nUHxf>qqfJs$!CUrI_U zs;F3lkl568?#wyA?E>Kz+>3@RiNJz5+S_N^MR4wkNJ{gagP%bP51YGQ(p^_>T-Ng&2Wn46le zLc7V;X_AnT00Yf!42&$XEg1=kXRNH(5l>G~#qEdq__Rj~Ae5;$x;#Tt1U%h-|H=R% zd$Qt{6!=oB-<6bXz#78T(=ZM}4RU$%w+xi^p%k#3GHKO3g@TieOdmXV2+pHNDJ>RL9*HU$g5nYsBx0(M1}9Bbd_;~1K?!Aq<%3}I08CL|C%dx%LOPWJ-qXC14W zptmHdkA?${xmo@oF?JyZN6r=ZE#3O#2l)K{{*kcb8;9)y&jCd8?cSx`FjeY(MjT57yCe@-L55Q&}c_Ih4SSVzGi0MgWr4`VfVS3A*@-w=fk%apV z2gQP;xx^!OpeIL5OB|h<4nfn==5dyv!&K8$bmH2>L=reSye@VM3+VW-GlGn|ITpE# z{97&bOXYF?NR8^YzrF!y2?Z&!*6b$^k-lP1`PUf{=_D3(rKB^Qew≀{djCi}$5Ka!(pq&%g@dHZyix&x;s7iv;hzzs|!@U*qw4J={JrXa!yD4lxGF2Cw?Te5aVSG*Pk+ z3L@fvkVBu2@?}-7)7eJwnh*DG?k~LSP1@5u_4S=w)|f-T0%gphUV?fP3U5|KyTxmI 
zV&X~;ywV=PY7l@;3{qjM^-FZhPF@x;C0IC&G71QI!7hQuw#*feMFnbd9?Zkgi4_MgGcyhXs0oNd=m6l^Dk?0*hWIl)Tn(|n zUdtpgIeNC?^!L(|G6G#>B+U70x7uq=R>FpdlpQK|4UHBzr(JhIlCA$9Q=hV~&0=6E zIoZX9lX)|}FFZIn7*3>w1i&Q-IfJ94oE#j$WI6zW304&o_E=$%jX9u$V~Af>S$PW9 z2Xut1?gAnrqQ=HX8tH_&xjFemGO&|TCiHp5l$UQqrRgiFtgNB}gXOiJU_)H?egp8k z;(Sd^Xb{XtEWSTY-=)`p{td>p%Qt+ASMemO4@k+g!!@D5BMgOQy9Yg|p6(ya&4+6e zPqU6ZyvBxk;~8f_F7lLhO3TglhxnA%#P89LH9R~VHz4cxnoFfmTcg-tK$h-vl$N}F zc1}q_rttSuR#L%Gm!r8yNGJhott$pb!RFtc7S7pr+G}5wFU#HBc+v%_w`QFeIo&Q8yg@@snhx8R9pVt9kaQ|fHDhu(0lVaqj%48iRB#Hfya7}7P#)iVXb95U9KqkjLK8u5wDw5 z>+a|~RwJW@Qtf(}wh`inQOrjpYLX2!Rmz%*z}b4~)yyBhBFw++X7tD_k(spQdj+v* z^K!Kn3V67#PMa~hbA4{i$>d9`9j<@p&hVJ?H_}K7FW?^rWX&l;^Qj|a>}3C0WqH1pugrX(;~5tufD$;SL;j)JdxDJptnWDoXB63p5#)E7G=o> zSs58ft4|B20qg-SV?C6A_x;xn$UsNiM&{;pb?UYq%}Sqo^_hX9gBP@V*vvMcgMI=^ zFL-7n3#Xw$%h1?(2^_Ux5>|&#v<6E;qSvp#JWd)Z>H1VP^ac`QKf%&q9#rn!+^(G8Ha*Y& z(ecucIxiKU2XLHzAfz}QGqxH|*K$G)h|>LmFxyD`{Jbo~l9dgQ(yfF}BIBiqKQL1Y z4laDU%EsM85-ohL_siX9-RL&dWK*~ukHcpP35LF-|j8db3&+0$ZV^jRpalep-DkE(!JN#`_lp@xm zW8}s4o!r`5U+LFev7n_XPNP`%*)A|qrPtLv^10gFQHd`%7__C2v(;@exfR}ykx zPit{6@=Ka}I{L3`0kg$q6rj`KaIww~-FRe6#*knljZizE5D}ampVB~8lGnRoZN1wf zE}qP}sZ(!;g%s=H@!9R5Wdt+dPU7(_cAnh~5SEJ$80qp#FDf{0X?Q_Hqv%m9J z#-?3v=qE1~13ejldoO!g&q<vi}u_luLtJf`R_{e1TUA;^eBmd$K#BM>ikn*GS*3SEHkS^QJ0Ar%-Gnh2`15y&x&r z1}5%;9=|aY)jyfl>kDKRpIP2Lj(6G_6Sc{C7alqCv#<`_Y|0$gqcgolrubhkCCXwE zM3=vzc>I&H8Ghx8bVY>?3c{3S*okE!y39GP_Bj16_#wz}DH>`Y`+xxrc`ht`tzbfA zc=(TAD)_?LVgBI`^LT=3 z$x`{@$6$@XuM>tIRx7JwOv)1!es_0JQ9tIYI_j};(q2LogavbQAhjqStq~u+3$n@+ zuC%~xV7ldkB#@6qMKU($K|`gUDgocZN$#`R2Dg3o$8a3-~s`xz5YWdwP1B&f{2^{lcWrVL5&*=1Ml1 zCNF^2ae&8?xg{kfH}TCaEw4ZK@@gi+9uRcfGh+c#`T#m$*XXUrfLi0+^BVH)q%>UE zl1}HA{pw>s{6D<_zc=;%!)4(+n4DW!F4M6k*iM50*r+FGxHL4;bs$9mM~($KoFMxXknE5TTh0%(z;sbS zB8e3Tg#ZL{5Ej5=n46g?eAQEY)wvA3$?s;$GN{Ox&{V3PD@*wb6Xk@kIp8ry2L^W9v_*JpKGnWPB%A| z!?@$7fj`ul8E0hsdt>m%_+vg=0G9$Z3mY3NYifdma*vp}3Dgt0^gBjvWY^O14#?@KMM=r%*Ux~aygGXmAD zG=txN7PUI=Mp3W?M?;5QmH8liCjfLfUUja)=?1NpL96B$F|l=PE?Z&36X*z}x_Z4> 
zvUvL7FIZv|UwW0fHq2tVaZFucaMa%w354vxnHn6Hr>A%f-`FD zV8`DLZDtxmOJrTx*x2Cw8m}^s&dbY#@zvG2!UTJSnWPZucBTChDWXjTy?a6352 zJ}ClmQSx2-`id^QZl~X_0fpKTMD*bJk^$q8+h{g&#y^!LIJ0CvF}@H#kgz~6DzZOg;M0BzIILe$fenv^sFo4&(`l4982MaJ&^ z0;rnMhFZsNm0wn-x62ill$7-L?ORZU$d|mT($aK^#W^zkz-*H|y7e~OlI@2a(cSvX z`-s(at!>4u89)DgNa%9Qg-lvi7_ngS+p~tu=3JsUA@Tx6rLsaE#ttTEXpH$v0;N5C zxdNBn!V3(HyG9s9yiTFg@i?Md2+hf5Nqq6n;3aguUmFP-KK?P zWo5!2MlL)%Eo*t#l$e=GQq@*A92_wES@ldQ?{uQN zytG-^Tq5zZiY!V8Ztst|MLUjkcN5OKd)Q=dO#vN^fe|7{a_OD515V71vE5O!bb+Yp z0=~Zs{SqoFem$jJ_uUA<59s(&*P7K>U9aFnkl$~1eR(DTNZWR|?Y0E4iT7Ik6#y&t zWtK`Bg~J}YLdo#$48Yx7bxc)OcGi+ZIDpCC?9U1)D=T~ZHdHJ*MhlBG2b#66$SsJ? zS3*tC;`R;Rd<`uv$R(N@iT=({M!<^UVfy-a@0@&qQw=!bh$ki=5pse9mQJsw$vfR; zb8mQWuQ0OHOGlvZoAAyS?x#=84ZHu=!~PyJ>NOtIa0t-Tt0^fpfes3mQ}OWq$nl1s zZ0y4l7!rEvy(h$;*QeXMuBl#>CxZ5Iyz0^+*j^wix_$!>uT9Rpsi`6Iuo+sX+!pnR z5`;E*t5CMj&ta`DEvN4xuN|XLisE5XOJ@a|t+ppZ@JpqP8q?MuV)}zQBrL24+#sDP z3k!yf@kK@Si63}^-zs#aRR)`OMp5H4YUc}dXJur}L#V`O(4^|p6vqyTj!d+$I!I;J z_wq9MyRr%1AoemyQH|?|u2SB(S_w)zvQ82-F2yyu3_NOOw#tY-I0UAQVM#A`_))xGUA- zBO_yzqOg%3E%h_A_}0Mo^wjC(%U?A2z`&E0=4t1Y#o!aQ5ZrfL!+P@ap9)!gm(1}A z18tr~iWwG`V{-b>O^ruwqJFO!inEVwl9KS2>{ZlJ_xXi5rc@{XuR~y3aJHSUz1-Xg2oAn%k_J0NO@+mX_w&Aji{_GI zGhTMK=kGmEI#!&DK%MTPhkYus@2Qi|^C0Bacz5GGpJ1E=Y^A&K{V(KOmibKTlEsW^ zj5>Tes%1LeB}=*4UX*2b`-g{Zrp8`~s4{&4YLI%fLin)57MuJLnx)5p_=>y^MU#`nn*&b5EjQii&2JCl3)+`F7-;Klb$N3lE)Z z@;Z`>SG51!L*x;;Z+y*-+$y&_T_QNb2?KDuVC2p#Uji;^^Z)!9 z-s4cfxV*2%Q2azYPjTf@YBT5N>?& zeFw*vfUmKN`o9xnG50T#=l}e~s{#IZM&~m^|2tp*xezCp4S|_?cr`O`zpMV=o>Y~O z;(yIztpER-=lCu%U*`YO-dBZHxqfke$PtlHRHPA*5|NN@6cp)3kd$ueE>hs_K@FfWZLjl-U?MC=d54y{%R}wor_N~uJQ6@cy_jf}uIn}_%|9xZ3 zRZQ$eOw2S%_!=%b$?5*;?oW9sQqpSC)N2*}Yq|8gx&gj85*J6e5NFLd&zz6!<$C>h zZL8Y`r{Lbri$0t@ZV1+hCbpl92ti=cE|g^CL=zw6yvgn*rEZu!82q zsVjKad&f^4#&fR*V^h&6jm^}{Slf2UN>N|(TC@Z<*EkK3-VQ2?da09bZdPG2QNhYx zNl8!7&Kaepy~XLyGxe0p$Dq5&W`3!n!a%+Er?vEJ%DSDO*h=bxT+VCCcR4xPET>Bh z8^9zUbdBm+K!Nu)_E(P(6DYKG4WB#l*n>?!#DSRg%c_^36-4_eL#O!nh3}Lyh+<>W 
z-fGO$+qa6a!#zzD(O$Ot6AhH)Lw>yp5>7loW@ee2sh63hr|0t9x8PLkClhCsomg(L zG-PFEbDB=q)@9O96;t|{u)7~xFkZW6DJWQto=kiwl{=%(%R>Z(n!}n;UH&bzw(66k zvdzJdTMt&i#Xn8TYkXsv@BU!p<5=lkp8R*;J^$lkMLo6R%wUk6tR-`$paSE+xtUo% ziAdt#r@l)5)vNZ|NYnL!=)n%DV~Yy53=SGJu!6V?g|r}_q@xQ8n1j$LebgkS9dO4L zNh-syMcHp^vazPO2h^TvYI%&GDF1$^^XVq)xjG=p;NVS3m(r1(e`+HXQ&30k6Z-76 zs`>u|8!futkUoGHEZVjHZhwj?8$LAa#l?}35_6kPH#N0X+?`5>OPXAS`T-@oDiq#Z zY70%hnPs3{26d^2$2rOH$AExSaF`>nf$ahZ*U_p671h|Qs!p#d>$#)m{_v7DDNsE) zt#9$!?1~qD)-s?V6%ikYH-L_CA2#E&Gk3NZ5)$puyfinbcV3n|&Dh9jol`V6NDy6j z^WR{aLvF2$;OC~?RaGgvp?-HgFZaehCnj|ZuY7+8#pU_w5B_dc{{HBA^lny7_hK7_ zgQw|bM3CnqO@GdJLptl0>HNfZ+j;*a*qt7ba+=|Le+#w3T};0oMu(3DX&fj4AXI!t zMn=O!Lt*C-MF7>&J&0=w2?>EJbp_?31WF780p*NQo;@ic#h;c|H_4|@!&YSZ5}EPb zzjDiHzxno2pAc+7kNfcyg`lCCsj!Tfe}F&LKEDG^W)nnveE;r;L!1KMjl#8dRekDB z@GC(Mhv(r&>_YO7zdyht#mB|H87PPJ*IQ<3KAq{=imUnc0+bHLqPMx2M@%>#lJeNgXm5Lz~?7V z-a-anX67bYW#{I0rkdr8d=Eph928z>VHS|mWexZxC}Z^u z&CJb#Mg^NsWJCn@Fqv5Qy4er%l494X`+Wj;=6{*2|1z0Sj1s{t4~>k(2bhhEpp)I- zTQvS2`S6pQx_WwZppa@myeWK!5*N1%pd)kO_c4;a`00ESMz0V@1j z#v%VU|E#Vq>71Fv!iEd8F6O_=!g#_WWWM52HI)J zj)Rt<`NreeE}(WHok~GYPO^7GPWap&rTcU~-c1h+>w_j0WbAFrd+iF))q?($>nNa1 z{VW zzXO?x2IAttcL5tDv=>XjqAdXCh;U$;m=L&c!j0|uw19AY@xqTR zCqI9x%FcLux(>3^XynqsB!cB;~!oJf!eI1XZ0#@ zFAzbn#~pfE$gA@{Q&Up1gEs&%v^ivlXlrXjznzw$S}s#^pNWkv+ycy(9!BPrj~-b7 zX$3bpU;{aD=fPds-3>H2ugfTa4QXjK?WG`rh7)@V`8aT`;R65XGDFi0XB@o|C_SUz z&rI_YC^vF^{QB}TAu;hQb@dLTQuz4Xl@Zy7q_uV4FN9;k9y=Xg{qEQi@OQ4gSZWpA z`=b-wirQj6p=>tR)?h$90-zT3P$2Pc)hlkh*x1+r6h}Ia1EQxS>p3H-;{sm>)YP1U z$pY;)98$^mR3Qk01J4)b1Q414DaHBv`kK5u-U9=2Jvjn-S$p6b^}nZAItGRdJ-f<} zD3}5Lnh+?W#ZpnvK6vu>R{%W{Z>2^ceZ~y1u_XDu=$N=@y*C3{ymyCGIHd>+g^RXn zI4jkaUl{wj>}ATt?afQyy&%*RgS`m1~nu6P+P99-c)+-U==K-Lm=vLk9muhGVw zkA5eQQUUvu?Sg?kA~1*V_S@!Z#L4Huv9VZuB{q_{yu{@AIqezs!=$Rj8lI1wUuNnh z;)I=4E`{{l_Bnk*ZGm0d1plkN{7-n?+dcz#+EyuPOT*X?m$0p6xiG&ULXV0o@O7=M z6hur8UvBRe`UAe>{oW@i&{^8nl#Km=`orRb_G@o>zy0eD zT15LlNZ-?sk)ffXpkW3^vd!6mm{0`z<9NjvhOhO_T`u}BbiL`OYDjUBbZ1Lt(X_rneknQpv&CY}v7 
z6G&752Ln#98&n`|Z6A7<;Rga-LIu_#Ei^M|N$Ez({qBJQ_D7F!5WtImDjpnDa&yPv zyajvpceBbPAEZER#u=ex#0)aQPMt9sGz|&2nbs^ z`p)=`S7BTRZFnU+u<`&UJLDv|A@$XE5i6Dz{ws;vEc&Va(geKrH}$8Du7z4!~Yk5UKJKI4(TBhn{}F*=~2 zx;$Q{Kt^j__Pj1nXMo%Q@Ns5xGNIi7#4LbQKYR8JWR!f0;cacgjb*B;+3BBYmUki(V%4 z*5i{h=S`^6Ka&J=V0c4P*Eob1!s(2TiUP?yTw+))J;G_B(D6E1rO!^~jgx%w!Wc@H z8IM2YNxcmK;iFPKL-=@jP==#ELdpw{PSq}rdOdAd*Wm-kKc8EE(ACQ^J$yK8{S{~e zF)<7+O+HV-HvP7rXfo=*pw*5P)fYa3U?f( zBiyk2kLV&mmvjwbYnfG3SGTRwZ0Ao3q7c4-f&^lJJa*=;mJc*GzV1(xfJF@;Fxn6c z2v~JY0qGcT+K;{dynH6l1i|Chv4};jTRwPZ&kgc_=umLQ18%9Mrq;G-3-L+hWMm|@ z><7zjP(?^e-p2d~4#~;hUU~S7*_q^-&Xs~6A)%8Vz0QtIRMJG`=F#P?9H| z$xHW&9lD+2W#M2l-L&%i3DuDC>3Z3u(%bi4=y-pLG|wOJP+yDDZ$aWrJQA2nUK2b&g zGhKL1=#~HD^QdUwp*|lA$ItJ>-;cfOJMw>jG5@{i|MmTDg{1uFFm)Ma{^uj>KrS-? z^kMCfM2F+GB&0;@9ki^wY%GSq`FXFt4vZ_$A?WKazn!54K=O}_h0TnXcd z6F0d6r=d>~UWr1V%u0W6vi#?}5R1iO267m_)fxD2|0=`qYlcI_^6rhBx88?k;8;6j zidd4=p9?p^8;$P0m5+aK<@KL))d=TGH;n=PEM6>Yv#GM-?;C~r!YjUuqJD8Tki%!3 z)%{cn^|8Uu&$7DVMGsJ)MjnD${_5zE@bAUKf{TSO7RT1uVCpl@?0y>1D6WKc#2qBB z@tzuWVi?dZ(y0bBW+yJGQGYA?8Fi$j`S3qP{;G>iav+PU!rDdLwaB_^4Sb z_ys;`%@^dCm(GBXi0;*Rre!`VpZZ$={(jRsH{p8E#MZ@sq1P{N{#ab@mpDo_6)ndMWA@PC}Ru}FrBiK~?&E3R2tg}1%xR#b^Nxh5K zx$=ZsVm;31iAHUPd8sqVBCMTRi=X0?@N`2A1 zgoG3Ky`b7$Z#*+zYR{v0$ZoyR%%UF5QfP!plKTj6@(l^MI`h`pS+#pux*cINF2(=9HPvXEg=O^to z1eoQf4|6!8#WWfssqs75dy&Gw`frj^NpmJ2k>n?o%Pc(kc+m7a-GwI+ojSi9oNv&( zMF50Aj*X2C{-14iN*{c5G4*vly}scXlT^&3rY6nLwdU{e6lsV|^RMF$#yU@w-aLZq zjU*dIg@xHb>)O`V2Kk9XNq`(I&CWgsuI<`2KO^^OkoZB%%USXp%Ijy>Cb8c)!M{K@ z8!Y{Eu--zs2tS(4fTc|I^%cVw1pWy2$B#{0#<}=wtE9~pAUHNA$=I-vZri_%z`dT8 zKPrGcM4Sqmp=`D)SKkBe6G4ZyK}c7mSuTJMff?M3>U&d9sOjj8V0Q?5nK7Sw~N zbYnhy0AgT+#`@<9NL1HH3gcsA%b%#I7HGPh?3*G-7pL1Jd9SElKYX8s2(M-^6GH^F z-gXUiW>DsW*peL3z(I&28qyxWt!ZnOK3fYb!pXl7rXleeLZCh@f!>ddH+%yShzaO2 zv6V9*`WwQcEly8P{K-|6l(2DdHbx4Q7XFB%qkDq@A*|CAa;pq97=2KSxQ@U41Gt9w z$3BDo#^4PI z)Nb&US&E02jL)l9EAs(6dnqWrt~`1Eyctw&S?7=xCx{Mg<)~$~D(2AywNd@^g?tI{?}OfN^Mf zd1EcqAWKoT1tM83K1@*>L5hWexCcn_fJS_N6H>Jx2|6DEzPuDrqq|XJL0a^wB1j{c 
zb??Nzf-odVonl7CBI8=*bN978J;S8hPlxol2wY!6;2u&1Ni39Bmoo$ zoaHss-l*v43z)CqE<}NV`-zFfNuS|fh!X^=mrQ-qZiVbAu!A74OUJ;Y3W(8%kdSK# zgg^omk(E=??+}dI1(JGMSPBOR29o>KPv@vLnF~uw{M(g{i>*zB6g@#BY5*Sc)Oxl- zsvOqlrY0~MJA-i7UC0Y=L~uF+Xal5vZ+G`q1j3liYQ?;_rt1J_!9;*ygTnMp(HTT) z!bza7ZXF$Uo^5&$EO$ylR=Eg~DOO%>?e2E|sca&qoO@Eo@*WhKd`>oTyXRkH?aqdV z;_W;&w8)L)F7J;Sa&V}*-MpZ!5@zCL-`z@G@srd4`~%cmMN%DYo}$#}pX(;+WG);N zl5R3rhD6>i*Zfdj?doBeB^n-vfk6KRFcIWR^|ZIYs&_wD)$wqEoO@(tvTD!kVlSy> z#AnTcjQh=GrpF(cIe`)o%Rfk9)fLG-Kb&k-HV0uTAS~H_kSha_g10xi=RMAV4B2we z9Rb~@(N_Tfg!@%j^CaQUm|IR+{HX${q;JIHmswFkf#Xxte-|!HF9L$ed0FOvY;9?Q z!QRr6T3aK*EC^rRLn$Bq%9Z=n)M%7&aUm8%u{S-sXl=ZF9lYu^*bojNEZhKX7#bju zcd$^&5MWWREG{-RH~?)E#BT#u?)~kn-EwetwyAgK!s4P$m8MXVj)DRls0~!<05*hA z_rMJfA@mYvIqe=f4(R!1fYMNj`nEmzlAX<~_u~V=UrOGNam|o{21*&U*D%B*zbcH8 z1psP&F=@~ZAa8?vZU{^7v9ntx#f2ANCe4FhpJjd~a3|Jrd)5V#j+Q*M>6E4hrB7(l zEG=b-=~0fbKi0*?o$GS0W%p34 z0lDizR5DnId9yPz2yWkIGwS*T6@Nb_ERbgIpV1njBak3*adzgQ>wU6-MS<9Fjt{lA zw$985Q&N5@E8B;Q2Ddd_M9q1|pL#--^ThxlLJtcO*r+T+(1pgQ%OXWymlvQlM{5iu zW@qiEVbVHdY%iUq#z^ETjbAfL*c-S>?R(rR5bSZj=JDIt@|#zl#q{!dj%sDRlN{%2 zWZ%qLhY*g%PG0V#VEK1ye{g^MXg`&4dExhe5$G%V&c)%LrFXDso6nvn6`PRYo?LLkZnkm+y@ zCSekG$I&7y3`tncaZZh1|E9=oa~!S8Wzt~rQNMER^Uqswz?^i17T-&GPn)k@$H%z- zUBwq}iYUhT-F9VtRB({6LjTWX(e|heXhb7a#MR?l#7GilDbB1w@}v%$ zS`m?d{rrHsecuF&r+a%&%1zm%+Gl(e^+{{SCTT2=gVW!P7EG)V$QG<~p2a9n>YXng zGh9dt#LiMCX0D`t)4;1Y=*O*QB<6@MKrX@*SWQS$b(>BF-d9ccF=XOEx;h)e1M3Vg zFB)oKMDRVq@isnrbWwXD*L|z@{Q`J<0P}(@ z@d^^qj0M=w_3Kn4daxeP)Vd73z6y3TU{FLVgfmCNU`3{ExEdrjFhY1KG)2PkeT`~C z10DHLE=u@F?bq@$w{DY9NeLTq=JrHo@86%0eT@Q*%AVd{7G~yPGF(i|-*>1SewqQe z0vW{sq(I#Sn<^-KVN(Nx0e#J$D?ozqC>Rv0f$9sSo}Xzt!!qE7wx%`N8PDAQO+LQmY3Jg+M3x5xh+T-!+6Wd=o#Wd#^{K8ZlwtGuOS2h zE*pcg*;?=Mg$iA^cs)*E;*VLcPp7}ShnM{FIcyS8rXh{9);&ovZ2J&*ox5B!0lXZt9%1V2Ra|B?2CPq%u*tGG~z zqqeJSVnj6~F1|@#7l9?0G25Bib!DafW@o(mgxW&X52e$wD62$1Pa>z9^`x2Rd!bqf z(PUEi8A|ZJMtkNg)L~^69xP8_oJ@29DlkL9JPB=t#1h5ria=HtR-~G4m+yNF; z{o?XUat0u-T`ZDeCr%fXOL`uP#qzg^m 
zBM7-OCG7Q)WeomPxZyOGwIe9eF522WGZKE;qdgg>^1G})m3xcaaV8JM%+fz6)qWj1UaIY~J(H;NC(e04}hb$53M1HI*!SyFrl zq=_#`9HEkjgq{>X&+~(kqoX4zseywE?R}grmk2T-Qc`R{<~v(kGr#40{!gMAR6<)5 zm4(0(CMQ!R$`%4=AuSCj-bi`SPbdmtJCDorKQdml>P?w1)NO2RU~m;wy}D`=2l1wcPQXLpffgXs;{bH4?YJZ# zfP5ko_MDuVN%}r}bb1QYCE}`$?4VcZ-`9jNSaIP<3k~^_vnwj>onchB zzW(7}ejfgI%#mGZb!%Vm{QR^}Vnemv={KM_$c3Dj+Pr$Ny!B4q>=BG#bCV_sTAEw4 zFgG`RV=9^2r>MlXHRGnNt$iJ74e?D+H`14UL+|DvN(Tk;<+TwoXnn<>sonTJy!lBmEGgg7XYA zk-$6(X?Lfvqe;`*K`jmb$#5x|-dJ`&3;90Uhk}BM(n5ANHYk5*#2OPmmTJEGRaEu~ zRJtiHV$v`&04i7{6fQCWf!Jnd3%P>BodIA1>AH0-bccp0hDQ`4suu2YmNeu@3iN-H zKG4r}3?~@!H*6s1xApucmpeP(%$3c=1gpVWYll;|f}FM{ugja>rBI!V@#-zjqx#4^ zdVo(;2UZch^WS49<%$G9!k<#&ZA^MvpZpP~ua`6J|A@&zKoH;nhYv+j0B$z;b7CR2 z6sor$!_CXKRp!Ha3Q9@?Lql3swuTV7N<%R)Hs%6QHW)r&OMqX8ku9|>??LQKYA)&r zc)5{JuCno5-6JD0nmD&bQ~Ro&wlCn;2T^RE!wnoN5;eTstSs?e|MX9gCJ#|k9xzu% zZ-yFqLWyS&vk`cCU%_w#etr#$`?C-^!DTUuAkab)uE&ybSrAFofOa)IzYJDi+adUj zKZ6Xb5nf=Lli&g0swHTb4VF`$2Z+z3MDf5zmUxv%54K$`X4d{H)<6ISY-`^-13?qb_OD3^I-oAHuv!N&U&v*L4vqE9VLL3|n*s5aV4xY_CIkE4n z=DX3ZFBFoLq;quC7cj+tqq(^r1MI6)>rS%OUoX}%K7OSnP^3Iy`atX3Iu8ri;?mOY zu{)5Z@BntGohk%FUM9#BV5fk9DJWBcB7%)n+;_Wkc%1M@3Q!SjrU@bh7ygh>EiQg^ zyqx5;Iez!nEvn(8^nq1W;t&o_UQP}l)N@dyW{ZNm{6}C$#!@7MQk=S#*J;A19M5MsJMc9(?VDd{2^bfWh5l6s>oI$_W%+PnaTo|ZOH3mob05K125JbV#z{#_ zpDo2(!4Q__aWi;8V0x7nl&4VJnxrY9i-Mt%QlI$K30O0Aai$?+sZcxLY4$psYZOW4 zb)>$&-azPut7|prY%3}bQPnOE(XY+Tsq}y=fz;frXFnHuI~nr|aft7VR0IXCkxwv; zJFJfBbVT)t>9G=t`nhaQdg3A44X!gjcT|X|__-#lZvn`3{%{txri18%%N?-mp?SL( z0&KBeKMcLML;ozF4dMiD_k zvU01b**B3zw8O%@1Pwyl{&%?QzfnKdNKt8NJZuLr@k@wGPea=8h9gk&lz0Zli@PrF zY-&2v+Ham}kzYf`%h_`ZNq}d*OZyI%2L@4;&zg!m+=&x-))*Ojo@Fu#osa`XuLE|2~F5+gzK zsFJ8Ck535&Q~TwY4#LchEWF`YseIznTh>pO{4l(5<5LOAUP9W;nS5{zV zm?|aS+qTj3jFgr@9CE(0$9qgP5Ejn_NsusM5AG{~BrvhDCmOuGAnU};Yz@2@&H(iJ zQi#a<;%rh>uVW&Uwf7h}`wMhVt&{uQMv?YZvKvkvPQqb$Cl2$wNip3p;epp?qv_O% z0edNb+R0>(UU~J~IlY1CZiO+fq=YOzDIFay&NB$`7>iprb2nY}+sZh3b4^mLE8;2W z)$o{@m~7|yiPuO&$MQ9}OQ`fOo>7Ae4B$RgR0yV_QR+yjE3ycxK*l^!;z9PKrVcfQuTu0#Fy-9Tq_4 
zAn8CF!xuIl1q&{W_tkf`J=R5CVKS8=3xtwAa@)|E=jG>Da=II8LyuWSdQ{%3T33zs zMAG)8dgzkgRh$y9E2$=@tc(n4IX)TXAN`Q%c0AYJPzfaw``+Q{4NJPhNx5KXUj<}< zp89mE+R0{<&Tto31d~&n-CjT&uI)nmP5p0xJ2I6A0tnR5a0w(HtQy(v2pfj4{2wjQ z^hEMCe}g><01pu5_w@8=ml(9etJd5eB+G+1Mbyv$a&i~Wc(EMj?^CA^*L;SaY2tQ? z=TBs&ny+W+av~z^^ZEQndvvr>yM!skEz<_bf;K>6T7ex#3%p+I6-f^L_WJD43Kz9L z`dX;QABf@6EO)-BUdxGVzq2&uh2NrgZpEeclSRW58;h8=$ZA_?DluW>VBdj_^4xX!hk5i zm(JWcYyY&K+DCq#(=n$U^pddHWS@&9vx0F)P!N^52UIyVVGH!%a7NbSiQ+>;DXUxj zK)3=c>bl!BL@z*@7NG}twT--Pc7}>z>4jln^s9UO2M1GN+EF8gC=3|xf|?I@ z0DOo>!&taRxxwzjRz@EXJou3~e04jI8 z!zL)n-Tb!}_Kv*k&h800VXC%zwgqv4lx zCp->i!j5LfAjE1u^h>Pa)73RMpLz^XRNw&0XVwQ&1 znCs(|t9s@+KQ5SK9MY^8Og3Li@hYbl>12n?s(ssk5z?P;jq^FZN>w)Gh~w}$8%C6@nay~<9I2k z&CMxg%wor`#@-=rLD;dSt6fob7HCW?8h-wO z-eT^zC@o;NWm~*W%*~e<7TUD60Y-M|OL-1p{_NhRlR0zyp%RoHkcj~359I-&E-MiL znXnP3MsvP`=LP7CwVLLyxw+<0*!F&&WAY{VeLYO9Xw0zvqp5{sc35$;cWVB|;%{NU zCzNfyjG)V4Xk$t1GjTns4kq8QC*d|r>bd|sR8MEOs)E8}4mNA+qA)>Sgc_G2bYsJU zjS#cXq->bfRFgSI-Bl7NQ5J4_Cu?hyg)6;?^->ox2-1*GKr6sqO=^c#Rrpjv;nuMN zj9oIDh7mt7$xaZ4>J=4*w$87@w4sTKnyM<80rbT7y^JmC7)Ve+!gY3b29s03g$`*W zP!hO0JGZU-_t3wHD?yz?pVs;Ii$|793tu~4qlXRGV%)qZ-696Txs%{G1Azjyfd(8l#ErzpUPpkFin8G+ii}~XCjlQpve-#U7y*QwxC)yyGUmb~HXo>- z4Fm2pHr3m0m1l07gJWT}Q!EZ#H6SZB<%oO#{yBiI0O>*iETM;3^Yhd`Wo7j&3L;QB zCAgKnevEOpS9tVeH@6od3W{0fI0SWuk`p&#?LytHP zO5}hR(kZxqx^39p?7KO^UsYwgwt~DZu5uC6n;X-rL0)gQ1g$-auFXCEw*Kjpd22}S zO4-oPs^#g)Uk+sUywEg<^MGHs6 zSFwBn>B;GaKyGg?q*+-!f+UdThL9i`uhWX0+61x8brPachgH4xjYsNqiwEoBwYB?| zR%mZq7lrCP;x|Uv*&Z5~VwaS(g|OVGRv8*>m{j!H zx*@6gc!XW_yJOqRx`ri&|TJx25bu8~QxkcE4IgPZ8K&stfU zD!Kc0a`f~Mdqg<$ws&5k{3@bFP5pw9u!QN?-W?dMrl`ayC1q}IrUp{_@$n@`$Cu=M zMI4-@mx{(9kFIr9QCDlr&K-b3xtttj5DZfn6{Yt0-IGv!ilmbz8N*k^xKA{6W`H$6 zW!5s`h*N7<4WSrOR*9V$iya@uUtFkl-A&^WPLj>6EG@F$+cz*Z4by|mcs=Ca-NmKd zubR|KN+X`<`+}r~xcA6GAcie0XF?QVa_xk;0w~w#7CXehRYJe3@)w|{y#v^@(LJSwgj14P4D018yuZHTMMJsyzyc`Qb^R#bADT{ z!u%nlw{dUu$-2|Zuw_SoIwGGC<7Bb7)RaKh&UQnq#*>2R!GZnb_hVzrqN1hY5wtgP 
zV_Co1Er_qRPL`D^O;i#tFMhnsUR<&p&)qvZscFAF^X?tI+PfuWdCxg)cJ|770gRY}jt6E1=s;;`z3`lO_5vCkGNBoYto6zL`N@Bvl$K1hUf#c;@x!;U4bUt&FRP zYQ%TWM#e-M(9$;qf4s#M8C;m0 zf6ET`k=H+uc0BxW-H6H0DesF&sm7#TL0CCC*38UG!XsGj@gT^+%X~#+wsxwVDly(- zOHw58^W*P>!7YObnMrqUj+?3;;~pCa`23=a;z z+JF%!XKjrWqW5WOC0KOA+pptxNXg9*;w!6az}~vTYn!H(DNDk~ch|!4_uZQm2!i=o z;@MMWg*RL4pNRTav0yDZ<+jMm&F8e3@hmDVjEabme<1C4ezLSWa7;|>=KZhHCq=j~t&ywG=F-s?v$^s_032Jrr%5^@FHKVEb^=@5-zsL5msg8t_wly)QlW| zi%;qrou5)cY;RPcBX_V$4D$!d%qzX-#T<2YcRORqVYF|Tm1=m>h+$@83Qv*$!B=fQ=c0m&m%XD$tva5`lGV=6 zqxNfM=0iC-ySpF0`_Z$!x_^kj@$Z#O!%i|G#|VMmg?b97?3h|sX#(eX#EZZ_7(tLzp; z2#tU%T=@yjyL)7hMK(yHf=WvZlBtk|jC9W3r-9Nk*#gwC-~<5awUCj)wfi)SfQI<@ z?sW(rxvz@%LOM|r;_jPWTuf0jSw{3*{i6o^M(7hGw)OQHC8e5WK@pU)8^~H-+{ey^ zfxr;6Y=P6?@)#IPhlfAEs@{gxKaG{|VB7}c&?Q-Bsf>IYdG!EaQad!_Bz}#A2zmQ4J$-?SEvPP|q~u$Nek-+z zxUMq}MNfWi06_icQ~doZSveU-Q^ooq!LnVSNxIkxS5a{R*q~HRk)>|MQ*$X5GWn_= zie(rD0J4gPB)4H=j-~Zbd|}qSe)WpUBq#4ol4rI1%FkSSxj719jhRh}g(TpgLfKz^9;qrC{(bW81a+$NwzYRf#0a@7l$hZT%#T!NYnX zrRvo5?^W1B;g(+Ao-wZX(us_hn#+&RT&`p{?&x{AyLLT|(&FhtY*>~5%KX6@uY|~ZvtUhm})Q?LtBS`y{ki;&-LHM9*{fPAtjkrapgV9 z)O1_6ESBU8otb;7$DJ+8W97 zZOjisH&`P(*8@jT&$_gVO#bq(Z#2`k0|V`{6S9foq6-HDnc}>s* z&0XtyruG0|ou3z%jKAWl4iQM1Lg%dQ9lD(_FNPWbgdSK8W@Z+pm$|Lw2G20(UX0G0 z&wK}GTT@^S5~^{F8K3zRZzb>|(By0s5A_7v9a{8v#*8sAETqo-=(jc5I}=IzAeQO@ zfhPF_neA7DW;rXnA;^~&gF_NFgQBD){~R;#kis4VS&j0C zYrBgtMhfaef;=clrO5lOAkU)Bke!o`{dITTgmI5Rod-YH!-rfC*S=F1snz-7?3^4< zv$pJ>;%czAO}mPq+T$N7+_Ss?WNx-zjc7GsB{HLk*_hjNe$UH$H$A;vMFk4@&tYNjK7KT)w&o?f?Fauq z_(ly%&7?2|b$Vb9^)Ho=9@5z@A;rkToY7|j>y!}}m!j34OioS?jQb_t9hdaD x#pB>$AWwUMJ$wyS#D|7A&Qd|c7y087U2>i?$~I{i%-;wi@m%& Date: Tue, 17 Mar 2020 11:55:45 +0200 Subject: [PATCH 55/76] Think_Python_Chapter_9 --- ...on_Chapter_9__Case_study_A_word_play.ipynb | 1405 +++++++++++++++++ 1 file changed, 1405 insertions(+) create mode 100644 notebooks/Books/Think Python/Think_Python_Chapter_9__Case_study_A_word_play.ipynb diff --git a/notebooks/Books/Think 
Python/Think_Python_Chapter_9__Case_study_A_word_play.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_9__Case_study_A_word_play.ipynb new file mode 100644 index 0000000..6eac903 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_9__Case_study_A_word_play.ipynb @@ -0,0 +1,1405 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 9  Case study: word play" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9.1 Reading word lists" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This chapter presents the second case study, which involves\n", + "solving word puzzles by searching for words that have certain\n", + "properties. For example, we’ll find the longest palindromes\n", + "in English and search for words whose letters appear in\n", + "alphabetical order. And I will present another program development\n", + "plan: reduction to a previously solved problem." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For the exercises in this chapter we need a list of English words.\n", + "There are lots of word lists available on the Web, but the one most\n", + "suitable for our purpose is one of the word lists collected and\n", + "contributed to the public domain by Grady Ward as part of the Moby\n", + "lexicon project (see http://wikipedia.org/wiki/Moby_Project). It\n", + "is a list of 113,809 official crosswords; that is, words that are\n", + "considered valid in crossword puzzles and other word games. In the\n", + "Moby collection, the filename is 113809of.fic; you can download\n", + "a copy, with the simpler name words.txt, from\n", + "http://thinkpython2.com/code/words.txt.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This file is in plain text, so you can open it with a text\n", + "editor, but you can also read it from Python. 
The built-in\n", + "function open takes the name of the file as a parameter\n", + "and returns a file object you can use to read the file.\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "fin = open('words.txt')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "<_io.TextIOWrapper name='words.txt' mode='r' encoding='UTF-8'>" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fin" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "fin is a common name for a file object used for input. The file\n", + "object provides several methods for reading, including readline,\n", + "which reads characters from the file until it gets to a newline and\n", + "returns the result as a string: \n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'aa\\n'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fin.readline()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first word in this particular list is “aa”, which is a kind of\n", + "lava. The sequence \\n represents the newline character that \n", + "separates this word from the next." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The file object keeps track of where it is in the file, so\n", + "if you call readline again, you get the next word:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'aah\\n'" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fin.readline()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The next word is “aah”, which is a perfectly legitimate\n", + "word, so stop looking at me like that.\n", + "Or, if it’s the newline character that’s bothering you,\n", + "we can get rid of it with the string method strip:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'aahed'" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "line = fin.readline()\n", + "word = line.strip()\n", + "word" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "You can also use a file object as part of a for loop.\n", + "This program reads words.txt and prints each word, one\n", + "per line:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "zymurgy\n" + ] + } + ], + "source": [ + "fin = open('words.txt')\n", + "for line in fin:\n", + " word = line.strip()\n", + " #print(word)\n", + "print(word)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9.2 Exercises" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are solutions to these exercises in the next section.\n", + "You should at least attempt each one before you read the solutions." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 1** Write a program that reads words.txt and prints only the words with more than 20 characters (not counting whitespace)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
word
0aa
1aah
2aahed
3aahing
4aahs
\n", + "
" + ], + "text/plain": [ + " word\n", + "0 aa\n", + "1 aah\n", + "2 aahed\n", + "3 aahing\n", + "4 aahs" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "words = pd.read_csv('words.txt', names=['word'])\n", + "\n", + "words.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(113809, 1)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
word
21685counterdemonstrations
47408hyperaggressivenesses
60406microminiaturizations
\n", + "
" + ], + "text/plain": [ + " word\n", + "21685 counterdemonstrations\n", + "47408 hyperaggressivenesses\n", + "60406 microminiaturizations" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words[words['word'].str.len() > 20]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 2** \n", + "In 1939 Ernest Vincent Wright published a 50,000 word novel called Gadsby that does not contain the letter “e”. Since “e” is the most common letter in English, that’s not easy to do.\n", + "\n", + "In fact, it is difficult to construct a solitary thought without using that most common symbol. It is slow going at first, but with caution and hours of training you can gradually gain facility.\n", + "\n", + "All right, I’ll stop now.\n", + "\n", + "Write a function called has_no_e that returns True if the given word doesn’t have the letter “e” in it.\n", + "\n", + "Write a program that reads words.txt and prints only the words that have no “e”. Compute the percentage of words in the list that have no “e”." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(37641, 1)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words[~words['word'].fillna('_').str.contains('e')].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(76168, 1)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words[words['word'].fillna('_').str.contains('e')].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
word
113800zymogenes
113801zymogens
113802zymologies
113804zymoses
113807zymurgies
\n", + "
" + ], + "text/plain": [ + " word\n", + "113800 zymogenes\n", + "113801 zymogens\n", + "113802 zymologies\n", + "113804 zymoses\n", + "113807 zymurgies" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words[words['word'].fillna('_').str.contains('e')].tail()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 3**\n", + "Write a function named avoids that takes a word and a string of forbidden letters, and that returns True if the word doesn’t use any of the forbidden letters.\n", + "\n", + "Write a program that prompts the user to enter a string of forbidden letters and then prints the number of words that don’t contain any of them. Can you find a combination of 5 forbidden letters that excludes the smallest number of words?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 4** \n", + "Write a function named uses_only that takes a word and a string of letters, and that returns True if the word contains only letters in the list. Can you make a sentence using only the letters acefhlo? Other than “Hoe alfalfa”?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 5** \n", + "Write a function named uses_all that takes a word and a string of required letters, and that returns True if the word uses all the required letters at least once. How many words are there that use all the vowels aeiou? How about aeiouy?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise 6**\n", + "Write a function called is_abecedarian that returns True if the letters in a word appear in alphabetical order (double letters are ok). How many abecedarian words are there?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9.3 Search" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Pixiedust database opened successfully\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " \n", + " Pixiedust version 1.1.18\n", + "
\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pixiedust" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All of the exercises in the previous section have something\n", + "in common; they can be solved with the search pattern we saw\n", + "in Section 8.6. The simplest example is:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "def has_no_e(word):\n", + " for letter in word:\n", + " if letter == 'e':\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "has_no_e('letter')" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "has_no_e('xxxx')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Hey, there's something awesome here! To see it, open this notebook outside GitHub, in a viewer like Jupyter
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "%%pixie_debugger\n", + "has_no_e('letter')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The for loop traverses the characters in word. If we find\n", + "the letter “e”, we can immediately return False; otherwise we\n", + "have to go to the next letter. If we exit the loop normally, that\n", + "means we didn’t find an “e”, so we return True.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "You could write this function more concisely using the in\n", + "operator, but I started with this version because it \n", + "demonstrates the logic of the search pattern." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "avoids is a more general version of has_no_e but it\n", + "has the same structure:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "def avoids(word, forbidden):\n", + " for letter in word:\n", + " if letter in forbidden:\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "avoids('hintw', 'wz')" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Hey, there's something awesome here! To see it, open this notebook outside GitHub, in a viewer like Jupyter
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "%%pixie_debugger\n", + "avoids('hint', 'wz')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "We can return False as soon as we find a forbidden letter;\n", + "if we get to the end of the loop, we return True." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "uses_only is similar except that the sense of the condition\n", + "is reversed:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "def uses_only(word, available):\n", + " for letter in word: \n", + " if letter not in available:\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "uses_only('hinth', 'inth')" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Hey, there's something awesome here! To see it, open this notebook outside GitHub, in a viewer like Jupyter
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "%%pixie_debugger\n", + "uses_only('hint', 'inh')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Instead of a list of forbidden letters, we have a list of available\n", + "letters. If we find a letter in word that is not in\n", + "available, we can return False." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "uses_all is similar except that we reverse the role\n", + "of the word and the string of letters:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def uses_all(word, required):\n", + " for letter in required: \n", + " if letter not in word:\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "uses_all('hinth', 'inth')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [], + "source": [ + "%%pixie_debugger\n", + "uses_all('hintt', 'inth')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Instead of traversing the letters in word, the loop\n", + "traverses the required letters. 
If any of the required letters\n", + "do not appear in the word, we can return False.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you were really thinking like a computer scientist, you would\n", + "have recognized that uses_all was an instance of a\n", + "previously solved problem, and you would have written:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def uses_all(word, required):\n", + " return uses_only(required, word)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This is an example of a program development plan called reduction to a previously solved problem, which means that you\n", + "recognize the problem you are working on as an instance of a solved\n", + "problem and apply an existing solution. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus\n", + "\n", + "How to check performance in Jupyter Notebook" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The slowest run took 9.60 times longer than the fastest. 
This could mean that an intermediate result is being cached.\n", + "10000000 loops, best of 3: 143 ns per loop\n" + ] + } + ], + "source": [ + "%%timeit \n", + "a = \"abc\"\n", + "b = \"abcdefghijklmnopqrstuvwxyz\"\n", + "for i in a:\n", + " if i in b: \n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1000000 loops, best of 3: 678 ns per loop\n" + ] + } + ], + "source": [ + "%%timeit \n", + "b = \"abc\"\n", + "a = \"abcdefghijklmnopqrstuvwxyz\"\n", + "for i in a:\n", + " if i in b: \n", + " pass" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9.4 Looping with indices" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I wrote the functions in the previous section with for\n", + "loops because I only needed the characters in the strings; I didn’t\n", + "have to do anything with the indices." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For is_abecedarian we have to compare adjacent letters,\n", + "which is a little tricky with a for loop:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "def is_abecedarian(word):\n", + " previous = word[0]\n", + " for c in word:\n", + " if c < previous:\n", + " return False\n", + " previous = c\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_abecedarian('hintt')" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "pixiedust": { + "displayParams": {} + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "%%pixie_debugger\n", + "is_abecedarian('hintt')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An alternative is to use recursion:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_abecedarian(word):\n", + " if len(word) <= 1:\n", + " return True\n", + " if word[0] > word[1]:\n", + " return False\n", + " return is_abecedarian(word[1:])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another option is to use a while loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_abecedarian(word):\n", + " i = 0\n", + " while i < len(word)-1:\n", + " if word[i+1] < word[i]:\n", + " return False\n", + " i = i+1\n", + " return True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The loop starts at i=0 and ends when i=len(word)-1. Each\n", + "time through the loop, it compares the ith character (which you can\n", + "think of as the current character) to the i+1th character (which you\n", + "can think of as the next)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the next character is less than (alphabetically before) the current\n", + "one, then we have discovered a break in the abecedarian trend, and\n", + "we return False." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we get to the end of the loop without finding a fault, then the\n", + "word passes the test. To convince yourself that the loop ends\n", + "correctly, consider an example like 'flossy'. The\n", + "length of the word is 6, so\n", + "the last time the loop runs is when i is 4, which is the\n", + "index of the second-to-last character. 
On the last iteration,\n", + "it compares the second-to-last character to the last, which is\n", + "what we want.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is a version of is_palindrome (see\n", + "Exercise 3) that uses two indices; one starts at the\n", + "beginning and goes up; the other starts at the end and goes down." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_palindrome(word):\n", + " i = 0\n", + " j = len(word)-1\n", + "\n", + " while i Date: Wed, 18 Mar 2020 12:39:02 +0200 Subject: [PATCH 56/76] 24-pandas-check-value-column-contained-another-column-same-row --- ...mn-contained-another-column-same-row.ipynb | 896 ++++++++++++++++++ 1 file changed, 896 insertions(+) create mode 100644 notebooks/pandas/24-pandas-check-value-column-contained-another-column-same-row.ipynb diff --git a/notebooks/pandas/24-pandas-check-value-column-contained-another-column-same-row.ipynb b/notebooks/pandas/24-pandas-check-value-column-contained-another-column-same-row.ipynb new file mode 100644 index 0000000..d0c667c --- /dev/null +++ b/notebooks/pandas/24-pandas-check-value-column-contained-another-column-same-row.ipynb @@ -0,0 +1,896 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 24. Pandas: Check If Value of Column Is Contained in Another Column in the Same Row" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title \\\n", + "0 Avatar  \n", + "1 Pirates of the Caribbean: At World's End  \n", + "2 Spectre  \n", + "3 The Dark Knight Rises  \n", + "4 Star Wars: Episode VII - The Force Awakens  ... \n", + "5 John Carter  \n", + "6 Spider-Man 3  \n", + "7 Tangled  \n", + "8 Avengers: Age of Ultron  \n", + "9 Harry Potter and the Half-Blood Prince  \n", + "\n", + " plot_keywords country \n", + "0 avatar|future|marine|native|paraplegic USA \n", + "1 goddess|marriage ceremony|marriage proposal|pi... USA \n", + "2 bomb|espionage|sequel|spy|terrorist UK \n", + "3 deception|imprisonment|lawlessness|police offi... USA \n", + "4 NaN NaN \n", + "5 alien|american civil war|male nipple|mars|prin... USA \n", + "6 sandman|spider man|symbiote|venom|villain USA \n", + "7 17th century|based on fairy tale|disney|flower... USA \n", + "8 artificial intelligence|based on comic book|ca... USA \n", + "9 blood|book|love|potion|professor UK " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[['movie_title', 'plot_keywords', 'country']].head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Check If String Column Contains Substring of Another with Function" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title country\n", + "196 Australia  Australia\n", + "2504 McFarland, USA  USA" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def find_value_column(row):\n", + " return row.country in row.movie_title\n", + "\n", + "df.country.fillna('_', inplace=True)\n", + "df[df.apply(find_value_column, axis=1)][['movie_title', 'country']].head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "for row in df.loc[df.plot_keywords.isnull(), 'plot_keywords'].index:\n", + " df.at[row, 'plot_keywords'] = []" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title plot_keywords\n", + "0 Avatar  avatar|future|marine|native|paraplegic\n", + "22 Robin Hood  1190s|archer|england|king of england|robin hood\n", + "25 King Kong  animal name in title|ape abducts a woman|goril...\n", + "26 Titanic  artist|love|ship|titanic|wet\n", + "33 Alice in Wonderland  alice in wonderland|mistaking reality for drea...\n", + "130 Thor  battle|marvel cinematic universe|scientist|tho...\n", + "145 Pan  1940s|child hero|fantasy world|orphan|referenc...\n", + "147 Troy  greek|mythology|prince|trojan|troy\n", + "150 Ghostbusters  ghost|ghostbuster|ghostbusters|male objectific...\n", + "160 Star Trek  box office hit|future|lifted by the throat|sta..." + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def find_value_column(row):\n", + " return row.movie_title.lower().strip() in row.plot_keywords\n", + "\n", + "df[df.apply(find_value_column, axis=1)][['movie_title', 'plot_keywords']].head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Check If Column contains another column with lambda" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title country\n", + "196 Australia  Australia\n", + "2504 McFarland, USA  USA" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.apply(lambda x: x.country in x.movie_title, axis=1)][['movie_title', 'country']].head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "(\"'Series' objects are mutable, thus they cannot be hashed\", 'occurred at index 0')", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Warning for common error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mrow\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcountry\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmovie_title\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36mapply\u001b[0;34m(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)\u001b[0m\n\u001b[1;32m 6904\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6905\u001b[0m )\n\u001b[0;32m-> 6906\u001b[0;31m 
\u001b[0;32mreturn\u001b[0m \u001b[0mop\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6907\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6908\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mapplymap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/apply.py\u001b[0m in \u001b[0;36mget_result\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 184\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_raw\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 186\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_standard\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 187\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 188\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mapply_empty_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/apply.py\u001b[0m in \u001b[0;36mapply_standard\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 290\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 291\u001b[0m \u001b[0;31m# compute the result using the series generator\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 292\u001b[0;31m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_series_generator\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 293\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 294\u001b[0m \u001b[0;31m# wrap results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/apply.py\u001b[0m in \u001b[0;36mapply_series_generator\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 319\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 320\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mv\u001b[0m \u001b[0;32min\u001b[0m \u001b[0menumerate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseries_gen\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 321\u001b[0;31m \u001b[0mresults\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 322\u001b[0m \u001b[0mkeys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 323\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m(row)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Warning for common 
error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mrow\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcountry\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmovie_title\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36m__contains__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1935\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__contains__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1936\u001b[0m \u001b[0;34m\"\"\"True if the key is in the info axis\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1937\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_info_axis\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1938\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1939\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mproperty\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/indexes/range.py\u001b[0m in \u001b[0;36m__contains__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 362\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 363\u001b[0m \u001b[0;32mdef\u001b[0m 
\u001b[0m__contains__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minteger\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0mbool\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 364\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 365\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 366\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mensure_python_int\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36m__hash__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1885\u001b[0m raise TypeError(\n\u001b[1;32m 1886\u001b[0m \u001b[0;34m\"{0!r} objects are mutable, thus they cannot be\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1887\u001b[0;31m \u001b[0;34m\" hashed\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__class__\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1888\u001b[0m )\n\u001b[1;32m 1889\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: (\"'Series' objects are mutable, thus they cannot be hashed\", 'occurred at index 0')" + ] + } + ], + "source": [ + "# Warning for common error\n", + "df.apply(lambda 
row: df.country in df.movie_title, axis=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Fastest Way to Check If One Column Contains Another" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "df['country'].fillna('Unknown', inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title country\n", + "196 Australia  Australia\n", + "2504 McFarland, USA  USA" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[[x[0] in x[1] for x in zip(df['country'], df['movie_title'])]][['movie_title', 'country']]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: For Loop and df.iterrows() Version" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Australia Australia \n", + "USA McFarland, USA \n" + ] + } + ], + "source": [ + "for i, row in df.iterrows():\n", + " if row.country in row.movie_title:\n", + " print(row.country, row.movie_title)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus Step: Check If List Column Contains Substring of Another with Function" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "df['keywords'] = df.plot_keywords.str.split('|')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 [avatar, future, marine, native, paraplegic]\n", + "1 [goddess, marriage ceremony, marriage proposal...\n", + "2 [bomb, espionage, sequel, spy, terrorist]\n", + "3 [deception, imprisonment, lawlessness, police ...\n", + "4 NaN\n", + " ... 
\n", + "5038 [fraud, postal worker, prison, theft, trial]\n", + "5039 [cult, fbi, hideout, prison escape, serial kil...\n", + "5040 NaN\n", + "5041 NaN\n", + "5042 [actress name in title, crush, date, four word...\n", + "Name: keywords, Length: 5043, dtype: object" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['keywords']" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title \\\n", + "0 Avatar  \n", + "9 Harry Potter and the Half-Blood Prince  \n", + "33 Alice in Wonderland  \n", + "68 Monsters vs. Aliens  \n", + "77 G.I. Joe: The Rise of Cobra  \n", + "\n", + " keywords \n", + "0 [avatar, future, marine, native, paraplegic] \n", + "9 [blood, book, love, potion, professor] \n", + "33 [alice in wonderland, mistaking reality for dr... \n", + "68 [alien, alien invasion, alien space craft, gia... \n", + "77 [cobra, gi joe, snake, train, warhead] " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def find_value_column(row):\n", + " if isinstance(row['keywords'], list):\n", + " for keyword in row['keywords']:\n", + " return keyword in row.movie_title.lower()\n", + " else:\n", + " return False\n", + "\n", + "df[df.apply(find_value_column, axis=1)][['movie_title', 'keywords']].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "df['keywords'] = df['keywords'].apply(lambda d: d if isinstance(d, list) else [])" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title \\\n", + "0 Avatar  \n", + "9 Harry Potter and the Half-Blood Prince  \n", + "33 Alice in Wonderland  \n", + "68 Monsters vs. Aliens  \n", + "77 G.I. Joe: The Rise of Cobra  \n", + "\n", + " keywords \n", + "0 [avatar, future, marine, native, paraplegic] \n", + "9 [blood, book, love, potion, professor] \n", + "33 [alice in wonderland, mistaking reality for dr... \n", + "68 [alien, alien invasion, alien space craft, gia... \n", + "77 [cobra, gi joe, snake, train, warhead] " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def find_value_column(row):\n", + " for keyword in row['keywords']:\n", + " return keyword in row.movie_title.lower()\n", + " return False\n", + "\n", + "df[df.apply(find_value_column, axis=1)][['movie_title', 'keywords']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Performance" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10 loops, best of 3: 154 ms per loop\n" + ] + } + ], + "source": [ + "%%timeit\n", + "def find_value_column(row):\n", + " return row.country in row.movie_title\n", + "\n", + "df[df.apply(find_value_column, axis=1)]" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10 loops, best of 3: 155 ms per loop\n" + ] + } + ], + "source": [ + "%%timeit\n", + "df[df.apply(lambda x: x.country in x.movie_title, axis=1)]" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1000 loops, best of 3: 1.76 ms per loop\n" + ] + } + ], + "source": [ + "%%timeit\n", + "df[[x[0] in x[1] for x in zip(df['country'], df['movie_title'])]]" + ] + }, + { + "cell_type": "code", + 
"execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1 loop, best of 3: 599 ms per loop\n" + ] + } + ], + "source": [ + "%%timeit\n", + "for i, row in df.iterrows():\n", + " if row.country in row.movie_title:\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From fac4d7a39f2f9a9b42a4944320c44255365f5e1c Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 19 Mar 2020 14:34:32 +0200 Subject: [PATCH 57/76] Think_Python_Chapter_10__Lists --- .../Think_Python_Chapter_10__Lists.ipynb | 1691 +++++++++++++++++ 1 file changed, 1691 insertions(+) create mode 100644 notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb new file mode 100644 index 0000000..1653288 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb @@ -0,0 +1,1691 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 10  Lists\n", + "\n", + "* A list is a sequence\n", + "* Lists are mutable\n", + "* Traversing a list\n", + "* List operations\n", + "* List slices\n", + "* List methods\n", + "* Map, filter and reduce\n", + "* Deleting elements\n", + "* Lists and strings\n", + "* Objects and values\n", + "* Aliasing\n", + "* List arguments\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.1 A 
list is a sequence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This chapter presents one of Python’s most useful built-in types, lists.\n", + "You will also learn more about objects and what can happen when you have\n", + "more than one name for the same object." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Like a string, a list is a sequence of values. In a string, the\n", + "values are characters; in a list, they can be any type. The values in\n", + "a list are called elements or sometimes items.\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are several ways to create a new list; the simplest is to\n", + "enclose the elements in square brackets ([ and ]):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[10, 20, 30, 40]\n", + "['crunchy frog', 'ram bladder', 'lark vomit']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first example is a list of four integers. The second is a list of\n", + "three strings. The elements of a list don’t have to be the same type.\n", + "The following list contains a string, a float, an integer, and\n", + "(lo!) 
another list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "['spam', 2.0, 5, [10, 20]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A list within another list is nested.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A list that contains no elements is\n", + "called an empty list; you can create one with empty\n", + "brackets, [].\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you might expect, you can assign list values to variables:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cheeses = ['Cheddar', 'Edam', 'Gouda']\n", + "numbers = [42, 123]\n", + "empty = []\n", + "print(cheeses, numbers, empty)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.2 Lists are mutable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The syntax for accessing the elements of a list is the same as for\n", + "accessing the characters of a string—the bracket operator. The\n", + "expression inside the brackets specifies the index. Remember that the\n", + "indices start at 0:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cheeses[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Unlike strings, lists are mutable. 
When the bracket operator appears\n", + "on the left side of an assignment, it identifies the element of the\n", + "list that will be assigned.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "numbers = [42, 123]\n", + "numbers[1] = 5\n", + "numbers" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "numbers[4] = 5  # IndexError: list assignment index out of range" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The one-eth element of numbers, which\n", + "used to be 123, is now 5.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 10.1 shows \n", + "the state diagram for cheeses, numbers and empty:\n", + "\n", + "![](http://greenteapress.com/thinkpython2/html/thinkpython2011.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lists are represented by boxes with the word “list” outside\n", + "and the elements of the list inside. cheeses refers to\n", + "a list with three elements indexed 0, 1 and 2.\n", + "numbers contains two elements; the diagram shows that the\n", + "value of the second element has been reassigned from 123 to 5.\n", + "empty refers to a list with no elements.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "List indices work the same way as string indices: any integer expression can be used as an index, reading or writing an element that does not exist raises an IndexError, and a negative index counts backward from the end of the list." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The in operator also works on lists." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cheeses = ['Cheddar', 'Edam', 'Gouda']\n", + "'Edam' in cheeses" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'Brie' in cheeses" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.3 Traversing a list" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The most common way to traverse the elements of a list is\n", + "with a for loop. The syntax is the same as for strings:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for cheese in cheeses:\n", + "    print(cheese)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This works well if you only need to read the elements of the\n", + "list. But if you want to write or update the elements, you\n", + "need the indices. A common way to do that is to combine\n", + "the built-in functions range and len:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(len(numbers)):\n", + "    numbers[i] = numbers[i] * 2\n", + "numbers" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i, e in enumerate(numbers):\n", + "    print(i, e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This loop traverses the list and updates each element. len\n", + "returns the number of elements in the list. range returns\n", + "a sequence of indices from 0 to n−1, where n is the length of\n", + "the list. Each time through the loop i gets the index\n", + "of the next element. 
The assignment statement in the body uses\n", + "i to read the old value of the element and to assign the\n", + "new value.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A for loop over an empty list never runs the body:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for x in []:\n", + "    print('This never happens.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Although a list can contain another list, the nested\n", + "list still counts as a single element. The length of this list is\n", + "four:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: flatten a list of lists with a list comprehension\n", + "\n", + "https://docs.python.org/3.0/tutorial/datastructures.html?highlight=list%20comprehension" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_list = [['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[element for sublist in my_list for element in sublist]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_list = ['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[element for element in my_list if not isinstance(element, list)]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[element for sublist in my_list if isinstance(sublist, list) for element in sublist]" + ] + }, + { 
+ "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.4 List operations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The + operator concatenates lists:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [1, 2, 3]\n", + "b = [4, 5, 6]\n", + "c = a + b\n", + "c" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The * operator repeats a list a given number of times:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[0] * 4" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[1, 2, 3] * 3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The first example repeats [0] four times. The second example\n", + "repeats the list [1, 2, 3] three times." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.5 List slices" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The slice operator also works on lists:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c', 'd', 'e', 'f']\n", + "t[1:3]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[:4]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[3:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you omit the first index, the slice starts at the beginning.\n", + "If you omit the second, the slice goes to the end. 
So if you\n", + "omit both, the slice is a copy of the whole list.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Since lists are mutable, it is often useful to make a copy\n", + "before performing operations that modify lists.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A slice operator on the left side of an assignment\n", + "can update multiple elements:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c', 'd', 'e', 'f']\n", + "t[1:3] = ['x', 'y']\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Bonus: can you reverse a list with slicing?\n", + "\n", + "Expected result: ['f', 'e', 'd', 'y', 'x', 'a']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[::-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.6 List methods" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Python provides methods that operate on lists. 
For example,\n", + "append adds a new element to the end of a list:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c']\n", + "t.append('d')\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "extend takes a list as an argument and appends all of\n", + "the elements:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t1 = ['a', 'b', 'c']\n", + "t2 = ['d', 'e']\n", + "t1.extend(t2)\n", + "t1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This example leaves t2 unmodified." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "sort arranges the elements of the list from low to high:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['d', 'c', 'e', 'b', 'a']\n", + "t.sort()\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Most list methods are void; they modify the list and return None.\n", + "If you accidentally write t = t.sort(), you will be disappointed\n", + "with the result.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.7 Map, filter and reduce" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To add up all the numbers in a list, you can use a loop like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def add_all(t):\n", + "    total = 0\n", + "    for x in t:\n", + "        total += x\n", + "    return total" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "total is initialized to 0. Each time through the loop,\n", + "x gets one element from the list. 
The += operator\n", + "provides a short way to update a variable. This \n", + "augmented assignment statement,\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "total += x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "is equivalent to" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "total = total + x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As the loop runs, total accumulates the sum of the\n", + "elements; a variable used this way is sometimes called an\n", + "accumulator.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Adding up the elements of a list is such a common operation\n", + "that Python provides it as a built-in function, sum:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = [1, 2, 3]\n", + "sum(t)\n", + "# 6" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "**An operation like this that combines a sequence of elements into\n", + "a single value is sometimes called reduce.**\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sometimes you want to traverse one list while building\n", + "another. For example, the following function takes a list of strings\n", + "and returns a new list that contains capitalized strings:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def capitalize_all(t):\n", + "    res = []\n", + "    for s in t:\n", + "        res.append(s.capitalize())\n", + "    return res" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "res is initialized with an empty list; each time through\n", + "the loop, we append the next element. 
So res is another\n", + "kind of accumulator.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**An operation like capitalize_all is sometimes called a map because it “maps” a function (in this case the method capitalize) onto each of the elements in a sequence.**\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another common operation is to select some of the elements from\n", + "a list and return a sublist. For example, the following\n", + "function takes a list of strings and returns a list that contains\n", + "only the uppercase strings:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def only_upper(t):\n", + "    res = []\n", + "    for s in t:\n", + "        if s.isupper():\n", + "            res.append(s)\n", + "    return res" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "isupper is a string method that returns True if every cased\n", + "character in the string is uppercase and there is at least one." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**An operation like only_upper is called a filter because\n", + "it selects some of the elements and filters out the others.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Most common list operations can be expressed as a combination\n", + "of map, filter and reduce." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.8 Deleting elements" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are several ways to delete elements from a list. 
If you\n", + "know the index of the element you want, you can use\n", + "pop:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c']\n", + "x = t.pop(1)\n", + "t" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "pop modifies the list and returns the element that was removed.\n", + "If you don’t provide an index, it deletes and returns the\n", + "last element." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you don’t need the removed value, you can use the del\n", + "operator:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c']\n", + "del t[1]\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you know the element you want to remove (but not the index), you\n", + "can use remove:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'b', 'c']\n", + "t.remove('b')\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The return value from remove is None.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To remove more than one element, you can use del with\n", + "a slice index:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['a', 'b', 'c', 'd', 'e', 'f']\n", + "del t[1:5]\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As usual, the slice selects all the elements up to but not\n", + "including the second index." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.9 Lists and strings" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A string is a sequence of characters and a list is a sequence\n", + "of values, but a list of characters is not the same as a\n", + "string. To convert from a string to a list of characters,\n", + "you can use list:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'spam'\n", + "t = list(s)\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Because list is the name of a built-in function, you should\n", + "avoid using it as a variable name. I also avoid l because\n", + "it looks too much like 1. So that’s why I use t." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The list function breaks a string into individual letters. If\n", + "you want to break a string into words, you can use the split\n", + "method:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'pining for the fjords'\n", + "t = s.split()\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "An optional argument called a delimiter specifies which\n", + "characters to use as word boundaries.\n", + "The following example\n", + "uses a hyphen as a delimiter:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'spam-spam-spam'\n", + "delimiter = '-'\n", + "t = s.split(delimiter)\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "join is the inverse of split. It\n", + "takes a list of strings and\n", + "concatenates the elements. 
join is a string method,\n", + "so you have to invoke it on the delimiter and pass the\n", + "list as a parameter:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ['pining', 'for', 'the', 'fjords']\n", + "delimiter = ' '\n", + "s = delimiter.join(t)\n", + "s" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In this case the delimiter is a space character, so\n", + "join puts a space between words. To concatenate\n", + "strings without spaces, you can use the empty string,\n", + "'', as a delimiter. \n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.10 Objects and values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we run these assignment statements:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 'banana'\n", + "b = 'banana'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "We know that a and b both refer to a\n", + "string, but we don’t\n", + "know whether they refer to the same string.\n", + "There are two possible states, shown in Figure 10.2.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In one case, a and b refer to two different objects that\n", + "have the same value. In the second case, they refer to the same\n", + "object.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To check whether two variables refer to the same object, you can\n", + "use the is operator." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 'banana'\n", + "b = 'banana'\n", + "a is b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "In this example, Python only created one string object, and both a and b refer to it. 
But when you create two lists, you get\n", + "two objects:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [1, 2, 3]\n", + "b = [1, 2, 3]\n", + "a is b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "So the state diagram looks like Figure 10.3.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this case we would say that the two lists are equivalent,\n", + "because they have the same elements, but not identical, because\n", + "they are not the same object. If two objects are identical, they are\n", + "also equivalent, but if they are equivalent, they are not necessarily\n", + "identical.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Until now, we have been using “object” and “value”\n", + "interchangeably, but it is more precise to say that an object has a\n", + "value. If you evaluate [1, 2, 3], you get a list\n", + "object whose value is a sequence of integers. If another\n", + "list has the same elements, we say it has the same value, but\n", + "it is not the same object.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.11 Aliasing" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If a refers to an object and you assign b = a,\n", + "then both variables refer to the same object:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [1, 2, 3]\n", + "b = a\n", + "b is a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The state diagram looks like Figure 10.4.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The association of a variable with an object is called a reference. 
In this example, there are two references to the same\n", + "object.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An object with more than one reference has more\n", + "than one name, so we say that the object is aliased.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the aliased object is mutable, changes made with one alias affect\n", + "the other:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b[0] = 42\n", + "a" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Although this behavior can be useful, it is error-prone. In general,\n", + "it is safer to avoid aliasing when you are working with mutable\n", + "objects.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For immutable objects like strings, aliasing is not as much of a\n", + "problem. In this example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 'banana'\n", + "b = 'banana'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "It almost never makes a difference whether a and b refer\n", + "to the same string or not." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10.12 List arguments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you pass a list to a function, the function gets a reference to\n", + "the list. If the function modifies the list, the caller sees\n", + "the change. 
For example, delete_head removes the first element\n", + "from a list:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "def delete_head(t):\n", + " del t[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Here’s how it is used:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['b', 'c']" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters = ['a', 'b', 'c']\n", + "delete_head(letters)\n", + "letters" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The parameter t and the variable letters are\n", + "aliases for the same object. The stack diagram looks like\n", + "Figure 10.5.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since the list is shared by two frames, I drew\n", + "it between them." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is important to distinguish between operations that\n", + "modify lists and operations that create new lists. For\n", + "example, the append method modifies a list, but the\n", + "+ operator creates a new list.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s an example using append:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t1 = [1, 2]\n", + "t2 = t1.append(3)\n", + "t1" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "t2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The return value from append is None." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s an example using the + operator:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t3 = t1 + [4]\n", + "t1" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3, 4]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result of the operator is a new list, and the original list is\n", + "unchanged." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This difference is important when you write functions that\n", + "are supposed to modify lists. For example, this function\n", + "does not delete the head of a list:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def bad_delete_head(t):\n", + " t = t[1:] # WRONG!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The slice operator creates a new list and the assignment\n", + "makes t refer to it, but that doesn’t affect the caller.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t4 = [1, 2, 3]\n", + "bad_delete_head(t4)\n", + "t4" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "At the beginning of bad_delete_head, t and t4\n", + "refer to the same list. 
At the end, t refers to a new list,\n", + "but t4 still refers to the original, unmodified list." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An alternative is to write a function that creates and\n", + "returns a new list. For\n", + "example, tail returns all but the first\n", + "element of a list:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "def tail(t):\n", + " return t[1:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This function leaves the original list unmodified.\n", + "Here’s how it is used:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['b', 'c']" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters = ['a', 'b', 'c']\n", + "rest = tail(letters)\n", + "rest\n", + "['b', 'c']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 2d1dbc1e153841e393c44177e2cbc3cd9751a703 Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 19 Mar 2020 17:28:11 +0200 Subject: [PATCH 58/76] 25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe --- ...plotlib_Scatterplot_From_A_Dataframe.ipynb | 964 ++++++++++++++++++ test.py | 2 +- 2 files changed, 965 insertions(+), 1 deletion(-) create mode 100644 notebooks/pandas/25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe.ipynb diff --git 
a/notebooks/pandas/25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe.ipynb b/notebooks/pandas/25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe.ipynb new file mode 100644 index 0000000..8e593db --- /dev/null +++ b/notebooks/pandas/25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe.ipynb @@ -0,0 +1,964 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 25. Pandas: Create A Matplotlib Scatterplot From A Dataframe " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Datasets:\n", + "* https://www.kaggle.com/statchaitya/country-to-continent\n", + "* https://www.kaggle.com/erikbruin/countries-of-the-world-iso-codes-and-population\n", + "* https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset\n", + "\n", + "\"Drawing\"\n", + "\"Drawing\"" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import chardet\n", + "import pandas as pd\n", + "\n", + "df = pd.read_csv(\"../csv/covid/covid_19_clean_complete.csv\")\n", + "population = pd.read_csv(\"../csv/covid/countries_by_population_2019.csv\")\n", + "\n", + "with open('../csv/covid/countryContinent.csv', 'rb') as f:\n", + " result = chardet.detect(f.read()) # or readline if the file is large\n", + "\n", + "continent = pd.read_csv(\"../csv/covid/countryContinent.csv\" , encoding=result['encoding'])" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
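<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>Province/State</th>\n", + "      <th>Country/Region</th>\n", + "      <th>Lat</th>\n", + "      <th>Long</th>\n", + "      <th>Date</th>\n", + "      <th>Confirmed</th>\n", + "      <th>Deaths</th>\n", + "      <th>Recovered</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>15061</th>\n", + "      <td>NaN</td>\n", + "      <td>Barbados</td>\n", + "      <td>13.1939</td>\n", + "      <td>-59.5432</td>\n", + "      <td>3/17/20</td>\n", + "      <td>2</td>\n", + "      <td>0</td>\n", + "      <td>0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>15062</th>\n", + "      <td>NaN</td>\n", + "      <td>Montenegro</td>\n", + "      <td>42.5000</td>\n", + "      <td>19.3000</td>\n", + "      <td>3/17/20</td>\n", + "      <td>2</td>\n", + "      <td>0</td>\n", + "      <td>0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>15063</th>\n", + "      <td>NaN</td>\n", + "      <td>The Gambia</td>\n", + "      <td>13.4667</td>\n", + "      <td>-16.6000</td>\n", + "      <td>3/17/20</td>\n", + "      <td>1</td>\n", + "      <td>0</td>\n", + "      <td>0</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>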
" + ], + "text/plain": [ + " Province/State Country/Region Lat Long Date Confirmed \\\n", + "15061 NaN Barbados 13.1939 -59.5432 3/17/20 2 \n", + "15062 NaN Montenegro 42.5000 19.3000 3/17/20 2 \n", + "15063 NaN The Gambia 13.4667 -16.6000 3/17/20 1 \n", + "\n", + " Deaths Recovered \n", + "15061 0 0 \n", + "15062 0 0 \n", + "15063 0 0 " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.tail(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
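<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>Rank</th>\n", + "      <th>name</th>\n", + "      <th>pop2019</th>\n", + "      <th>pop2018</th>\n", + "      <th>GrowthRate</th>\n", + "      <th>area</th>\n", + "      <th>Density</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>0</th>\n", + "      <td>1</td>\n", + "      <td>China</td>\n", + "      <td>1433783.686</td>\n", + "      <td>NaN</td>\n", + "      <td>1.0039</td>\n", + "      <td>9706961.0</td>\n", + "      <td>147.7068</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>1</th>\n", + "      <td>2</td>\n", + "      <td>India</td>\n", + "      <td>1366417.754</td>\n", + "      <td>NaN</td>\n", + "      <td>1.0099</td>\n", + "      <td>3287590.0</td>\n", + "      <td>415.6290</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>2</th>\n", + "      <td>3</td>\n", + "      <td>United States</td>\n", + "      <td>329064.917</td>\n", + "      <td>NaN</td>\n", + "      <td>1.0059</td>\n", + "      <td>9372610.0</td>\n", + "      <td>35.1092</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>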
" + ], + "text/plain": [ + " Rank name pop2019 pop2018 GrowthRate area Density\n", + "0 1 China 1433783.686 NaN 1.0039 9706961.0 147.7068\n", + "1 2 India 1366417.754 NaN 1.0099 3287590.0 415.6290\n", + "2 3 United States 329064.917 NaN 1.0059 9372610.0 35.1092" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "population.head(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
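<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>country</th>\n", + "      <th>code_2</th>\n", + "      <th>code_3</th>\n", + "      <th>country_code</th>\n", + "      <th>iso_3166_2</th>\n", + "      <th>continent</th>\n", + "      <th>sub_region</th>\n", + "      <th>region_code</th>\n", + "      <th>sub_region_code</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>0</th>\n", + "      <td>Afghanistan</td>\n", + "      <td>AF</td>\n", + "      <td>AFG</td>\n", + "      <td>4</td>\n", + "      <td>ISO 3166-2:AF</td>\n", + "      <td>Asia</td>\n", + "      <td>Southern Asia</td>\n", + "      <td>142.0</td>\n", + "      <td>34.0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>1</th>\n", + "      <td>Åland Islands</td>\n", + "      <td>AX</td>\n", + "      <td>ALA</td>\n", + "      <td>248</td>\n", + "      <td>ISO 3166-2:AX</td>\n", + "      <td>Europe</td>\n", + "      <td>Northern Europe</td>\n", + "      <td>150.0</td>\n", + "      <td>154.0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>2</th>\n", + "      <td>Albania</td>\n", + "      <td>AL</td>\n", + "      <td>ALB</td>\n", + "      <td>8</td>\n", + "      <td>ISO 3166-2:AL</td>\n", + "      <td>Europe</td>\n", + "      <td>Southern Europe</td>\n", + "      <td>150.0</td>\n", + "      <td>39.0</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>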
" + ], + "text/plain": [ + " country code_2 code_3 country_code iso_3166_2 continent \\\n", + "0 Afghanistan AF AFG 4 ISO 3166-2:AF Asia \n", + "1 Åland Islands AX ALA 248 ISO 3166-2:AX Europe \n", + "2 Albania AL ALB 8 ISO 3166-2:AL Europe \n", + "\n", + " sub_region region_code sub_region_code \n", + "0 Southern Asia 142.0 34.0 \n", + "1 Northern Europe 150.0 154.0 \n", + "2 Southern Europe 150.0 39.0 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "continent.head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #1: Combine covid and continent data" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "df = df.merge(continent, left_on='Country/Region', right_on='country', how='inner')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
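<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>Province/State</th>\n", + "      <th>Country/Region</th>\n", + "      <th>Lat</th>\n", + "      <th>Long</th>\n", + "      <th>Date</th>\n", + "      <th>Confirmed</th>\n", + "      <th>Deaths</th>\n", + "      <th>Recovered</th>\n", + "      <th>country</th>\n", + "      <th>code_2</th>\n", + "      <th>code_3</th>\n", + "      <th>country_code</th>\n", + "      <th>iso_3166_2</th>\n", + "      <th>continent</th>\n", + "      <th>sub_region</th>\n", + "      <th>region_code</th>\n", + "      <th>sub_region_code</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>0</th>\n", + "      <td>NaN</td>\n", + "      <td>Thailand</td>\n", + "      <td>15.0</td>\n", + "      <td>101.0</td>\n", + "      <td>1/22/20</td>\n", + "      <td>2</td>\n", + "      <td>0</td>\n", + "      <td>0</td>\n", + "      <td>Thailand</td>\n", + "      <td>TH</td>\n", + "      <td>THA</td>\n", + "      <td>764</td>\n", + "      <td>ISO 3166-2:TH</td>\n", + "      <td>Asia</td>\n", + "      <td>South-Eastern Asia</td>\n", + "      <td>142.0</td>\n", + "      <td>35.0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>1</th>\n", + "      <td>NaN</td>\n", + "      <td>Thailand</td>\n", + "      <td>15.0</td>\n", + "      <td>101.0</td>\n", + "      <td>1/23/20</td>\n", + "      <td>3</td>\n", + "      <td>0</td>\n", + "      <td>0</td>\n", + "      <td>Thailand</td>\n", + "      <td>TH</td>\n", + "      <td>THA</td>\n", + "      <td>764</td>\n", + "      <td>ISO 3166-2:TH</td>\n", + "      <td>Asia</td>\n", + "      <td>South-Eastern Asia</td>\n", + "      <td>142.0</td>\n", + "      <td>35.0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>2</th>\n", + "      <td>NaN</td>\n", + "      <td>Thailand</td>\n", + "      <td>15.0</td>\n", + "      <td>101.0</td>\n", + "      <td>1/24/20</td>\n", + "      <td>5</td>\n", + "      <td>0</td>\n", + "      <td>0</td>\n", + "      <td>Thailand</td>\n", + "      <td>TH</td>\n", + "      <td>THA</td>\n", + "      <td>764</td>\n", + "      <td>ISO 3166-2:TH</td>\n", + "      <td>Asia</td>\n", + "      <td>South-Eastern Asia</td>\n", + "      <td>142.0</td>\n", + "      <td>35.0</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>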
" + ], + "text/plain": [ + " Province/State Country/Region Lat Long Date Confirmed Deaths \\\n", + "0 NaN Thailand 15.0 101.0 1/22/20 2 0 \n", + "1 NaN Thailand 15.0 101.0 1/23/20 3 0 \n", + "2 NaN Thailand 15.0 101.0 1/24/20 5 0 \n", + "\n", + " Recovered country code_2 code_3 country_code iso_3166_2 continent \\\n", + "0 0 Thailand TH THA 764 ISO 3166-2:TH Asia \n", + "1 0 Thailand TH THA 764 ISO 3166-2:TH Asia \n", + "2 0 Thailand TH THA 764 ISO 3166-2:TH Asia \n", + "\n", + " sub_region region_code sub_region_code \n", + "0 South-Eastern Asia 142.0 35.0 \n", + "1 South-Eastern Asia 142.0 35.0 \n", + "2 South-Eastern Asia 142.0 35.0 " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #2: Get last value for Confirmed per country" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
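<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>Confirmed</th>\n", + "      <th>Country/Region</th>\n", + "      <th>Recovered</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>0</th>\n", + "      <td>22</td>\n", + "      <td>Afghanistan</td>\n", + "      <td>1</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>1</th>\n", + "      <td>55</td>\n", + "      <td>Albania</td>\n", + "      <td>0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>2</th>\n", + "      <td>60</td>\n", + "      <td>Algeria</td>\n", + "      <td>12</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>3</th>\n", + "      <td>39</td>\n", + "      <td>Andorra</td>\n", + "      <td>1</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>4</th>\n", + "      <td>1</td>\n", + "      <td>Antigua and Barbuda</td>\n", + "      <td>0</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>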
" + ], + "text/plain": [ + " Confirmed Country/Region Recovered\n", + "0 22 Afghanistan 1\n", + "1 55 Albania 0\n", + "2 60 Algeria 12\n", + "3 39 Andorra 1\n", + "4 1 Antigua and Barbuda 0" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "last_confirmed_number = df[df.Confirmed > 0].groupby('Country/Region', as_index = False).last()[['Confirmed', 'Country/Region', 'Recovered']]\n", + "last_confirmed_number.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #3: Get first date of Confirmed per country" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
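<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>Date</th>\n", + "      <th>Country/Region</th>\n", + "      <th>continent</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>0</th>\n", + "      <td>2/24/20</td>\n", + "      <td>Afghanistan</td>\n", + "      <td>Asia</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>1</th>\n", + "      <td>3/9/20</td>\n", + "      <td>Albania</td>\n", + "      <td>Europe</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>2</th>\n", + "      <td>2/25/20</td>\n", + "      <td>Algeria</td>\n", + "      <td>Africa</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>3</th>\n", + "      <td>3/2/20</td>\n", + "      <td>Andorra</td>\n", + "      <td>Europe</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>4</th>\n", + "      <td>3/13/20</td>\n", + "      <td>Antigua and Barbuda</td>\n", + "      <td>Americas</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>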
" + ], + "text/plain": [ + " Date Country/Region continent\n", + "0 2/24/20 Afghanistan Asia\n", + "1 3/9/20 Albania Europe\n", + "2 2/25/20 Algeria Africa\n", + "3 3/2/20 Andorra Europe\n", + "4 3/13/20 Antigua and Barbuda Americas" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "first_date = df[df.Confirmed > 0].groupby('Country/Region', as_index = False).first()[['Date', 'Country/Region', 'continent']]\n", + "first_date.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #4: Combine last values and first date" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
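<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>Confirmed</th>\n", + "      <th>Country/Region</th>\n", + "      <th>Recovered</th>\n", + "      <th>Date</th>\n", + "      <th>continent</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>92</th>\n", + "      <td>236</td>\n", + "      <td>Pakistan</td>\n", + "      <td>2</td>\n", + "      <td>2/26/20</td>\n", + "      <td>Asia</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>97</th>\n", + "      <td>238</td>\n", + "      <td>Poland</td>\n", + "      <td>13</td>\n", + "      <td>3/4/20</td>\n", + "      <td>Europe</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>109</th>\n", + "      <td>266</td>\n", + "      <td>Singapore</td>\n", + "      <td>114</td>\n", + "      <td>1/23/20</td>\n", + "      <td>Asia</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>111</th>\n", + "      <td>275</td>\n", + "      <td>Slovenia</td>\n", + "      <td>0</td>\n", + "      <td>3/5/20</td>\n", + "      <td>Europe</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>40</th>\n", + "      <td>321</td>\n", + "      <td>Finland</td>\n", + "      <td>10</td>\n", + "      <td>1/29/20</td>\n", + "      <td>Europe</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>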
" + ], + "text/plain": [ + " Confirmed Country/Region Recovered Date continent\n", + "92 236 Pakistan 2 2/26/20 Asia\n", + "97 238 Poland 13 3/4/20 Europe\n", + "109 266 Singapore 114 1/23/20 Asia\n", + "111 275 Slovenia 0 3/5/20 Europe\n", + "40 321 Finland 10 1/29/20 Europe" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df3 = last_confirmed_number.merge(first_date, on='Country/Region', how='inner')\n", + "df_final = df3.sort_values(by=['Confirmed', 'Date']).tail(20)\n", + "df_final.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #5: Convert dates to datetime and sort" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "df_final['Date'] = pd.to_datetime(df_final['Date'])" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "63 2020-01-22\n", + "109 2020-01-23\n", + "75 2020-01-25\n", + "44 2020-01-27\n", + "40 2020-01-29\n", + "61 2020-01-31\n", + "118 2020-01-31\n", + "114 2020-02-01\n", + "15 2020-02-04\n", + "60 2020-02-21\n", + "119 2020-02-25\n", + "9 2020-02-25\n", + "90 2020-02-26\n", + "92 2020-02-26\n", + "46 2020-02-26\n", + "19 2020-02-26\n", + "99 2020-02-29\n", + "98 2020-03-02\n", + "97 2020-03-04\n", + "111 2020-03-05\n", + "Name: Date, dtype: datetime64[ns]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_final['Date'].sort_values()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #6: Plot Data as Scatterplot\n", + "* x axis - Current Active Cases\n", + "* y axis - First Date Confirmed\n", + "* size of points - Current Recovered " + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABCYAAAKeCAYAAAB54hodAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAIABJREFUeJzs3Xl4Tef6//H3zrSzM4shCTIRQauooYQeBBU9Smn6Lb4dqKnGNlXao20M1R6dOKXa4+hBnKKGo6pFqUYTNaWkYgopyglNgiKJBEkk6/eHn/XtrhhP2x18Xte1r8tez/Pc614r8ce68zzPshiGYSAiIiIiIiIi4gBOjk5ARERERERERO5cKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDuPi6ATk91FWVkZWVhbe3t5YLBZHpyMiIiIiIiK3OcMwOHPmDNWrV8fJ6frnQagwcZvKysoiODjY0WmIiIiIiIjIHebIkSPUrFnzuvurMHGb8vb2Bi7+Qvj4+Dg4GxEREREREbnd5efnExwcbD6PXi8VJm5Tl5Zv+Pj4qDAhIiIiIiIif5gb3U5Am1+KiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMM4tDAxadIkmjdvjre3N9WqVaN79+5kZGTY9Tl//jzDhg2jcuXKeHl5ERsby7Fjx8z2HTt20Lt3b4KDg7HZbNSvX5+pU6dedq6kpCSaNGmC1WolIiKChISEa+ZnGAZjx44lKCgIm81Gx44d2b9/v12fbt26ERISgru7O0FBQTz55JNkZWVdM/a18iktLSU+Pp7w8HBsNhu1a9dm4sSJGIZxzdgiIiIiIiIitwqHFiaSk5MZNmwYW7ZsYe3atZSUlNCpUycKCwvNPs8//zxffPEFS5YsITk5maysLB555BGzPTU1lWrVqjFv3jz27NnDK6+8wpgxY5g+fbrZ59ChQ3Tp0oXo6GjS0tKIi4tjwIABrFmz5qr5vf3220ybNo0ZM2aQkpKCp6cnMTExnD9/3uwTHR3N4sWLycjIYOnSpRw8eJBHH330qnGvJ5+33nqLv//970yfPp29e/fy1ltv8fbbb/P+++9f9/0VERERERERqegsRgX6E/yJEyeoVq0aycnJtGnThry8PKpWrcqCBQvMh/19+/ZRv359Nm/eTMuWLcuNM2zYMPbu3cu6desAeOmll1i5ciW7d+82+/Tq1Yvc3FxWr15dbgzDMKhevTovvPACo0aNAiAvL4+AgAASEhLo1atXueM+//xzunfvTlFREa6uruX2uZ58HnroIQICApg1a5bZJzY2FpvNxrx588qN+0v5+fn4+vqSl5eHj4/PNfuLiIiIiIiI/Ddu9jm0Qu0xkZeXB4C/vz9wcTZESUkJHTt2NPvUq1ePkJAQNm/efNU4l2IAbN682S4GQExMzFVjHDp0iJycHLtxvr6+tGjR4orjTp06xfz582nVqtUVixLXm0+rVq1ITEzkhx9+AC4uWdmwYQMPPvhguTGLiorIz8+3+4iIiIiIiIhUdBWmMFFWVkZcXBytW7emQYMGAOTk5ODm5oafn59d34CAAHJycsqNs2nTJhYtWsSgQYPMYzk5OQQEBFwWIz8/n3PnzpUb51L88sb9+twvvfQSnp6eVK5cmczMTJY
vX37Va72efP7yl7/Qq1cv6tWrh6urK/feey9xcXE8/vjj5cacNGkSvr6+5ic4OPiqOYiIiIiIiIhUBBWmMDFs2DB2797NwoULbzrG7t27efjhhxk3bhydOnW67nHz58/Hy8vL/Hz77bc3dN7Ro0ezfft2vvrqK5ydnXnqqafMTSp/GXfw4MHXHXPx4sXMnz+fBQsW8P333zN37lzeffdd5s6dW27/MWPGkJeXZ36OHDlyQ9cgIiIiIiIi4ggujk4AYPjw4axYsYL169dTs2ZN83hgYCDFxcXk5ubazZo4duwYgYGBdjHS09Pp0KEDgwYN4tVXX7VrCwwMtHuTx6UYPj4+2Gw2unXrRosWLcy2GjVqkJ2dbfYLCgqyG9e4cWO7WFWqVKFKlSpERkZSv359goOD2bJlC1FRUaSlpZn9Lq2xuVY+cLHYcWnWBMA999zDf/7zHyZNmkSfPn0uu4dWqxWr1XrZcREREREREZGKzKGFCcMwGDFiBMuWLSMpKYnw8HC79qZNm+Lq6kpiYiKxsbEAZGRkkJmZSVRUlNlvz549tG/fnj59+vDGG29cdp6oqChWrVpld2zt2rVmDG9vb7y9ve3aw8PDCQwMJDEx0SxE5Ofnk5KSwpAhQ654TWVlZcDFPR8AIiIibjgfgLNnz+LkZD+hxdnZ2YwvIiIiIiIicjtwaGFi2LBhLFiwgOXLl+Pt7W3u3eDr64vNZsPX15f+/fszcuRI/P398fHxYcSIEURFRZlv5Ni9ezft27cnJiaGkSNHmjGcnZ2pWrUqAIMHD2b69Om8+OKL9OvXj3Xr1rF48WJWrlx5xdwsFgtxcXG8/vrr1KlTh/DwcOLj46levTrdu3cHICUlha1bt3L//fdTqVIlDh48SHx8PLVr17YrMvza9eTTtWtX3njjDUJCQrj77rvZvn07U6ZMoV+/fv/dTRcRERERERGpSAwHAsr9zJkzx+xz7tw5Y+jQoUalSpUMDw8Po0ePHkZ2drbZPm7cuHJjhIaG2p3rm2++MRo3bmy4ubkZtWrVsjvHlZSVlRnx8fFGQECAYbVajQ4dOhgZGRlm+86dO43o6GjD39/fsFqtRlhYmDF48GDj6NGj14x9rXzy8/ON5557zggJCTHc3d2NWrVqGa+88opRVFR0zdiGYRh5eXkGYOTl5V1XfxEREREREZH/xs0+h1oM4//v0ii3lZt9f6yIiIiIiIjIzbjZ59AK81YOEREREREREbnzqDAhIiIiIiIiIg6jwoSIiIiIiIiIOIwKEyIiIiIiIiLiMCpMiIiIiIiIiIjDqDAhIiIiIiIiIg6jwoSIiIiIiIiIOIwKEyIiIiIiIiLiMCpMiIiIiIiIiIjDqDAhIiIiIiIiIg6jwoSIiIiIiIiIOIwKEyIiIiIiIiLiMCpMiIiIiIiIiIjDqDAhIiIiIiIiIg6jwoSIiIiIiIiIOIwKEyIiIiIiIiLiMCpMiIiIiIiIiIjDqDAhIiIiIiIiIg6jwoSIiIiIiIiIOIwKE3JLsVgsfPbZZ45Ow87hw4exWCykpaU5OhUREREREZFbjgoTUqGcOHGCIUOGEBISgtVqJTAwkJiYGDZu3Ojo1K4oODiY7OxsGjRo4OhUREREREREbjkujk5A5JdiY2MpLi5m7ty51KpVi2PHjpGYmMjJkycdndoVOTs7ExgY6Og0REREREREbkmaMSEVRm5uLt9++y1vvfUW0dHRhIaGct999zFmzBi6detW7phdu3bRvn17bDYblStXZtCgQRQUFADw1Vdf4e7uTm5urt2Y5557jvbt25vfN2zYwJ/+9CdsNhvBwcE8++yzFBYWmu1hYWH89a9/pV+/fnh7exMSEsLMmTPN9l8v5SgtLaV///6Eh4djs9moW7cuU6dO/c3uk4iIiIiIyO1EhQmpMLy8vPDy8uKzzz6jqKjomv0LCwuJiYmhUqVKbN26lSVLlvD1118zfPhwADp06ICfnx9Lly41x5SWlrJo0SI
ef/xxAA4ePEjnzp2JjY1l586dLFq0iA0bNpgxLpk8eTLNmjVj+/btDB06lCFDhpCRkVFuXmVlZdSsWZMlS5aQnp7O2LFjefnll1m8ePHN3hoREREREZHblsUwDMPRSchvLz8/H19fX/Ly8vDx8XF0Otdt6dKlDBw4kHPnztGkSRPatm1Lr169aNiwIXBx88tly5bRvXt3PvroI1566SWOHDmCp6cnAKtWraJr165kZWUREBBAXFwcu3btIjExEbg4i6Jbt27k5OTg5+fHgAEDcHZ25h//+IeZw4YNG2jbti2FhYW4u7sTFhbGn/70Jz7++GMADMMgMDCQCRMmMHjwYA4fPkx4eDjbt2+ncePG5V7X8OHDycnJ4d///vfveftEREREREQc5mafQzVjQiqU2NhYsrKy+Pzzz+ncuTNJSUk0adKEhISEy/ru3buXRo0amUUJgNatW1NWVmbOZnj88cdJSkoiKysLgPnz59OlSxf8/PwA2LFjBwkJCeZsDS8vL2JiYigrK+PQoUNm3EuFEbhYHAkMDOT48eNXvI4PPviApk2bUrVqVby8vJg5cyaZmZn/1b0RERERERG5HakwIRWOu7s7DzzwAPHx8WzatIm+ffsybty4m4rVvHlzateuzcKFCzl37hzLli0zl3EAFBQU8Mwzz5CWlmZ+duzYwf79+6ldu7bZz9XV1S6uxWKhrKys3HMuXLiQUaNG0b9/f7766ivS0tJ4+umnKS4uvqlrEBERERERuZ3prRxS4d1111189tlnlx2vX78+CQkJFBYWmrMmNm7ciJOTE3Xr1jX7Pf7448yfP5+aNWvi5OREly5dzLYmTZqQnp5ORETEb5bvxo0badWqFUOHDjWPHTx48DeLLyIiIiIicjvRjAmpME6ePEn79u2ZN28eO3fu5NChQyxZsoS3336bhx9++LL+jz/+OO7u7vTp04fdu3fzzTffMGLECJ588kkCAgLs+n3//fe88cYbPProo1itVrPtpZdeYtOmTQwfPpy0tDT279/P8uXLL9v88kbUqVOHbdu2sWbNGn744Qfi4+PZunXrTccTERERERG5nWnGhFQYXl5etGjRgr/97W8cPHiQkpISgoODGThwIC+//PJl/T08PFizZg3PPfcczZs3x8PDg9jYWKZMmWLXLyIigvvuu4/vvvuO9957z66tYcOGJCcn88orr/CnP/0JwzCoXbs2PXv2vOnreOaZZ9i+fTs9e/bEYrHQu3dvhg4dypdffnnTMUVERERERG5XeivHbepWfSuHiIiIiIiI3Jr0Vg6544WFhV02I+L30LdvX7p37/67n0dEREREROROoMKEVBh9+/bFYrFgsVhwc3MjIiKC1157jQsXLjg6NREREREREfmdaI8JqVA6d+7MnDlzKCoqYtWqVQwbNgxXV1fGjBnj6NRERERERETkd6AZE1KhWK1WAgMDCQ0NZciQIXTs2JHPP/8cgKVLl3L33XdjtVoJCwtj8uTJV401ZcoU7rnnHjw9PQkODmbo0KEUFBSY7QkJCfj5+bFmzRrq16+Pl5cXnTt3Jjs72+xTWlrKyJEj8fPzo3Llyrz44otoWxYREREREZHfjgoTUqHZbDaKi4tJTU3lscceo1evXuzatYvx48cTHx9PQkLCFcc6OTkxbdo09uzZw9y5c1m3bh0vvviiXZ+zZ8/y7rvv8vHHH7N+/XoyMzMZNWqU2T558mQSEhKYPXs2GzZs4NSpUyxbtuz3ulwREREREZE7jpZySIVkGAaJiYmsWbOGESNGMGXKFDp06EB8fDwAkZGRpKen884779C3b99yY8TFxZn/DgsL4/XXX2fw4MF8+OGH5vGSkhJmzJhB7dq1ARg+fDivvfaa2f7ee+8xZswYHnnkEQBmzJjBmjVrfuvLFRERERERuWNpxoRUKCtWrMDLywt3d3cefPBBevbsyfjx49m7dy+tW7e269u6dWv2799PaWlpubG+/vprOnToQI0aNfD
29ubJJ5/k5MmTnD171uzj4eFhFiUAgoKCOH78OAB5eXlkZ2fTokULs93FxYVmzZr9lpcsIiIiIiJyR1NhQhzmzPkSdhw5zdfpOWw9fJLzJaVER0eTlpbG/v37OXfuHHPnzsXT0/OGYx8+fJiHHnqIhg0bsnTpUlJTU/nggw8AKC4uNvu5urrajbNYLNpDQkRERERE5A+kwoQ4xPmSUrb95zR7s89wvqSMwz+f5eeCYtzcbURERBASEoKLy/+tNKpfvz4bN260i7Fx40YiIyNxdna+LH5qaiplZWVMnjyZli1bEhkZSVZW1g3l6OvrS1BQECkpKeaxCxcukJqaeoNXKyIiIiIiIleiPSbEIU6cKeJ4/nmCK3ng5GQBoKS0DONC+csyXnjhBZo3b87EiRPp2bMnmzdvZvr06Xb7RfxSREQEJSUlvP/++3Tt2pWNGzcyY8aMG87zueee480336ROnTrUq1ePKVOmkJube8NxREREREREpHyaMSEOcaHs4nKJS0UJAAtQdoVVFE2aNGHx4sUsXLiQBg0aMHbsWF577bUrbnzZqFEjpkyZwltvvUWDBg2YP38+kyZNuuE8X3jhBZ588kn69OlDVFQU3t7e9OjR44bjiIiIiIiISPkshhbU35by8/Px9fUlLy8PHx8fR6dzmdyzxaz/4QROFguVPN04W1RK7rliWtaqTLC/h6PTExERERERkRt0s8+hmjEhDuHn4UaDGr44OVn46fQ5zpVcoF6gD9X9bI5OTURERERERP5A2mNCHKZWVS8Cfd0pOH8Bq6szvjbXaw8SERERERGR24oKE+JQHm4ueLjp11BEREREROROpaUcIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIOEBCQgJ+fn6OTkNERERERMThVJiQCq9v375YLBYsFgtubm5ERETw2muvceHChZuOefjwYSwWC2lpab9hpiIiIiIiInKjXBydgMj16Ny5M3PmzKGoqIhVq1YxbNgwXF1dGTNmzA3HKi4u/h0yFBERERERkZuhGRNyS7BarQQGBhIaGsqQIUPo2LEjn3/+OQBLly7l7rvvxmq1EhYWxuTJk+3GhoWFMXHiRJ566il8fHwYNGgQ4eHhANx7771YLBbatWsHQLt27YiLi7Mb3717d/r27Wt+z87OpkuXLthsNsLDw1mwYAFhYWG89957Zp8pU6Zwzz334OnpSXBwMEOHDqWgoOB3uDMiIiIiIiK3NhUm5JZks9koLi4mNTWVxx57jF69erFr1y7Gjx9PfHw8CQkJdv3fffddGjVqxPbt24mPj+e7774D4OuvvyY7O5tPP/30us/91FNPkZWVRVJSEkuXLmXmzJkcP37cro+TkxPTpk1jz549zJ07l3Xr1vHiiy/+19ctIiIiIiJyu9FSDrmlGIZBYmIia9asYcSIEUyZMoUOHToQHx8PQGRkJOnp6bzzzjt2sxzat2/PCy+8YH53dnYGoHLlygQGBl73+fft28fXX3/N1q1badasGQD//Oc/qVOnjl2/X866CAsL4/XXX2fw4MF8+OGHN3zNIiIiIiIitzPNmJBbwooVK/Dy8sLd3Z0HH3yQnj17Mn78ePbu3Uvr1q3t+rZu3Zr9+/dTWlpqHrtURPhvZWRk4OLiQpMmTcxjERERVKpUya7f119/TYcOHahRowbe3t48+eSTnDx5krNnz/4meYiIiIiIiNwuVJiQW0J0dDRpaWns37+fc+fOMXfuXDw9Pa97/PX2dXJywjAMu2MlJSU3lOvhw4d56KGHaNiwIUuXLiU1NZUPPvgA0MabIiIiIiIiv6bChFRIPxcUceTUWc6XXJz14OnpSUREBCEhIbi4/N8KpPr167Nx40a7sRs3biQyMtJcrlEeNzc3ALtZFQB
Vq1YlOzvb/F5aWsru3bvN73Xr1uXChQts377dPHbgwAFOnz5tfk9NTaWsrIzJkyfTsmVLIiMjycrKupHLFxERERERuWOoMCEVzvEz59l88CQbD/zM9/85za8mMNh54YUXSExMZOLEifzwww/MnTuX6dOnM2rUqKueo1q1athsNlavXs2xY8fIy8sDLu5FsXLlSlauXMm+ffsYMmQIubm55rh69erRsWNHBg0axHfffcf27dsZNGgQNpsNi8UCXFzaUVJSwvvvv8+PP/7Ixx9/zIwZM/77GyMiIiIiInIbUmFCKpyC8xcoKLqAl9WF02eLKbtKZaJJkyYsXryYhQsX0qBBA8aOHctrr71mt/FleVxcXJg2bRr/+Mc/qF69Og8//DAA/fr1o0+fPjz11FO0bduWWrVqER0dbTf2X//6FwEBAbRp04YePXowcOBAvL29cXd3B6BRo0ZMmTKFt956iwYNGjB//nwmTZr0390UERERERGR25TF+PWCerkt5Ofn4+vrS15eHj4+Po5O54bkny9h2+FTFJy/QHgVLxrU8DFnI1RER48eJTg42NzwUkRERERE5E50s8+hel2oVDg+7q60jqhC8YUyvKwuFa4osW7dOgoKCrjnnnvIzs7mxRdfJCwsjDZt2jg6NRERERERkVuOChNSIVldnLG6XHnzSkcqKSnh5Zdf5scff8Tb25tWrVoxf/58XF1dHZ2aiIiIiIjILUdLOW5Tt/JSDhEREREREbn13OxzqDa/FBERERERERGHUWFCRERERERERBxGhQkRERERERERcRgVJkRERERERETEYRxamJg0aRLNmzfH29ubatWq0b17dzIyMuz6nD9/nmHDhlG5cmW8vLyIjY3l2LFjZvuOHTvo3bs3wcHB2Gw26tevz9SpUy87V1JSEk2aNMFqtRIREUFCQsI18zMMg7FjxxIUFITNZqNjx47s37/frk+3bt0ICQnB3d2doKAgnnzySbKysq4Z+1r5rF+/nq5du1K9enUsFgufffbZNWOKiIiIiIiI3GocWphITk5m2LBhbNmyhbVr11JSUkKnTp0oLCw0+zz//PN88cUXLFmyhOTkZLKysnjkkUfM9tTUVKpVq8a8efPYs2cPr7zyCmPGjGH69Olmn0OHDtGlSxeio6NJS0sjLi6OAQMGsGbNmqvm9/bbbzNt2jRmzJhBSkoKnp6exMTEcP78ebNPdHQ0ixcvJiMjg6VLl3Lw4EEeffTRq8a9nnwKCwtp1KgRH3zwwXXfTxEREREREZFbTYV6XeiJEyeoVq0aycnJtGnThry8PKpWrcqCBQvMh/19+/ZRv359Nm/eTMuWLcuNM2zYMPbu3cu6desAeOmll1i5ciW7d+82+/Tq1Yvc3FxWr15dbgzDMKhevTovvPACo0aNAiAvL4+AgAASEhLo1atXueM+//xzunfvTlFREa6uruX2udF8LBYLy5Yto3v37uXGAygqKqKoqMj8np+fT3BwsF4XKiIiIiIiIn+I2+J1oXl5eQD4+/sDF2dDlJSU0LFjR7NPvXr1CAkJYfPmzVeNcykGwObNm+1iAMTExFw1xqFDh8jJybEb5+vrS4sWLa447tSpU8yfP59WrVpdsShxs/lcy6RJk/D19TU/wcHBNx1LRERERERE5I9SYQoTZWVlxMXF0bp1axo0aABATk4Obm5u+Pn52fUNCAggJyen3DibNm1i0aJFDBo0yDyWk5NDQEDAZTHy8/M5d+5cuXEuxS9v3K/P/dJLL+Hp6UnlypXJzMxk+fLlV73Wm8nnWsaMGUNeXp75OXLkyE3FEREREREREfkjVZjCxLBhw9i9ezcLFy686Ri7d+/m4YcfZty4cXTq1Om6x82fPx8vLy/z8+23397QeUePHs327dv56quvcHZ25qmnnuLSCplfxh08ePANxb0RVqsVHx8fu4+IiIiIiIhIRefi6AQAhg8fzooVK1i/fj01a9Y0jwcGBlJcXExubq7drIl
jx44RGBhoFyM9PZ0OHTowaNAgXn31Vbu2wMBAuzd5XIrh4+ODzWajW7dutGjRwmyrUaMG2dnZZr+goCC7cY0bN7aLVaVKFapUqUJkZCT169cnODiYLVu2EBUVRVpamtnvUrHgWvmIiIiIiIiI3CkcOmPCMAyGDx/OsmXLWLduHeHh4XbtTZs2xdXVlcTERPNYRkYGmZmZREVFmcf27NlDdHQ0ffr04Y033rjsPFFRUXYxANauXWvG8Pb2JiIiwvzYbDbCw8MJDAy0G5efn09KSorduX+trKwMwNyI8pdxq1Wrdl35yOWOHDlCv379qF69Om5uboSGhvLcc89x8uTJ646RlJSExWIhNzf3d8xUREREREREboRDZ0wMGzaMBQsWsHz5cry9vc29G3x9fbHZbPj6+tK/f39GjhyJv78/Pj4+jBgxgqioKPONHLt376Z9+/bExMQwcuRIM4azszNVq1YFYPDgwUyfPp0XX3yRfv36sW7dOhYvXszKlSuvmJvFYiEuLo7XX3+dOnXqEB4eTnx8PNWrVzffjpGSksLWrVu5//77qVSpEgcPHiQ+Pp7atWtftchwPfkUFBRw4MAB8/uhQ4dIS0vD39+fkJCQm7zjt6Yff/yRqKgoIiMj+eSTTwgPD2fPnj2MHj2aL7/8ki1btthtdvpHKCkpueoGpyIiIiIiInKdDAcCyv3MmTPH7HPu3Dlj6NChRqVKlQwPDw+jR48eRnZ2ttk+bty4cmOEhobaneubb74xGjdubLi5uRm1atWyO8eVlJWVGfHx8UZAQIBhtVqNDh06GBkZGWb7zp07jejoaMPf39+wWq1GWFiYMXjwYOPo0aPXjH2tfL755ptyr6tPnz7XjG0YhpGXl2cARl5e3nX1r8g6d+5s1KxZ0zh79qzd8ezsbMPDw8MYPHiwYRiG8a9//cto2rSp4eXlZQQEBBi9e/c2jh07ZhiGYRw6dOiK9/LLL780Wrdubfj6+hr+/v5Gly5djAMHDpjnuTR24cKFRps2bQyr1Xpdvz8iIiIiIiJ3kpt9DrUYxv/fpVFuKzf7/tiK5tSpU1SpUoU33niDMWPGXNY+aNAg/v3vf3Py5EnmzJlDUFAQdevW5fhNFDxEAAAgAElEQVTx44wcORI/Pz9WrVpFaWkpy5cvJzY2loyMDHM/D19fX5YuXYrFYqFhw4YUFBQwduxYDh8+TFpaGk5OThw+fJjw8HDCwsKYPHky9957L+7u7nZ7j4iIiIiIiNzpbvY5tEJsfilyJfv378cwDOrXr19ue/369Tl9+jQnTpygX79+5vFatWoxbdo0mjdvTkFBAV5eXuZyj2rVqtltphobG2sXc/bs2VStWpX09HTz1bUAcXFxPPLII7/l5YmIiIiIiNzxHLr55aRJk2jevDne3t5Uq1aN7t27k5GRYdfn/PnzDBs2jMqVK+Pl5UVsbKzdGy127NhB7969CQ4OxmazUb9+faZOnXrZuZKSkmjSpAlWq5WIiAgSEhKumZ9hGIwdO5agoCBsNhsdO3Zk//79Zvvhw4fp378/4eHh2Gw2ateuzbhx4yguLr5q3E8//ZQHHniAqlWr4uPjQ1RUFGvWrLHrc+bMGeLi4ggNDcVms9GqVSu2bt16zZxvV9ea2OPm5kZqaipdu3YlJCQEb29v2rZtC0BmZuZVx+7fv5/evXtTq1YtfHx8CAsLK3dcs2bNbv4CREREREREpFwOLUwkJyczbNgwtmzZwtq1aykpKaFTp04UFhaafZ5//nm++OILlixZQnJyMllZWXZ/tU5NTaVatWrMmzePPXv28MorrzBmzBimT59u9jl06BBdunQhOjqatLQ04uLiGDBgwGXFgF97++23mTZtGjNmzCAlJQVPT09iYmI4f/48APv27aOsrIx//OMf7Nmzh7/97W/MmDGDl19++apx169fzwMPPMCqVatITU0lOjqarl27sn37drPPgAEDWLt2LR9//DG7du2iU6dOdOzYkZ9
++umG7vGt6EJpGT/lnuPIqbOEhtfCYrGwd+/ecvvu3buXqlWr4urqSkxMDD4+PsyfP5+tW7eybNkygGsWirp27cqpU6f46KOPSElJISUlpdxxnp6ev8HViYiIiIiIiJ3ffruLm3f8+HEDMJKTkw3DMIzc3FzD1dXVWLJkidln7969BmBs3rz5inGGDh1qREdHm99ffPFF4+6777br07NnTyMmJuaKMcrKyozAwEDjnXfeMY/l5uYaVqvV+OSTT6447u233zbCw8OvfJFXcNdddxkTJkwwDMMwzp49azg7OxsrVqyw69OkSRPjlVdeua54t/LmlzsyTxuLtmYaC7/7j7H10EmjU6dORo0aNa64+eXo0aONbdu2GYCRmZlptn/88ccGYGzfvt0wDMPYuHGjARg///yz2efnn382AGP9+vXmsW+//dYAjGXLlhmG8X+bX16KIyIiIiIiIpe72edQh86Y+LW8vDwAcy+A1NRUSkpK6Nixo9mnXr16hISEsHnz5qvG+eXrIzdv3mwXAyAmJuaqMQ4dOkROTo7dOF9fX1q0aHFD574eZWVlnDlzxhx34cIFSktLcXd3t+tns9nYsGFDuTGKiorIz8+3+9yKDMMgJ/88Hq7O+Li7cuJMEVPem0ZRURExMTGsX7+eI0eOsHr1ah544AEiIyMZO3YsISEhuLm58f777/Pjjz/y+eefM3HiRLvYoaGhWCwWVqxYwYkTJygoKKBSpUpUrlyZmTNncuDAAdatW8fIkSMddPUiIiIiIiJ3ngpTmCgrKyMuLo7WrVubGw7m5OTg5uZmt1EhQEBAADk5OeXG2bRpE4sWLWLQoEHmsZycHAICAi6LkZ+fz7lz58qNcyl+eeOudO4DBw7w/vvv88wzz1zlSi/37rvvUlBQwGOPPQaAt7c3UVFRTJw4kaysLEpLS5k3bx6bN28mOzu73BiTJk3C19fX/AQHB99QDhWFxWKhmreVgqIL5J0roYq3G3fVi2Tr1q3UqlWLxx57jNDQUB588EEiIyPZuHEjXl5eVK1alYSEBJYsWcJdd93Fm2++ybvvvmsXu0aNGkyYMIG//OUvBAQEMHz4cJycnFi4cCGpqak0aNCA559/nnfeecdBVy8iIiIiInLnqTBv5Rg2bBi7d+++4oyA67F7924efvhhxo0bR6dOna573Pz58+2KCV9++SXOzs43dO6ffvqJzp078z//8z8MHDjQPO7l5WX++4knnmDGjBl24xYsWMCECRNYvnw51apVM49//PHH9OvXjxo1auDs7EyTJk3o3bs3qamp5Z5/zJgxdn/pz8/Pv2WLE3fX8KWylxXDgABfKxaLhbCwMLsNS8eNG8eUKVPYuXMnLVu2BKB379707t3bLpbxq00z4+PjiY+PtzvWsWNH0tPTrzguLCzsmptvioiIiIiIyM2pEIWJ4cOHs2LFCtavX0/NmjXN44GBgRQXF5Obm2s3a+LYsWMEBgbaxUhPT6dDhw4MGjSIV1991a4tMDDQ7k0el2L4+Phgs9no1q0bLVq0MNtq1Khhzkw4duwYQUFBduMaN25sFysrK4vo6GhatWrFzJkz7drS0tLMf//6Pa4LFy5kwIABLFmy5LKlJrVr1yY5OZnCwkLy8/MJCgqiZ8+e1KpVi/JYrVasVmu5bbcaV2cngv09rtpnwoQJhIWFsWXLFu677z6cnCrM5B8RERERERG5AQ4tTBiGwYgRI1i2bBlJSUmEh4fbtTdt2hRXV1cSExOJjY0FICMjg8zMTKKiosx+e/bsoX379vTp04c33njjsvNERUWxatUqu2Nr1641Y3h7e+Pt7W3XHh4eTmBgIImJiWYhIj8/n5SUFIYMGWL2++mnn4iOjqZp06bMmTPnsgfkiIiIcq/9k08+oV+/fixcuJAuXbpc8R55enri6enJ6dOnWbNmDW+//fYV+95pnn76aUenICIiIiIiIv8li+HAOepDhw5lwYIFLF++nLp165rHfX19sdl
sAAwZMoRVq1aRkJCAj48PI0aMAC7uJQEXl2+0b9+emJgYu70BnJ2dqVq1KnBxI8sGDRowbNgw+vXrx7p163j22WdZuXIlMTExV8zvrbfe4s0332Tu3LmEh4cTHx/Pzp07SU9Px93dnZ9++ol27doRGhrK3Llz7ZZ//HpGxy8tWLCAPn36MHXqVLtXn9psNnx9fQFYs2YNhmFQt25dDhw4wOjRo3F3d+fbb7/F1dX1mvc2Pz8fX19f8vLyLpupISIiIiIiIvJbu9nnUIcWJiwWS7nH58yZQ9++fQE4f/48L7zwAp988on5ZoYPP/zQfPAfP348EyZMuCxGaGgohw8fNr8nJSXx/PPPk56eTs2aNYmPjzfPcSWGYTBu3DhmzpxJbm4u999/Px9++CGRkZEAJCQkXPGv9le7re3atSM5Ofmy43369DH3UVi8eDFjxozh6NGj+Pv7ExsbyxtvvGEWLq5FhQkRERERERH5I92ShQn5/agwISIiIiIiIn+km30O1Y6BIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChIiIiIiIiIg4jAoTIiIiIiIiIuIwKkyIiIiIiIiIiMOoMCEiIiIiIiIiDqPChNw22rVrR1xc3BXb+/btS/fu3a/YnpCQgJ+fn/l9/PjxNG7c+DfNUUREREREROypMCEVRt++fbFYLJd93N3d8fDwwGq1EhAQQOvWrfn73//O2bNnbyj+1KlTSUhIuGJ7z5496dy5MxaLhaZNm7Jw4UKz7bPPPsNisdzspYmIiIiIiMgVuDg6AZFf6ty5Mz4+Phw7doz77ruPd955Bw8PD8LCwsjOzmbjxo3s2rWLmTNnUqNGDbp16wZASUmJGaOkpARXV9fLYvv6+l713DabDXd3d9zd3dm9ezchISG/2XUVFxfj5ub2m8UTERERERG5XWjGhFQoVqsVm82Gn58fO3fuxGq1EhERwcSJE8nJyaFVq1b07NmTVatWsXbtWtq2bYvFYqF///7s3LmTadOm4e/vj4eHB9WrV8fZ2Rl3d3cCAgIIDQ0td0bGL2dmzJs3j5CQEIqLizlw4AA7duzAYrHQs2dPAIKDgwkLC8NqteLm5oazszOurq64uLhQr149unTpQuPGjalatSp+fn64ubnh6urK008/zaOPPsrw4cPNa42Li8NisbBv3z7gYvHC09OTr7/+GoDVq1dz//334+fnR+XKlXnooYc4ePCgOb59+/Z28QBOnDiBm5sbiYmJv/ePSkRERERE5DehwoRUSMXFxXz11VfUqlWL4uJiFi5cSJUqVVi1ahVfffUVANOnT+fPf/4zAF9++SWVKlWiadOmLF68mPj4eLKzsykrK+Nf//oXq1evpqSkBH9/f7Kzs0lLSzMLEt26dWPVqlU0adKE0tJSKleuTKtWrbBYLDg5OfHnP/+ZCRMmANC8eXMGDhzIhQsXaN68OYZhcN999+Hq6krHjh355ptv2LdvH2fPnqW0tJSBAwfi5+eHr68vbdu2JSkpybzG5ORkqlSpYh7bunUrJSUltGrVCoDCwkJGjhzJtm3bSExMxMnJiR49elBWVgbAgAEDWLBgAUVFRWbMefPmUaNGDdq3b/97/4hERERERER+EypMSIW0Zs0aDMNg79697Nixg8TERNasWUOnTp148MEHAahcuTInTpwA4IknniA4OJjWrVvz448/MmHCBJo3b06XLl347rvvuPfee2ndujWnTp0iPz+f3NxcDMOgadOmLFu2jAcffJBBgwZhsVjIzMykTp06ODld/O/x2WefUa9ePQA+/fRT0tPT6dC
hA8eOHaN3794UFRUxYsQIvvnmG1q2bElJSQlVqlShY8eOfPDBBzz99NN8//33tGvXjvT0dE6cOMHp06dJT0/nueeeMwsTSUlJNG/eHA8PDwBiY2N55JFHiIiIoHHjxsyePZtdu3aRnp4OwCOPPALA8uXLzfuWkJBg7tUhIiIiIiJyK9AeE1KhrFixAovFYs4KuLTPxOzZs2nTpg1Vq1Y1ZwicOnWKzMxMAJo1a8b333/P7NmzOXPmDC4uLmzduhW4uCRixowZ5riDBw/y/vvvA7Bt2zbc3NxwcXHBMAwMw6CwsBAAV1dXzp8/z4EDB8z8Fi1aZBYCCgsLzaUVu3fvpqioiH379pkzMZo1awZAUFAQx48fp0GDBvj7+5OcnIybmxv33nsvDz30EB988AFwcQZFu3btzHPt37+fsWPHkpKSws8//2zek8zMTBo0aIC7uztPPvkks2fP5rHHHuP7779n9+7dfP7557/xT0VEREREROT3oxkTUmGUGQYtWrehdfsYmkXdj8VioUePHrRp04bIyEgKCwtp2LAhixYtAi4+8BcXFwPg6enJkSNHOH/+vLknQ6dOnWjXrh3t2rUjLS2Nrl270qFDBzZv3szGjRsBeP/99xkyZAihoaHmXhH33XcfAC4uLri4uDBmzBgzx8cffxwfHx969OgBXJzV4OXlxXvvvYerqytDhgyhTp06Zk6AWWixWCy0adOGpKQkswjRsGFDioqK2L17N5s2baJt27bmubp27cqpU6f46KOPSElJISUlBcC8Zri4nGPt2rUcPXqUOXPm0L59e0JDQ3+Xn4+IiIiIiMjvQYUJcaijp8+y6cDP7Dhymv+cPEteiRO42sDVg+at2zJ9+nQKCwvZtGkTrq6uREREcNdddwFw5swZu1h5eXnUqVOHlJQUCgoK+O677zh69CheXl5ERETg4+NDfn4+b775JvPmzcNqteLr68u7f3uPleu34uLqSllZGTabDTc3N/MtGl988QWbN28GIDQ0lOjoaI4dO0b16tU5dOgQzs7O/Pjjj0RGRlKlShWsVusVr/fSPhNJSUm0a9cOJycn2rRpwzvvvENRURGtW7cG4OTJk2RkZPDqq6/SoUMH6tevz+nTpy+Ld88999CsWTM++ugjFixYQL9+/X6Tn4uIiIiIiMgfRYUJcZiS0jJ2/5THjz8X8u0PJzhVUETxhTKcnSw4GRd4ZEAcRUXFNGzYkF27dlFSUoKHhwczZswA4Ny5c3bxbDYbmZmZ/Pzzz/z9738nNzeXgwcPcvr0aQ4ePEhGRgbbtm1j9OjRODs707RpU4YPH86YV8cz8a2/cSY/H7i4VCIsLIzi4mIMwyAoKIj33nsPgMOHDxMQEEBiYiL+/v58//33nD17lvfff59evXqxfft2c9+L8lzaZ2LPnj3cf//95rH58+fTrFkzc5ZFpUqVqFy5MjNnzuTAgQOsW7eOkSNHlhtzwIABvPnmmxiGYc7kEBERERERuVVojwlxGGeLBV+bK3nnSqjsZcXV2YliCzhZ4Ltv1/Hdt+sAyMvLNfdteP3113Fxufhr+8ADD9jFCw0NJScnh5iYGDw8POjVqxeLFy9mw4YN3HPPPQAYhsFf//pXu3F/e+t1ANzcrJSUXHxl58CBA5k1axaHDh3ip59+MvtWr16dOXPm4ObmRmZmJk5OTpSUlFBWVsaUKVPw8fHBzc0NwzDKveZ77rkHPz8/IiMj8fLyAi4WJkpLS+32l3BycmLhwoU8++yzNGjQgLp16zJt2jS7Ppf07t2buLg4evfujbu7+w38BERERERERBzPYlzpCUpuafn5+fj6+pKXl4ePj4+j07mi8yWl5J0rwcvNhSO5Z9mblU9JqYG7qxN3V/ehdjXvPySPoguluDo54eR0673N4vDhw9SuXZutW7fSpEkTR6cjIiIiIiJ3qJt9DtVSDnEod1dnAnzc8XR3oV6gDzENAmlXtyptI6v9YUUJAKuLs8OLEocPH8Z
isZCWlnZd/UtKSsjJyeHVV1+lZcuWVyxK9O3bl+7du/+WqV63P+rcCQkJ+Pn5/e7nERERERGR354KE1KheLi5UM3HHV8P1z/0vJs3b8bZ2ZkuXbr8pnFv5IE5ODiY7OxsGjRowIkTJxgyZAghISFYrVYCAwOJiYkx3yYCsHHjRoKCgti6dau570Z5pk6dSkJCgvm9Xbt2xMXF3fQ1iYiIiIiI/Ja0x4QIMGvWLEaMGMGsWbPIysqievXqf+j5i4uLcXNzIzAwELj4GtLi4mLmzp1LrVq1OHbsGImJiZw8edIc065duyvuZfFLvr6+v1veV1JaWorFcustixERERERkT+eZkzIHa+goIBFixYxZMgQunTpYje7oLwZD5999pndQ/eOHTuIjo7G29sbHx8fmjZtyrZt20hKSuLpp58mLy/P3Lxz/PjxAISFhTFx4kSeeuopfHx8GDRokLmU49tvv+Xbb7/lr3/9K/PmzaNdu3a0bduWhIQEDh06xKhRo3jooYfM87/33ntYLBZWr15tHouIiOCf//wnYL+com/fviQnJzN16lQzp8OHD9O3b1/z+y8/SUlJABQVFTFq1Chq1KiBp6cnLVq0MNt+eZ8+//xz7rrrLqxWK5mZmZfd69WrV3P//ffj5+dH5cqVeeihhzh48KDZfukefPrpp0RHR+Ph4UGjRo3M17X+8nwhISF4eHjQo0cPu4KNiIiIiIjcWlSYkDve4sWLqVevHnXr1uWJJ55g9uzZ1zUT4ZLHH3+cmjVrsnXrVlJTU/nLX/6Cq6srrVq14r333sPHx4fs7Gyys7MZNWqUOe7dd9+lUaNGbN++nfj4ePO4zWbDy8uL5cuXExgYyJIlS0hPT2fs2LG8/PLLODs7s2HDBkpLSwFITk6mSpUqZqHgp59+4uDBg+W+wWPq1KlERUUxcOBAM6fg4GCmTp1qfs/Ozua5556jWrVq1KtXD4Dhw4ezefNmFi5cyM6dO/mf//kfOnfuzP79+83YZ8+e5a233uKf//wne/bsoVq1apedv7CwkJEjR7Jt2zYSExNxcnKiR48elJWV2fV75ZVXGDVqFGlpaURGRtK7d28uXLgAQEpKCv3792f48OGkpaURHR3N66+/ft0/LxERERERqVi0lEPueLNmzeKJJ54AoHPnzuTl5ZGcnFzug315MjMzGT16tPkQX6dOHbPN19cXi8ViLtH4pfbt2/PCCy+Y3w8fPgyAi4sLCQkJDBw4kHPnzpGUlETbtm3p1asXTz/9NHv37uXMmTNs376dpk2bsn79ekaPHs1nn30GQFJSEjVq1CAiIuKyc/r6+uLm5oaHh4ddTr6+vuaSj08//ZR//OMffP311wQGBpKZmcmcOXPIzMw0l7iMGjWK1atXM2fOHPP1qyUlJXz44Yc0atToivcqNjbW7vvs2bOpWrUq6enpNGjQwDw+atQoc7+PCRMmcPfdd3PgwAHq1avH1KlT6dy5My+++CIAkZGRbNq0yW7GiIiIiIiI3Do0Y0LuaBkZGXz33Xf07t0buFgU6NmzJ7NmzbruGCNHjmTAgAF07NiRN998025pwtU0a9bsim2xsbFkZWXRv39/Dh06xDvvvEOjRo2YMWMGOTk5NGrUiKSkJHbt2oWbmxuDBg1i+/btFBQUkJycTNu2ba87/1/avn07Tz75JNOnT6d169YA7Nq1i9LSUiIjI/Hy8jI/ycnJdtfq5uZGw4YNrxp///799O7dm1q1auHj40NYWBjAZcs+fhknKCgIgOPHjwOwd+9eWrRoYdc/Kirqpq5XREREREQcTzMm5I42a9YsLly4YLfZpWEYWK1Wpk+fjpOT02XLOkpKSuy+jx8/nv/93/9l5cqVfPnll4wbN46FCxfSo0ePq57b09PT7ntRSal5fri4l8WsWbOYPHkyUVFRTJo0iS+//JLi4mLat29PUlISVquVtm3b4u/vT/369dmwYQPJycl2MzGuV05ODt26dWPAgAH079/
fPF5QUICzszOpqak4OzvbjfHy8jL/bbPZrrnhZdeuXQkNDeWjjz6ievXqlJWV0aBBA4qLi+36ubr+31tZLsX89XIPERERERG5PagwIXekkwVFnDlXxL/+9S8mT55Mp06d7Nq7d+/OJ598QmhoKGfOnKGwsNAsJKSlpV0WLzIyksjISJ5//nl69+7NnDlz6NGjB25ubuZeEFdzPP88W368uIHjjycKaGwYbNy4kVatWjF06FAAWrZsaS7XaNu2LbNnz8bFxYXOnTsDF9/S8cknn/DDDz9cdRlKeTmdP3+ehx9+mHr16jFlyhS7tnvvvZfS0lKOHz/On/70p2tey5WcPHmSjIwMPvroIzPOhg0bbjhO/fr1SUlJsTu2ZcuWm85LREREREQcS4UJueOcLyll2+FTJK5ZyenTp+nfv/9lr9SMjY1l1qxZrFmzBg8PD15++WWeffZZUlJS7N7ace7cOUaPHs2jjz5KeHg4R48eZevWreZeCmFhYRQUFJCYmEijRo3w8PDAw8PjspwO/VzImfMXN3fcl3mMdtHtCQmuSUpKCnPnziU3N5exY8ea/du0acOZM2dYsWIFb775JnCxMPHoo48SFBREZGTkFa8/LCyMlJQUDh8+jJeXF/7+/jzzzDMcOXKExMRETpw4Yfb19/cnMjKSxx9/nKeeeorJkydz7733cuLECRITE2nYsKG5F8S1VKpUicqVKzNz5kyCgoLIzMzkL3/5y3WN/aVnn32W1q1b8+677/Lwww+zZs0a7S8hIiIiInIL0x4TcsdxdrLgaXUh+YtFtGnX/rKiBFwsTGzbto2jR48yb948Vq1axT333MMnn3xivvITwNnZmZP/j707j466uv8//ppMtskygUA2IIFAiICiCLWIViVACYoIgguWFhAQ0WgbAmKRTVssFlBEqOLyFdoiVTjFnwsiUvYqUqSCJuyyClnYMpMEkkwyn98fyNQhAWZikgnJ83HOaOZz7+d+3jPncODzyv3ce+qUhg4dquTkZD3wwAO688479dxzz0mSbrnlFo0ZM0YPPvigoqKiNHPmzEprCvT3cz2qEBpq0c9//nNlZWWptLRUw4cPV0ZGhtq1a6ff/va3ks7f5Hfs2FFRUVGuRTdvv/12OZ3OK64vMX78eJnNZnXo0EFRUVE6cuSINmzYoOzsbHXo0EFxcXGu1xdffCFJWrhwoYYOHapx48bpmmuu0YABA7R161YlJCR4/L37+fnp3Xff1bZt23Tddddp7NixmjVrlsfnX3DzzTfrzTff1Ny5c3XDDTfos88+0+TJk70eBwAAAEDdYDK82RcRVw273a6IiAjZbDZZrVZfl1PnOMqdKis3ZAk0X7lzLThbWqY9OQUqdjjVOipUMdZgX5cEAAAAAF6p6n0oj3KgQQow+ymgbmQSkqSQQH/dmNDY12UAAAAAQK3jUQ4AAAAAAOAzBBMAAAAAAMBnCCYAAAAAAIDPEEwAAAAAAACfIZgAAAAAAAA+QzABAAAAAAB8hmACAAAAAAD4DMEEAAAAAADwGYIJAAAAAADgMwQTAAAAAADAZwgmAAAAAACAzxBMAAAAAAAAnyGYAAAAAAAAPkMwAQAAAAAAfIZgAgAAAAAA+AzBBAAAAAAA8BmCCQAAAAAA4DMEEwAAAAAAwGcIJgAAAAAAgM8QTAAAAAAAAJ8hmAAAAAAAAD5DMAEAAAAAAHyGYAIAAAAAAPgMwQQAAAAAAPAZggkAAAAAAOAzBBMAAAAAAMBnCCYAAAAAAIDPEEwAAAAAAACfIZgAAAAAAAA+49NgYsaMGbrpppsUHh6u6OhoDRgwQHv27HHrU1xcrLS0NDVp0kRhYWEaNGiQcnNzXe07duzQQw89pPj4eFksFrVv315z586tcK3169erc+fOCgoKUlJSkhYtWnTF+gzD0NSpUxUXFyeLxaJevXpp3759rvZDhw5p5MiRSkxMlMViUZs2bTRt2jSVlpZedtzly5frl7/8paKiomS
1WtWtWzetWrXK6+8GAAAAAICrnU+DiQ0bNigtLU1ffvmlVq9eLYfDod69e9BvfQMAACAASURBVKuoqMjVZ+zYsfroo4+0bNkybdiwQcePH9fAgQNd7du2bVN0dLQWL16srKwsTZo0SRMnTtT8+fNdfQ4ePKi+ffsqJSVF27dvV3p6ukaNGlUhDLjYzJkz9corr2jBggXasmWLQkNDlZqaquLiYknS7t275XQ69frrrysrK0tz5szRggUL9Mwzz1x23I0bN+qXv/ylPvnkE23btk0pKSnq16+fvv76a6++GwAAAAAArnYmwzAMXxdxwYkTJxQdHa0NGzbo9ttvl81mU1RUlJYsWaL77rtP0vkwoH379tq8ebNuvvnmSsdJS0vTrl27tHbtWknS008/rRUrVigzM9PVZ/DgwcrPz9enn35a6RiGYahZs2YaN26cxo8fL0my2WyKiYnRokWLNHjw4ErPmzVrll577TUdOHDAq89+7bXX6sEHH9TUqVMrbb/4u7kSu92uiIgI2Ww2Wa1Wr2oBAAAAAMBbVb0PrVNrTNhsNklSZGSkpPOzIRwOh3r16uXq065dOyUkJGjz5s2XHefCGJK0efNmtzEkKTU19bJjHDx4UDk5OW7nRUREqGvXrl5d2xNOp1MFBQWXPe/i7+ZiJSUlstvtbi8AAAAAAOq6OhNMOJ1Opaen69Zbb9V1110nScrJyVFgYKAaNWrk1jcmJkY5OTmVjvPFF1/ovffe0+jRo13HcnJyFBMTU2EMu92uc+fOVTrOhfErO+9S196/f7/mzZunRx999DKftKLZs2ersLBQDzzwQKXtlX03F5sxY4YiIiJcr/j4eK9qAAAAAADAF+pMMJGWlqbMzEy9++67VR4jMzNT/fv317Rp09S7d2+Pz3vnnXcUFhbmem3atMnrax87dkx9+vTR/fffr0ceecR1/MfjjhkzpsJ5S5Ys0XPPPaelS5cqOjq60rE9+W4mTpwom83meh09etTrzwAAAAAAQG3z93UBkvTEE0/o448/1saNG9WiRQvX8djYWJWWlio/P99t1kRubq5iY2Pdxti5c6d69uyp0aNHa/LkyW5tsbGxbjt5XBjDarXKYrHonnvuUdeuXV1tzZs3V3Z2tqtfXFyc23mdOnVyG+v48eNKSUnRLbfcojfeeMOtbfv27a6fL37G5t1339WoUaO0bNmyCo+aXOm7uVhQUJCCgoIu2Q7vde/eXZ06ddLLL7/s61IAAAAAoN7y6YwJwzD0xBNP6P3339fatWuVmJjo1t6lSxcFBARozZo1rmN79uzRkSNH1K1bN9exrKwspaSkaNiwYXr++ecrXKdbt25uY0jS6tWrXWOEh4crKSnJ9bJYLEpMTFRsbKzbeXa7XVu2bHG79rFjx9S9e3d16dJFCxculJ+f+1f643F/PCPiH//4hx5++GH94x//UN++fb3+buq74cOHa8CAAb4uAwAAAABQw3w6YyItLU1LlizRBx98oPDwcNfaDREREbJYLIqIiNDIkSOVkZGhyMhIWa1WPfnkk+rWrZtrR47MzEz16NFDqampysjIcI1hNpsVFRUlSRozZozmz5+vCRMmaMSIEVq7dq2WLl2qFStWXLI2k8mk9PR0TZ8+XW3btlViYqKmTJmiZs2auW6YL4QSLVu21OzZs3XixAnX+RfP6PixJUuWaNiwYZo7d666du3qqvnCZ/bku0HlSktLFRgY6OsyAAAAAACeMnxIUqWvhQsXuvqcO3fOePzxx43GjRsbISEhxr333mtkZ2e72qdNm1bpGC1btnS71rp164xOnToZgYGBRuvWrd2ucSlOp9OYMmWKERMTYwQFBRk9e/Y09uzZ42pfuHDhJT/D5dxxxx2VnjNs2DCvvpvLsdlshiTDZrN51L+uGTZsmNG/f3/DMAxj2bJlxnXXXWcEBwcbkZGRRs+ePY3CwkK3ftOnTzfi4uKMVq1aGYZhGH/729+MLl26GGFhYUZMTIz
x0EMPGbm5uW7X+Pbbb40+ffoYoaGhRnR0tPHrX//aOHHihKv9jjvuMH73u9/V0icGAAAAgKtbVe9DTYZhGLURgKB2VXX/2Lpi+PDhys/P12uvvaaEhATNnDlT9957rwoKCrRp0yYNHTpUYWFhGj58uP75z3/q3nvv1dNPPy1Juvbaa/X2228rLi5O11xzjfLy8pSRkaFGjRrpk08+kSTl5+crOTlZo0aN0tChQ3Xu3Dk9/fTTKisr09q1ayWxxgQAAAAAeKOq96F1YvFL4FKys7NVVlamgQMHqmXLlpKkjh07uvUJDQ3VW2+95fYIx4gRI1w/t27dWq+88opuuukmFRYWKiwsTPPnz9eNN96oP/3pT65+b7/9tuLj47V3714lJyfX8CcDAAAAAEh1aLtQoDI33HCDevbsqY4dO+r+++/Xm2++qTNnzrj16dixY4V1JbZt26Z+/fopISFB4eHhuuOOOyRJR44ckSTt2LFD69atc9vOtV27dpKk7777rhY+GQAAAABAIphAHWE751DmMZu2HT6tgyeLdOEBI7PZrNWrV2vlypXq0KGD5s2bp2uuuUYHDx50nRsaGuo2VlFRkVJTU2W1WvXOO+9o69atev/99yWdXxxTkgoLC9WvXz9t377d7bVv3z7dfvvttfOhAQAAAAA8ygHfKyh26D8HT+lUYakCzH7al1uo/LOlMv3QbjKZdOutt+rWW2/V1KlT1bJlS73//vvKyMiodLzdu3fr1KlTeuGFFxQfHy9J+uqrr9z6dO7cWf/85z/VqlUr+fvzxwAAAAAAfIUZE/C5XHuxThaUKCEyRM0aWRQZGqjCkjKVOQ1t2bJFf/rTn/TVV1/pyJEjWr58uU6cOKH27dtfcryEhAQFBgZq3rx5OnDggD788EP98Y9/dOuTlpam06dP66GHHtLWrVv13XffadWqVXr44YdVXl5e0x8ZAAAAAPADggn4nNOQ/EwmmUzn50gEmP1kSJJhyGq1auPGjbrrrruUnJysyZMn68UXX9Sdd955yfGioqK0aNEiLVu2TB06dNALL7yg2bNnu/Vp1qyZPv/8c5WXl6t3797q2LGj0tPT1ahRI/n58ccCAAAAAGoL24XWU1fTdqG59mJ9sf+kzH5+sgSYdaqoRPGRFnVNbCJ/MyEBAAAAAFwNqnofyl0ffC7GGqzOLRsrOMBPpeVOtWwSoutbNCKUAAAAAIAGgFX/UCe0bBKq5o0sKnMaCvL3cz3WAQAAAACo3wgmUGf4m/3kb/Z1FQAAAACA2sRceQAAAAAA4DMEEwAAAAAAwGcIJgAAAAAAgM8QTAAAAAAAAJ8hmAAAAAAAAD5DMAEAAAAAAHyGYAIAAAAAAPgMwQQAAAAAAPAZggkAAAAAAOAzBBMAAAAAAMBn/D3pNGHCBI8HnDlzZpWLAQAAAAAADYtHwcTmzZvd3u/YsUNlZWVKSkqSJO3fv18BAQG64YYbqr9CAAAAAABQb3kUTGzatMn189y5c2W1WvW3v/1NTZo0kSSdOnVKw4cPV8+ePWumSgAAAAAAUC+ZDMMwvDmhRYsW+vTTT3Xddde5Hf/222/Vp08fHTt2rFoLRNXY7XZFRETIZrPJarX6uhwAAAAAQD1X1ftQrxe/zM/P16lTpyocP336tGw2m7fDAQAAAACABszrYGLAgAEaMWKEPvzwQ+Xk5CgnJ0cffPCBRo0apQEDBtREjQAAAAAAoJ7yaI2JH3v99dc1duxY3X///SorK5Mkmc1mDR8+XC+99FK1FwgAAAAAAOovr9eYuMBut2v//v2SpKSkJNYxqGNYYwIAAAAAUJtqbY2JC06fPq0zZ86oQ4cO3PgCAAAAAIAq8TqYOH36tFJTU9W6dWv17t1bx48flyQNHz5c48ePr/YCAQAAAABA/eV1MJGRkSGn06kDBw4oJCTEdXzw4MFauXJltRYHAAAAAADqN68Xv1y1apVWrlypVq1auR1PTk7W4cOHq6suAAAAAADQAHg
9Y6KgoEBhYWEVjp85c0aBgYHVUhQAAAAAAGgYvA4mfvGLX2jx4sWu9yaTSYZhaPbs2UpJSanW4gAAAAAAQP3m9aMcs2bNUo8ePbRt2zaVlpZq4sSJysrKUm5urj7//POaqBEAAAAAANRTXs+Y6Nixo/bu3auf/exn6tu3r06fPq2+ffvq66+/Vtu2bWuiRgAAAAAAUE+ZDMMwfF0Eqp/dbldERIRsNpusVquvywEAAAAA1HNVvQ/1+lEOSSotLVVmZqby8vLkdDrd2u66666qDAkAAAAAABogr4OJ1atX6ze/+Y3y8vIqtJlMJpWXl1dLYQAAAAAAoP7zeo2JtLQ0DRgwQEePHlVpaakcDofrVVpaWhM1AgAAAACAesrrGRM5OTl66qmn1Lx585qoBwAAAAAANCBez5gYOHCgNm7cWBO1AAAAAACABsbrXTmKior04IMPKjY2Vh07dlRAQIBb++OPP16tBaJq2JUDAAAAAFCbam1XjmXLlmnVqlUKCAhQZGSkTCaTq81kMhFMAAAAAAAAj3kdTEycOFFTpkzRpEmTZDaba6ImAAAAAADQQHi9xkRxcbGGDBlCKAEAAAAAAH4yr4OJoUOH6p///GdN1AIAAAAAABoYrx/l8PPz05/+9CetWrVK119/fYXFL2fOnFltxQEAAAAAgPrN62Diq6++UseOHVVaWqqvvvrKre3HC2ECAAAAAABcidfBxKZNm2qiDgAAAAAA0AB5tcaEw+FQcHCwMjMza6oeAAAAAADQgHgVTAQEBCguLk5Op7Om6gEAAAAAAA2I17tyTJw4UZMmTZLNZquJegAAAAAAQAPi9RoTb775pnbv3q24uDglJiYqNDTUrf0///lPtRUHAAAAAADqN6+DiT59+qhPnz41UQsAAAAAAGhgTIZhGL4uAtXPbrcrIiJCNptNVqvV1+UAAAAAAOq5qt6Her3GBAAAAAAAQHXx6FGO6Oho7dy5U02bNlVUVJRMJtMl++bl5VVbcQAAAAAAoH7zKJiYMWOGwsPDJUkvvPBCjRYEAAAAAAAaDo+CiT179qisrExBQUFq3769unbtKrPZXNO1AQAAAACAes6jNSbmzJmjwsJCSdJtt92mU6dO1WhRAAAAAACgYfBoxkTLli316quvqnfv3jIMQ1u3blXjxo0r7XvLLbdUa4EAAAAAAKD+8mi70OXLl+vRRx/VqVOnZDKZdKlTTCaTysvLq71IeI/tQgEAAAAAtamq96EeBRMX5OfnKzIyUllZWYqOjq60T5MmTTy+OGoOwQQAAAAAoDZV9T7Uo0c5LmjUqJFWr16ttm3byt/fq1MBAAAAAAAq8Dpd6NmzpwzD0IEDB5SXlyen0+nWzhoTAAAAAADAU14HE//5z380ZMgQHThwoMJaE6wxAQAAAAAAvOF1MPHoo4/q+uuv1/LlyxUXFyeTyVQTdQEAAAAAgAbA62Bi7969WrZsmZKSkmqiHgAAAAAA0ID4eXvCTTfdpAMHDtRELQAAAAAAoIHxesbE2LFjNW7cOD399NPq2LGjAgIC3No7dOhQbcUBAAAAAID6zWRcvILlFfj5VZxkYTKZZBgGi1/WIVXdPxYAAAAAgKqo6n2o1zMm9u3b5+0pAAAAAAAAlfI6mGjTpk1N1AEAAAAAABogrxe/lKRDhw5p7Nix6tOnj/r06aOMjAwdOnTI63FmzJihm266SeHh4YqOjtaAAQO0Z88etz7FxcVKS0tTkyZNFBYWpkGDBik3N9fVvmPHDj300EOKj4+XxWJR+/btNXfu3ArXWr9+vTp37qygoCAlJSVp0aJFV6zPMAxNnTpVcXFxslgs6tWrl9uMkUOHDmnkyJFKTEyUxWJRmzZtNG3aNJWWll523OzsbP3qV79ScnKy/Pz8lJ6eXqHPokWLZDKZ3F7BwcFXrBkAAAAAgKuJ18HEv/71L7Vr104bN25UcnKykpOTtWHDBrVv315r1qzxaqwNGzYoLS1NX375pVavXi2
Hw6HevXurqKjI1Wfs2LH66KOPtGzZMm3YsEHHjx/XwIEDXe3btm1TdHS0Fi9erKysLE2aNEkTJ07U/PnzXX0OHjyovn37KiUlRdu3b1d6erpGjRqlVatWXba+mTNn6pVXXtGCBQu0ZcsWhYaGKjU1VcXFxZKk3bt3y+l06vXXX1dWVpbmzJmjBQsW6JlnnrnsuCUlJYqKitLkyZN1ww03XLKf1WpVdna263X48OHLjgsAAAAAwNXG68Uvu3Tpoh49emjWrFlux5966imtW7dOX331VZWLOXHihKKjo7VhwwbdfvvtstlsioqK0pIlS3TfffdJOh8GtG/fXps3b9bNN99c6ThpaWnatWuX1q5dK0l6+umntWLFCmVmZrr6DB48WPn5+fr0008rHcMwDDVr1kzjxo3T+PHjJUk2m00xMTFatGiRBg8eXOl5s2bN0muvvebxlqrdu3dXp06d9PLLL7sdX7RokdLT05Wfn+/ROBdj8UsAAAAAQG2q6n2o1zMmsrKyNHr06ArHH3nkEbcb/6qw2WySpMjISEnnZ0M4HA716tXL1addu3ZKSEjQ5s2bLzvOhTEkafPmzW5jSFJqauplxzh48KBycnLczouIiFDXrl29uvZPUVhYqJYtWyo+Pl79+/dXVlbWJfuWlJTIbre7vQAAAAAAqOu8DiaaNm2qb775psLxb775RlFRUVUuxOl0Kj09Xbfeequuu+46SVJOTo4CAwPVqFEjt74xMTHKycmpdJwvvvhC7733nlt4kpOTo5iYmApj2O12nTt3rtJxLoxf2XmXuvb+/fs1b948Pfroo5f5pJ655ppr9Pbbb+uDDz7Q4sWL5XQ6dcstt+j777+vtP+MGTMUERHhesXHx//kGgAAAAAAqGleBxMjR47UI488ohdffFGbN2/W5s2bNXv2bI0ePVqjRo2qciFpaWnKzMzUu+++W+UxMjMz1b9/f02bNk29e/f2+Lx33nlHYWFhrtemTZu8vvaxY8fUp08f3X///XrkkUdcx3887pgxYzwer1u3bho6dKg6deqkO+64Q8uXL1dUVJRef/31SvtPnDhRNpvN9Tp69KjXnwEAAAAAgNrm9Xahzz77rMLCwjRr1izl5eVJkqKjozVp0iRlZGRUqYgnnnhCH3/8sTZu3KgWLVq4jsfGxqq0tFT5+flusyZyc3MVGxvrNsbOnTvVs2dPjR49WpMnT3Zri42NddvJ48IYVqtVFotF99xzj7p27epqa968ubKzs1394uLi3M7r1KmT21jHjx9XSkqKbrnlFr3xxhtubdu3b3f9/FPWeggICNCNN96o/fv3V9oeFBSkoKCgKo8PAAAAAIAveB1MmEwmPfXUU3rqqad05swZSVLjxo2rdHHDMPTkk0/q/fff1/r165WYmOjW3qVLFwUEBGjNmjUaNGiQJGnPnj06cuSIunXr5uqXlZWlHj16aNiwYXr++ecrXKdbt2765JNP3I6tXr3aNUZ4eLjCw8Pd2hMTExUbG6s1a9a4ggi73a4tW7bosccec/U7duyYUlJS1KVLFy1cuFB+fu6TUJKSkrz9WipVXl6ub7/9VnfddVe1jAcAAAAAQF3gcTBRXFystWvX6rbbbnPdxF8IJOx2u/7973+rV69eCgwM9PjiaWlpWrJkiT744AOFh4e71m6IiIiQxWJRRESERo4cqYyMDEVGRspqterJJ59Ut27dXDtyZGZmqkePHkpNTVVGRoZrDLPZ7FrzYsyYMZo/f74mTJigESNGaO3atVq6dKlWrFhxydpMJpPS09M1ffp0tW3bVomJiZoyZYqaNWumAQMGSDofSnTv3l0tW7bU7NmzdeLECdf5F8/ouNiFmRSFhYU6ceKEtm/frsDAQHXo0EGS9Ic//EE333yzkpKSlJ+fr1mzZunw4cM/6XEZAAAAAADqGo+3C503b56WL1+udevWVdreo0cP3X///W6zCa54cZOp0uMLFy7U8OHDJZ0PRMaNG6d//OM
fKikpUWpqql599VXXjf+zzz6r5557rsIYLVu21KFDh1zv169fr7Fjx2rnzp1q0aKFpkyZ4rrGpRiGoWnTpumNN95Qfn6+fvGLX+jVV19VcnKypPNbej788MOXPNfbz/7jmseOHavly5crJydHjRs3VpcuXTR9+nTdeOONlx33ArYLBQAAAADUpqreh3ocTHTt2lWTJk3SPffcU2n7Rx99pOnTp2vLli0eXxw1h2ACAAAAAFCbqnof6vGuHPv27auw6OOPXX/99dq3b5/HFwYAAAAAAPA4mHA4HDp58uQl20+dOiWHw1EtRQEAAAAAgIbB42CiQ4cOWrNmzSXbV69e7Vq4EQAAAAAAwBMeBxMPP/ywnnvuOX366acV2lauXKnp06dfciFIAAAAAACAyni8XeiYMWO0fv169e3bVx06dFC7du0kSbt379bOnTs1aNAgjRkzpsYKBQAAAAAA9Y/HMyYk6d1339Xf//53tWzZUt9884127Nihli1b6u9//7uWLl1aUzUCAAAAAIB6yuPtQnF1YbtQAAAAAEBtqvHtQgEAAAAAAKobwQQAAAAAAPAZggkAAAAAAOAzBBMAAAAAAMBnqhxMHDp0SGvWrFFxcXF11gMAAAAAABoQr4OJ06dPq0+fPmrdurV69+6t48ePS5KGDx+u8ePHV3uBAAAAAACg/vI6mMjIyFB5ebkOHDigkJAQ1/HBgwdr5cqV1VocAAAAAACo3/y9PWHVqlVauXKlWrVq5XY8OTlZhw8frq66AAAAAABAA+D1jImCggKFhYVVOH7mzBkFBgZWS1EAAAAAAKBh8DqY+MUvfqHFixe73ptMJhmGodmzZyslJaVaiwMAAAAAAPWb149yzJo1Sz169NC2bdtUWlqqiRMnKisrS7m5ufr8889rokYAAAAAAFBPeT1jomPHjtq7d69+9rOfqW/fvjp9+rT69u2rr7/+Wm3btq2JGgEAAAAAQD1lMgzD8OaE48ePq1mzZl63oXbZ7XZFRETIZrPJarX6uhwAAAAAQD1X1ftQr2dMxMfHKy8vr8LxU6dOKT4+3tvhAAAAAABAA+Z1MHGpCRZFRUUKDg7+yQUBAAAAAICGw+PFLydMmCDp/C4cf/jDHxQSEuJqKy8v15dffqkbbrih+isEAAAAAAD1lsfBxObNmyWdnzHx1VdfKSAgwNUWGBiodu3aucILAAAAAAAAT3gcTGzatEmS9Jvf/EZ/+ctfWFARAAAAAAD8ZB4HExf8/e9/r4k6AAAAAABAA+R1MCFJX3/9tZYtW6YjR46otLTUrW3p0qXVUhgAAAAAAKj/vN6VY9myZeratasrnCgoKNB///tfffbZZ+zKAQAAAAAAvOJ1MDF9+nS9+OKLWrlypQIDA/WXv/xFe/fu1aBBg5SUlFQTNQIAAAAAgHrK62Bi//79uvvuuyWd342jqKhIfn5+GjdunBYsWFDtBQIAAAAAgPrL62CicePGKiwslCQ1b95cO3fulCTZ7XbXcQAAAAAAAE94vfjlbbfdpjVr1qhjx44aNGiQfve732n9+vVatWqVevToURM1AgAAAACAesrrYGLevHk6d+6cJGny5Mkym8364osv1K9fP02dOrXaCwQAAAAAAPWXyTAMw9dFoPrZ7XZFRETIZrPJarX6uhwAAAAAQD1X1ftQr2dMFBQUaM2aNTp06JBMJpNat26tlJQUhYWFeTsUAAAAAABo4LwKJt599109/vjjys/PdzveuHFjvf7667rvvvuqtTgAAAAAAFC/ebwrx/bt2zV06FDddddd2rp1qwoKCmS32/Xll18qNTVVQ4YM0TfffFOTtQIAAAAAgHrG4zUmRowYofz8fC1fvrzS9oEDByoyMlJvvfVWtRaIqmGNCQAAAABAbarqfajHMyY+//xzPfbYY5dsf+yxx7Rp0yaPLwwAAAAAAOBxMHHs2DFdc801l2xPTk7W999/Xy1FAQAAAACAhsHjYOLs2bMKDg6+ZHtwcLC
Ki4urpSgAAAAAANAweLUrx5o1axQREVFp28U7dQAAAAAAAFyJV8HEkCFDLttuMpl+UjEAAAAAAKBh8TiYcDgcNVkHAAAAAABogDwOJsxmc03WAQAAAAAAGiCPF78EAAAAAACobgQTAAAAAADAZwgmAAAAAACAzxBMAAAAAAAAn/E6mEhOTtbp06crHM/Pz1dycnK1FAUAAAAAABoGr4OJ/fv3q6ysrMLxkpISHT58uFqKAgAAAAAADYPH24V+8sknrp/XrFmjiIgI1/vy8nL961//UqtWraq1OAAAAAAAUL95HEzcfffdkiSTyaQhQ4a4tZnNZiUkJGjOnDnVWx0AAAAAAKjXPA4mHA6HDMNQYmKitm7dqqioKFeb2WyukeIAAAAAAED95nEwcSF8OHr0aIW2wsJChYWFVV9VAAAAAACgQfB68cvZs2dr6dKlrvcPPfSQrFarEhIS9O2331ZrcQAAAAAAoH7zOph49dVX1bx5c0nnF8FcuXKlPv74Y/Xs2VPjx4+v9gIBAAAAAED95fGjHBdkZ2crISFBkvTRRx/pgQce0F133aWkpCR17dq12gsEAAAAAAD1l9czJho3bqzvv/9ekvTpp5+qV69errby8vLqqwwAAAAAANR7Xs+Y6N+/v4YMGaLk5GTl5eXpzjvvlCRt375dbdq0qfYCAQAAAABA/eX1jIm5c+fq0UcfVZs2bfTZZ58pPDxc0vndOsaMGVPtBQIAAAAAgPrLqxkTDodDTzzxhJ555hm1atXKrW3cuHHVWRcAAAAAAGgAvJoxERAQoPfee0+GYdRUPQAAAAAAoAHx+lGOe+65Rx9++GFN1AIAAAAAABoYrxe/7NChg5577jlt3rxZXbp0UWhoqFv7448/Xm3FAQAAAACA+s1kePlcRnx8/KUHM5l05MiRn1wUfjq73a6IiAjZbDZZrVZflwMAAAAAqOeqeh/q9YyJo0ePensKAAAAAABApbxeYwIAAAAAAKC6ooZp4QAAIABJREFUeDRjYsKECZo2bZpCQ0M1YcKEy/adOXNmtRQGAAAAAADqP4+Cic2bN8vhcLh+vhSTyVQ9VQEAAAAAgAbB68UvcXVg8UsAAAAAQG2q6n2ox2tMHDhwQGQYAAAAAACgOnkcTLRt21YnTpxwvX/wwQeVm5tbI0UBAAAAAICGweNg4uLZEp988omKiop+0sVnzJihm266SeHh4YqOjtaAAQO0Z88etz7FxcVKS0tTkyZNFBYWpkGDBrkFIjt27NBDDz2k+Ph4WSwWtW/fXnPnzq1wrfXr16tz584KCgpSUlKSFi1adMX6DMPQ1KlTFRcXJ4vFol69emnfvn2u9kOHDmnkyJFKTEyUxWJRmzZtNG3aNJWWll5x7CvVU15erilTpriN/cc//pFZKwAAAACAesWn24Vu2LBBaWlp+vLLL7V69Wo5HA717t3bLfAYO3asPvroIy1btkwbNmzQ8ePHNXDgQFf7tm3bFB0drcWLFysrK0uTJk3SxIkTNX/+fFefgwcPqm/fvkpJSdH27duVnp6uUaNGadWqVZetb+bMmXrllVe0YMECbdmyRaGhoUpNTVVxcbEkaffu3XI6nXr99deVlZWlOXPmaMGCBXrmmWcuO64n9fz5z3/Wa6+9pvnz52vXrl3685//rJkzZ2revHlefccAAAAAANRlHi9+aTablZOTo6ioKElSeHi4vvnmGyUmJlZbMSdOnFB0dLQ2bNig22+/XTabTVFRUVqyZInuu+8+SefDgPbt22vz5s26+eabKx0nLS1Nu3bt0tq1ayVJTz/9tFasWKHMzExXn8GDBys/P1+ffvpppWMYhqFmzZpp3LhxGj9+vCTJZrMpJiZGixYt0uDBgys9b9asWXrttdd04MCBS35OT+q5++67FRMTo//7v/9z9Rk0aJAsFosWL158ybEvYPFLAAAAAEBtqup9qEfbhUrnb9SHDx+uoKAgSecfsRgzZoxCQ0Pd+i1fvtzji1/
MZrNJkiIjIyWdnw3hcDjUq1cvV5927dopISHhssGEzWZzjSGd3+L0x2NIUmpqqtLT0y9Zy8GDB5WTk+N2XkREhLp27arNmzdfMpi4+NqV8aSeW265RW+88Yb27t2r5ORk7dixQ//+97/10ksvVTpmSUmJSkpKXO/tdvtlawAAAAAAoC7wOJgYNmyY2/tf//rX1VqI0+lUenq6br31Vl133XWSpJycHAUGBqpRo0ZufWNiYpSTk1PpOF988YXee+89rVixwnUsJydHMTExFcaw2+06d+6cLBZLhXEujF/ZeZe69v79+zVv3jzNnj37sp/Vk3p+//vfy263q127djKbzSovL9fzzz+vIUOGVDrmjBkz9Nxzz132ugAAAAAA1DUeBxMLFy6syTqUlpamzMxM/fvf/67yGJmZmerfv7+mTZum3r17e3zeO++8o0cffdT1fuXKlTKbzV5d+9ixY+rTp4/uv/9+PfLII67jYWFhrp9//etfa8GCBR6Nt3TpUr3zzjtasmSJrr32WtdaFM2aNasQEknSxIkTlZGR4Xpvt9sVHx/v1WcAAAAAAKC2eRxM1KQnnnhCH3/8sTZu3KgWLVq4jsfGxqq0tFT5+flusyZyc3MVGxvrNsbOnTvVs2dPjR49WpMnT3Zri42NrbC1aW5urqxWqywWi+655x517drV1da8eXNlZ2e7+sXFxbmd16lTJ7exjh8/rpSUFNfjFz+2fft2188XnrG5Uj2S9NRTT+n3v/+965GRjh076vDhw5oxY0alwURQUJDrMRsAAAAAAK4WPg0mDMPQk08+qffff1/r16+vsJBmly5dFBAQoDVr1mjQoEGSpD179ujIkSPq1q2bq19WVpZ69OihYcOG6fnnn69wnW7duumTTz5xO7Z69WrXGOHh4QoPD3drT0xMVGxsrNasWeMKIux2u7Zs2aLHHnvM1e/YsWNKSUlRly5dtHDhQvn5uW90kpSU5HU9knT27NkKY5nNZjmdzgrjAQAAAABwtfJpMJGWlqYlS5bogw8+UHh4uGvthoiICFksFkVERGjkyJHKyMhQZGSkrFarnnzySXXr1s218GVmZqZ69Oih1NRUZWRkuMYwm82uHUTGjBmj+fPna8KECRoxYoTWrl2rpUuXuq1DcTGTyaT09HRNnz5dbdu2VWJioqZMmaJmzZppwIABks6HEt27d1fLli01e/ZsnThxwnX+xTM6fsyTevr166fnn39eCQkJuvbaa/X111/rpZde0ogRI6r4bQMAAAAAUAcZPiSp0tfChQtdfc6dO2c8/vjjRuPGjY2QkBDj3nvvNbKzs13t06ZNq3SMli1bul1r3bp1RqdOnYzAwECjdevWbte4FKfTaUyZMsWIiYkxgoKCjJ49exp79uxxtS9cuPCSn+FKrlSP3W43fve73xkJCQlGcHCw0bp1a2PSpElGSUnJFcc2DMOw2WyGJMNms3nUHwAAAACAn6Kq96EmwzCMWs5CUAuqun8sAAAAAABVUdX7UL8rdwEAAAAAAKgZBBMAAAAAAMBnCCYAAAAAAIDPEEwAAAAAAACfIZgAAAAAAAA+QzABAAAAAAB8hmACAAAAAAD4DMEEAAAAAADwGYIJAAAAAADgMwQTAAAAAADAZwgmAAAAAACAzxBMAAAAAAAAnyGYAAAAAAAAPkMwAQAAAAAAfIZgAgAAAAAA+AzBBAAAAAAA8BmCCQAAAAAA4DMEEwAAAAAAwGcIJgAAAAAAgM8QTAAAAAAAAJ8hmAAAAAAAAD5DMAEAAAAAAHyGYAIAAAAAAPgMwQQAAAAAAPAZggkAAAAAAOAzBBMAAAAAAMBnCCYAAAAAAIDPEEwAAAAAAACfIZgAAAAAAAA+QzABAAAAAAB8hmACAAAAAAD4DMEEUEXDhw/XgAEDvDqnVatWevnll2uoIgAAAAC4+hBMoEEaPny4TCaT69WkSRP16dNH33zzTY1ed+vWrRo9enSNXgMAAAAAriYEE2iw+vT
po+zsbGVnZ2vNmjXy9/fX3XffXaPXjIqKUkhISI1eAwAAAACuJgQTaLCCgoIUGxur2NhYderUSb///e919OhRnThxQpJ09OhRPfDAA2rUqJEiIyPVv39/HTp06JLjFRQUaMiQIQoNDVVcXJzmzJmj7t27Kz093dXnx49yHDp0SCaTSdu3b3e15+fny2Qyaf369ZKk9evXy2QyadWqVbrxxhtlsVjUo0cP5eXlaeXKlWrfvr2sVqt+9atf6ezZs9X/JQEAAABADSOYACQVFhZq8eLFSkpKUpMmTeRwOJSamqrw8HBt2rRJn3/+ucLCwtSnTx+VlpZWOkZGRoY+//xzffjhh1q9erU2bdqk//73v9VS37PPPqv58+friy++cAUmL7/8spYsWaIVK1bos88+07x586rlWgAAAABQm/x9XQDgKx9//LHCwsIkSUVFRYqLi9PHH38sPz8/LVmyRE6nU2+99ZZMJpMkaeHChWrUqJHWr1+v3r17u41VUFCgv/71r1qyZIl69uzp6t+sWbNqqXX69Om69dZbJUkjR47UxIkT9d1336l169aSpPvuu0/r1q3T008/XS3XAwAAAIDawowJNFgpKSnavn27tm/frv/85z9KTU3VnXfeqcOHD2vHjh3av3+/wsPDFRYWprCwMEVGRqq4uFjfffddhbEOHDggh8Ohn//8565jERERuuaaa6ql1uuvv971c0xMjEJCQlyhxIVjeXl51XItAAAAAKhNzJhAgxUaGqqkpCTX+7feeksRERF68803VVhYqC5duuidd96pcF5UVFS1XN/P73wuaBiG65jD4ai0b0BAgOtnk8nk9v7CMafTWS11AQAAAEBtYsYE8AOTySQ/Pz+dO3dOnTt31r59+xQdHa2kpCS3V0RERIVzW7durYCAAG3dutV1zGazae/evZe83oWAIzs723XsxwthAgAAAEBDQDCBBqOopEzbDp/R/rwCSVJJSYlycnKUk5OjXbt26cknn1RhYaH69eunIUOGqGnTpurfv782bdqkgwcPav369frtb3+r77//vsLY4eHhGjZsmJ566imtW7dOWVlZGjlypPz8/FxrVFzMYrHo5ptv1gsvvKBdu3Zpw4YNmjx5co1+BwAAAABQ1xBMoMHIKyjRzmy79uYWqtxp6NNPP1VcXJzi4uLUtWtXbd26VcuWLVP37t0VEhKijRs3KiEhQQMHDlT79u01cuRIFRcXy2q1Vjr+Sy+9pG7duunuu+9Wr169dOutt6p9+/YKDg6+ZE1vv/22ysrK1KVLF6Wnp2v69Ok19fEBAAAAoE4yGT9+wB31ht1uV0REhGw22yVvpBuagmKHdmXbFREcoOTY8EvOZKguRUVFat68uV588UWNHDmyRq8FAAAAAL5W1ftQFr9EgxEeHKCfJzapsfG//vpr7d69Wz//+c9ls9n0hz/8QZLUv3//GrsmAAAAAFztCCaAajR79mzt2bNHgYGB6tKlizZt2qSmTZv6uiwAAAAAqLMIJoBqcuONN2rbtm2+LgMAAAAAriosfgkAAAAAAHyGYAIAAAAAAPgMwQQAAAAAAPAZggkAAAAAAOAzBBMAAAAAAMBn2JUDAAAAAIA6oqzcqaLScpU4ylVuGHIaktNpSJL8/EzyM0lmk0lBAWaFBprlb7765xsQTAAAAAAA4AMXQoiikjIVlZTpdFGpzpwtVbGjXCVlTplkkqQf/isZrv8bCvL3U3CAWY1DAhUZGqjQIP/zr6swrCCYAAAAAACgljidhk4Wlej4mXPKsRf/L4QwmRTg56fgAD9FWAIV5O8nk8lU6RiGYaikzKliR7mO5xfr8OmzMoz/hRWx1mA1a2xR09Ag+flVPkZdQjABAEA1Kikrl9MpOY3zv9Mw+5kUYPaT+Sr4RwEAAKg5xY5y5drPhwgnCkpUXu5UeHDAFUOIyphMJgUHmBUcYHYd+3FYsTe3QPtPFCnaGqSWkSGKsQa79a1rCCYAAKiicqch2zm
HikrKZD/n0KmiEhWVlKvMacgwDMkk+cmkALNJjUP/N80yLMhf4cEBvi4fAADUMMMwlH/WoeP553TkzFnZzzkU5G9W09AgBfpX7+MWPw4rGoUEqrTMqdOFpTp+5pwiQgKU0DhEzRpZ1CgkwKsQpDYQTAAA4KWzpWXKtZfo0KkinSkqlaP8/PTLYH8/BQWYFeLvJ5NMMmTIMCRHuVPfnzmnQyeLZMgkS6CfmkVY1KJxiJqGBV51z4ECAIArO1darr25dh06eVbFZeWyBgeoRaOQWnu0ItDfTzHWYDkNQ/ZzDmUet2l/XqFaNQ1RcoxVlsC6M4OCYAIAAA8VFDt06ORZHf3hNx6WALOaePgbj0Y//N8wDBWVluvgySIdOlWkJqFBSmwaqhaNLQQUAADUA4ZhKMderJ3H7corKFFUWJCircE+q8fPZFKjkEA1CglUUUmZdmYX6GRhqdrHWRUXEVwnZk8QTAAAcAVOp6GjZ85qZ7ZdtnMONbIEKj4yRH5V+IvcZDIp7IfHORzlTp0pKtWWg6d0PD9EHZpZ1SgksAY+AQAAqA0XZknsP1EkP5mU0Lj2Zkh4IjTIX5YAs/IKSrT5wCklRYXWidkTBBMAAFxGQbFDu3POT8O0BJiV0Dik2n6zEGD2U7Q1+IdHPc7qdFGp2sWGq1XTUGZPAABwFTEMQ9m2Yu3KPj9LIjo8SCGBdfN228/PpNiIYJ0trTuzJ+rmNwUAQB1woqBEXx85ozNnSxVjDVaQf838NiHA7Kf4yFCdOVuqbUfO6Mw5h65vEVFj1wMAANWn3Glod7Zdu3ML6uQsiUsJCfRXQuP/zZ5oFxOudnFWn+wkRjABAEAlcu3F2nb4jM45yhVfjbMkLqdxSKBCAs36Lq9Q5U5DneIb1emtvQAAaOjKyp3amW3Xrmy7IkOCFBZ8dd1iX5g9UVhcpszjNpUbhjrEWWt95ubV9a0BAFALThSU6L+Hz6jYUa44a+1OawzyN6tZI4sOnSqSn0nqFN+42rcTAwAAP52j3Klvv7dpb26BourwoxueCAv2l5+ftPO4XWXlhjq2iFBALYYT/EsHAIAfKSh26OsjZ3TWUa7YWg4lLggw+ynOatGBk0XaedwmwzBqvQYAAHBpZeVOZR47H0rEWIOv6lDigpBAf8VYg7U3t0CZx2wqK3fW2rUJJgAA+IHTaWh3ToFOny2t9ZkSFwv091NUWJC+O1Go47Zin9UBAADcOZ2GdmUXuEKJ+vTYZXCAWTHWYO3JKdCu7AI5nbXzyxGCCQAAfvD9mXM6dLJIMeF1Y0/vkEB/mf38tOu4XWdLy3xdDgAAkH64aberaVhQvQolLggOMCsqPEg7s23ak1NQK9ckmAAAQOcf4diZbVdwgLlO/SMjKjxIJwuLtTengEc6AADwsTx7sfbk2hURElAvHt+4lJBAfzUOCdTuHLvy7DU/c5NgAgAASUdOnVX+2VI1CQ30dSlu/EwmRYUH6+DJIp0uKvV1OQAANFglZeXamW2X05CswQG+LqfGhQcHyJCUlW1XsaO8Rq9FMAEAaPCKHeU6cvqsIiwBdeIRjouFBPrL4TSUfZm1Jp599ll16tSpFqsCAKBh2Z9bqBx7saLDg31dSq2JDg9Wjq1Y3+UV1uh1CCYAAA1err1Y9uIyWS0189uPM6dOatbU8br3tht0R/tmuvvmDkoffr++2bbF4zGswQE6cvqszpVW/huL8ePHa82aNdVVMgAA+JE8e7H25RWoSUiQzH5175cYNcXsZ1LT0CDtzS2o0Uc66u9DMQAAeKDcaejQySIF+fvJr4ZmSzzzxHCVlTo0eeZ8NUtopTMnT+irLzbKdua0x2OEB/vr+zNnlWsvVqumoRXaw8LCFBYWVp1lAwAAnZ9ZmZVtV7khhQU3vFvosGB/FZQ4lJVtl9USUCNrcTFjAgDQoNnPOXS6qFSNQmpmtkSB3aYdW7/U4xOmqku32xTXPF4dbuisoY+l67Zed0qSbklqquXvvK2
MEQ+q+7UtdF9KF61d+aHbOK/N+oPG3tdd7eKj1Lp1a02ZMkUOh8PVfvGjHMOHD9eAAQM0e/ZsxcXFqUmTJkpLS3M7BwAAXNl3eYXKtRcrpgE9wnGxmn6kg2ACANCgFZWWqbTcqSD/mtmJwxISqpDQUG1c/YlKS0ou2e/Nl19Q99S79beP16v3PfdpWvojOrR/r6s9JDRMTz0/VwveX6/ZL83Rm2++qTlz5lz22uvWrdN3332ndevW6a9//asWLVqkRYsWVddHAwCg3issKdPBk0VqbAlsUI9wXMzsZ1JkSKAOnixSYUn1b2FOMAEAaNAKi6v/L9cf8/f316Q/z9cn77+n3p3b6NEH7tKC2dO1f3eWW78ed96jex78jRISkzR67ES1u66Tlv3tTVf7w2nj1PmmmxUR3Uwpv7xT48eP19KlSy977caNG2v+/Plq166d7r77bvXt25d1KAAA8EKuvViFJWUKb4CPcFwsPNhfhSVlyrnMYtxV5dNgYsaMGbrpppsUHh6u6OhoDRgwQHv27HHrU1xcrLS0NDVp0kRhYWEaNGiQcnNzXe07duzQQw89pPj4eFksFrVv315z586tcK3169erc+fOCgoKUlJSkke/MTIMQ1OnTlVcXJwsFot69eqlffv2udoPHTqkkSNHKjExURaLRW3atNG0adNUWnrl7dyuVM/GjRvVr18/NWvWTCaTSf/v//2/K44JAPDeqaLSGpstcUFKn3768ItMzXx9sW6+vYf+u+VzPdy/h1b88x+uPtfdeJPbOdfd+DMd/u5/Myb+teJ9Pfmru/XwLzurRUykJk+erCNHjlz2utdee63M5v99tri4OOXl5VXTpwIAoH4rK3fq0MkihQT618ldu2qbyWRSaJC/Dp8qkqPcWa1j+zSY2LBhg9LS0vTll19q9erVcjgc6t27t4qKilx9xo4dq48++kjLli3Thg0bdPz4cQ0cONDVvm3bNkVHR2vx4sXKysrSpEmTNHHiRM2fP9/V5+DBg+rbt69SUlK0fft2paena9SoUVq1atVl65s5c6ZeeeUVLViwQFu2bFFoaKhSU1NVXHw+Idr9/9m78/imqvSP458kbZYm3ekGFCiCUvZFlEUBGbSobIobjiKbiCyKyOBPFgFHZVxxHxSURUVExQ1FrEhRkEUYqmyiIFhEWpDSli5p0iS/PzpkiC1QsNAC3/frlXn13nvuOU+uMDRPznnOjz/i9Xp55ZVX2LJlC9OnT2fGjBmMHz/+uP1WJJ6CggJatGjBSy+9dFLPVEREKq7E4+Ww031aijj9mcVi5ZLLujBw5FhefXcJ11x/C7Oee7xC9276z3dMHTOM9l26Me6Z1/kg9RsmTJhwwkR4cHBg3QyDwYDXW7m/SIiIiJyrDuQXk13gIvI01aE6G0XYgjlYUMyBw8dennoqqnQ+yueffx5wPGfOHGJjY9mwYQOdOnUiNzeX1157jfnz59O1a1cAZs+eTXJyMmvWrKFdu3YMGjQooI/69euzevVqFi1axMiRIwGYMWMGSUlJPP300wAkJyezcuVKpk+fTkpKSrmx+Xw+nn32WSZOnEjv3r0BmDdvHnFxcXz44YfccsstdO/ene7duweMvX37dv7973/z1FNPHfN9VySeq6++mquvvrrCz1JERE5eideHxwtm05n/FqReg4v4+ssl/uPNG9dz9XU3/+84fQMXNm4GwKb/rCOuZiIDho9hT3YhtWqG8euvv57xmEVERM4XPp+PjIOFGAwQZFIFhCOCTEZMBiN7sgtJCLdW2kySavWEc3NzAYiKigJKZ0O43W66devmb9OoUSPq1KnD6tWrj9vPkT4AVq9eHdAHQEpKynH72LVrF5mZmQH3hYeHc+mll57U2OU5lXhOpLi4mLy8vICXiIgcn89X+ovH6ZydmXsom5G39eHzDxey48ct/L7nV7767CPemvkil//tfwno5Us+ZvG7b5Gxaweznv0X2374DzfcPgSAxHr1ydr
3G6mLF7F/76/MmzWDDz744PQFLSIicp7LKyohK89JZIi5qkOpdiJCgsnMdZJbVHk7fVWbCh5er5fRo0fTsWNHmjZtCkBmZiZms5mIiIiAtnFxcWRmZpbbz7fffss777zDp59+6j+XmZlJXFxcmT7y8vIoKirCZrOV6edI/+Xdd6yxd+zYwQsvvHDc2RKnGs+JTJs2jalTp570fSIi5zODofTlO41j2ELsNGnRhndmz2Bvxm5KSkqITahJr5tv5467R/vbDb73AVIXf8BTk8cRHRvH1OmvktTwIgAu73Y1twwcxjNT/4/i4mK6dEth0qRJTJky5TRGLiIicv7KzCuiyO0h5jzeIvRYQsxB/JFfzL4cJxGVlLipNomJESNGsHnzZlauXHnKfWzevJnevXszefJkrrrqqgrf99Zbb3HXXXf5j5csWRJQLKwi9u7dS/fu3bnxxhu58847/ecdDof/59tuu40ZM2acVL8V9eCDDzJmzBj/cV5eHomJiadlLBGRc0VpYsKAz3f6UhNmi4W7/zGJu/8x6bjtasTF89zc9455fcQDUxjxwBT2ZBfSrHYYyQnhjB79v8TGlClTAhIV5RV5fvbZZ086fhERkfNRVp4T2xmoQXW2sgWbOJDvJJmwSumvWiQmRo4cyeLFi/n666+pXbu2/3x8fDwul4ucnJyAWRNZWVnEx8cH9LF161b+9re/MXToUCZOnBhwLT4+PmAnjyN9hIWFYbPZ6NWrF5deeqn/Wq1atdi3b5+/XUJCQsB9LVu2DOjr999/54orrqBDhw68+uqrAdfS09P9P4eFhVUonlNhsViwWCyndK+cWbt37yYpKYmNGzeW+bMkImdWsNGIJciI0332FIT04Tvtu4iIiIicz5xuD/nFnjNSHLsqPDJuJIfzcnl8xhun3Ic12MRhpwenu3KeU5XWmPD5fIwcOZIPPviAr776iqSkpIDrbdq0ITg4OGDP9e3bt5ORkUH79u3957Zs2cIVV1zBHXfcwaOPPlpmnPbt25fZtz01NdXfR2hoKA0aNPC/bDYbSUlJxMfHB9yXl5fH2rVrA8beu3cvXbp0oU2bNsyePRujMfCRHt1vbGxsheKR0+vAgQPcfffd1KlTB4vFQnx8PCkpKaxataqqQxORM8xoNBAZYqbI7anqUCrE6/Vh/O9WXSIiInJ6FBSXUOT2VMsZE4+MG8kDw24HYMStvXj2kQlVEoct2ISzxENBcUml9Felv9mMGDGC+fPn89FHHxEaGuqv3RAeHo7NZiM8PJzBgwczZswYoqKiCAsLY9SoUbRv35527doBpcs3unbtSkpKCmPGjPH3YTKZiImJAWDYsGG8+OKLjBs3jkGDBvHVV1+xcOHCgDoUf2YwGBg9ejSPPPIIDRs2JCkpiUmTJlGzZk369OkD/C8pUbduXZ566ikOHDjgv//PMzqOVpF48vPz2bFjh/94165dpKenExUVRZ06dU72UctR+vbti8vlYu7cudSvX5+srCyWLVvGwYMHqzo0EakCESHBeKp4C81vd/xRoXbOEg+WYKMSEyIiIqdRQbGHEo9Xu3EcR5DJiMfjpaDYQ7TjxO1PpEqf9L///W9yc3Pp0qULCQkJ/tc777zjbzN9+nR69OhB37596dSpE/Hx8SxatMh//b333uPAgQO8+eabAX20bdvW3yYpKYlPP/2U1NRUWrRowdNPP82sWbOOuVXoEePGjWPUqFEMHTqUtm3bkp+fz+eff47VWloAJTU1lR07drBs2TJq164dMP7xVCSe9evX06pVK1q1agXAmDFjaNWqFQ899FDFH7CUkZOTwzfffMPjjz/OFVdcQd26dbnkkkt48MEH6dWrF2PHjqVHjx7+9s8++ywGgyFga9sGDRowa9Ys//GsWbNITk7GarXSqFEjXn755YAx161bR6tWrbBarVxjeJSVAAAgAElEQVR88cVs3LixTFybN2/m6quvxuFwEBcXx+23384ff/zvg0qXLl2
45557GDduHFFRUcTHx6vonUglcViCMBoMeL2nswRm5XC6vdjNQYRUw29wREREzhWHne5K2wbzdHlk3Eg2rvuWhXNeoUODGnRoUIN9v2Xg8Xh47P/upW+X1nRpUptbrryUd+a8csx+lnzwDt0vboiruDjg/APDbmfq/XcfPwiDgcPOytmZo8qXcpT3GjBggL+N1WrlpZdeIjs7m4KCAhYtWhQwG2HKlCnl9rF79+6Asbp06cLGjRspLi5m586dAWMci8Fg4OGHHyYzMxOn08mXX37JhRde6L8+YMCAY76HEzlRPF26dCm33/KKmUnFORwOHA4HH374IcV/+ssH0LlzZ1auXInHUzqte8WKFdSoUYO0tDSgdJbMzp076dKlC1BaOPWhhx7i0UcfZdu2bTz22GNMmjSJuXPnAqUzX3r06EHjxo3ZsGEDU6ZMYezYsQFj5uTk0LVrV1q1asX69ev5/PPPycrK4qabbgpoN3fuXOx2O2vXruWJJ57g4YcfJjU1tZKfkMj5x2ENIsQcRH4lTUU8nQqK3dRwmDEaq/cvSyIiImezgwXFWIOq92yJ0ZMeo2mrtvS6+XY+Wb2FT1ZvITahFj6vl9j4BB554TXmf76KgSPH8srTj7Ls0w/L7afr1b3wejysXPa/L2KzDx7g27RUetxw63FjsAYZOVhQ9jPVqajeT1ukkgUFBTFnzhzmzp1LREQEHTt2ZPz48fzwww8AXH755Rw+fJiNGzfi8/n4+uuvuf/++/2JibS0NGrVqkWDBg0AmDx5Mk8//TTXX389SUlJXH/99dx333288kppVnL+/Pl4vV5ee+01mjRpQo8ePfjHP/4RENOLL75Iq1ateOyxx2jUqBGtWrXi9ddfZ/ny5fz000/+ds2bN2fy5Mk0bNiQ/v37c/HFF5epVSIiJy/EHETtSCu5lZTxP11cJV6MRiMJEadWJFlERERO7GwpfOkIDSM4OBirLYTomDiiY+IwmUwEBQczZPT/kdysFTUT65LS+0au7duPZZ99VG4/FquNK3v25dP33/afW/rhu8Ql1KZ1u8uOG8PRBTD/KiUm5LzTt29ffv/9dz7++GO6d+9OWloarVu3Zs6cOURERNCiRQvS0tLYtGkTZrOZoUOHsnHjRvLz81mxYgWdO3cGoKCggJ07dzJ48GD/TAyHw8EjjzzCzp07Adi2bRvNmzf3L/8ByhQ5/f7771m+fHlAH40aNQLw9wOliYmjJSQksH///tPyjETONzUjQjAZDRSXVN8imIcKXcSGWahh1w5MIiIip0txiRdXiQdzNZ8xcTzvv/EaA3t35Zq2F/G35nX56J15ZO377Zjte918O+tWLudAZunOlJ8tWsC1fW854XIWc5ARt8dDcclfr9Wl6llyXrJarVx55ZVceeWVTJo0iSFDhjB58mQGDBhAly5dSEtLw2Kx0LlzZ6KiokhOTmblypWsWLGC+++/HyhdpgEwc+bMgO1mobT4akXl5+fTs2dPHn/88TLXjq5XEhwcHHDNYDDgreKCfSLnimi7mdhQC38cdhEfXv2+IfH6fDjdHupGhWgZh4iIyGlUuoQeDJyd/96mLl7EC/+azKgHH6Zpq4ux2x28NetFtn7/n2Pec1GT5jRo1IQlH7zDJZd3YdfPP3LNzLeP2f4Io8GAz0eFShmciBITcl7Ic7rJLXQTF2YtN/vZuHFjPvywdN1V586def311wkKCqJ79+5Aac2Pt99+m59++slfXyIuLo6aNWvyyy+/8Pe//73ccZOTk3njjTdwOp3+WRNr1qwJaNO6dWvef/996tWrR1CQ/kqKVAWj0UC9aDuZuU6KSzxYgqpXcuKPw8VEO8zEhVlP3FhEREROmdcHXsB4FkyYCAo24/UEzvbctGEdzVq3pe9tg/zn9mbsPmFfPW+6jYVzXuFA1j4u7tCZuJq1TniPwQAeX+kz+6vOgsct8tf4fD42ZuTwzc8H2PhTBl27duXNN9/khx9+YNeuXbz
77rs88cQT9O7dG4BOnTpx+PBhFi9e7E9CdOnShbfeeouEhISAAqhTp05l2rRpPP/88/z0009s2rSJ2bNn88wzzwBw6623YjAYuPPOO9m6dSufffYZTz31VEB8I0aMIDs7m379+vHdd9+xc+dOli5dysCBA/1FOEXk9KsVYaNejRCy8pyVkvmvLEUuD26vl+SEsGq/3lVERORs5/3vpgNnw4yJhNqJbPl+A/t+yyAn+yBer5fa9erz46Z01nz9FRm7dvDq9Gls+6HsroB/dlWvG9ifuY+P33mDHjcev+jlEUYM4PPhrYTfm5SYkHOewWDAbjYRbgsmOjKcSy+9lOnTp9OpUyeaNm3KpEmTuPPOO3nxxRcBiIyMpFmzZsTExPhrPXTq1Amv1+uvL3HEkCFDmDVrFrNnz6ZZs2Z07tyZOXPmkJSUBJTuAvLJJ5+wadMmWrVqxYQJE8os2ahZsyarVq3C4/Fw1VVX0axZM0aPHk1ERATGsyFVK3KOMBoNXBQfRpgtmOwCV1WHA5T+cnQg38kFMQ5qqeiliIjIaVf90xH/c+uQERiNJm7t3pFrLrmIrN9/o88td9A5pQcP3TuEO/umkHsom+v/PuiEfTlCw+iS0gOb3U6nbtdUaHwfVNoDM/iq09dCUmny8vIIDw8nNzeXsLCwqg6nynm8peuzQ8ymar8nsYhUrV8PFrBuVzbRdgs2c9XOUMjMdeKwmuhwQQ3sFi31EhEROd0OFbj4avt+Yh0Wgkzn15eEo26/jqSGjRjz0LQKtS/xeNmfX8wVF8USZTcDp/459Px60nLeMhkN2C1BSkqIyAklRobQMNbB/nxnle7S8Ud+MSYTNKsVoaSEiIjIGWI0GDBSOXUTzhZ5uTms+OJTNq5dFVCb4kS8vtKEgqkSPmPpNx0REZGjGI0GmtQKp8TrY8f+fOLCrGe8tsMf+cV4vT5a140kPlwFL0VERM4Uo7H0S83KqJtwthjQ6woO5+YwfNxD1K3fsML3eX0+jAYDhkqY7qDEhIiIyJ8Em4w0rx2B0WDgp6zDRNstOKyn/59Mr8/H/rzSmRKt60aSGBVy2scUERGR/7EFmzAHGXG6PedN0elFK05cHLM8TrcHS7CRkEp4TlrKISIiUg5zkJHmtcNpWiuc/GI3v+cU4TmN8zoLXSXsyS4g1BrEJfWilZQQERGpAkEmIxE2M063t6pDqfacbi8RNnOl1OLQjAkREZFjCDIZaVornJhQC1t/z2PPoUKi7WZCrcGVNobX6+NAfjEer5dG8WFcGB9KiFn/PIuIiFSVaIeZjEOFVR1GtefyeIlymCulL/3mIyIicgJxYVbCbcHs2J/PLwfyOVTowmEJJsIWjNF4agWfnG4PhwpduD1eou0WkmuGUTPcqiK9IiIiVSzEEoTP58Pn8+nf5WMo3dzTV2kFupWYEBERqQBrsImmtcKpHWkjM9fJr9mF7DlUSLDJSIjZhDXYhCXIeMxfYDxeH0VuD8VuD4UuD0EmAzGhFupF24kJtZw361hFRESqO4c5CEuQEZfHiyVI/z6Xp/TZGHFU0ixPJSZEREROQkSImYgQM0kxdvbnFbMvt4hDBS7yitw4S7wY8OEDDBgorUhR+r8mgwFrsAm7JYh6NezEh1uJCjGf8owLEREROT3sltIvHIpcHiUmjqHI5fnv7zWV83yUmBARETkFliATiVEhJEaFUOLxUuDyUFBcQqHLg8frw+P1YjQaMBoMmE1GQiwmHJYgbMEmTQsVERGpxoJMRiJDzOzNKSKiqoOpporcHmqG2yql8CUoMSEiIvKXBZmMhNuMhNsqryimiIiIVJ24MCu7Dxbi9fkw6guFAF6fD5fHS3y4tdL61HahIiIiIiIiIkeJC7MSZg0ir8hd1aFUO3lFbsKtwcSFKTEhIiIiIiIiclrYzKVLNvOcSkz8WZ7TTWJUCDZz5dXfUGJCziqXXXYZY8eOrXD7HTt2YDAY2Lx582mMqlTt2rV58cUXT/s
4IiIiIiJy+tWMsGE2GSlyeao6lGrD6fZgNhmpGWGr1H5VY0KqnQEDBjB37twy53/++Wc+/vhjgoO1hltERERERE6vyJDS5Qr7cp3YzJX7QfxslV3gIiHcSmRI5X4mU2JCqqXu3bsze/bsgHMxMTGYTNquR0RERERETj+DwUCd6BD2HCrC4/VhOs+3+C7ddcxHneiQSt9hTEs5pFqyWCzEx8cHvEwmU5mlHLVr1+bxxx9nwIABhIaGUrduXV577bVj9ut2uxk0aBD16tXDZrNx0UUX8cILLwS0ue2227jhhht4/PHHiY+Pp0aNGtxzzz2UlJT422RmZtKjRw9sNhv169dnwYIFlf8QRERERESkSsWGWomyB5NdUFzVoVS57IJiIu3BxIZWXtHLI5SYkLPek08+Sbt27di4cSNDhw7lrrvuYseOHeW29Xg81KlTh/fee4+tW7cyceJEHnjgARYtWhTQLjU1lT179pCWlsbrr7/OzJkzeeONN/zX+/fvz++//86KFSt45513eO655zh48OBpfZ8iIiIiInJmmYOMNIoPw+Xx4nSfv7UmnG4PxSVeGsWHYQ6q/DSCEhNSLS1evBiHw+F/3Xjjjcds27NnT4YNG0aDBg0YP348ERERpKWlldvWarUyZcoULr74YpKSkrj99tvp378/CxcuDGhXo0YNnn/+eRo1akSvXr24+uqrWbZsGQBbt24lNTWV1157jUsuuYS2bdsyc+ZMnE5npb1/ERERERGpHmpF2KgXbScrz4nP56vqcM44n89HVp6TpBp2alVy0csjqjQxMW3aNNq2bUtoaCixsbH06dOH7du3B7RxOp2MGDGC6OhoHA4Hffv2JSsry3/9+++/p1+/fiQmJmKz2UhOTua5554rM1ZaWhqtW7fGYrHQoEED5syZc8L4fD4fDz30EAkJCdhsNrp168bPP/8c0ObRRx+lQ4cOhISEEBERUaH3nZaWRu/evUlISMBut9OyZUveeuutgDZut5uHH36YCy64AKvVSosWLfj8888r1P+54IorriA9Pd3/ev7554/Ztnnz5v6fDQYDcXFx7N+//5jtX3jhBdq0aUONGjVwOBy8/vrrZGRkBLRp2rQpRuP//nokJCT4+9y2bRsWi4WWLVsGtA8NDT3p9ykiIiIiItWb0WigUUIYkXYzf+Sff0s6/sh3EWk30yghDONpqrNRpYmJFStWMGLECNasWUNqaiput5urrrqKgoICf5v77ruPTz75hHfffZcVK1bw+++/c/311/uvb9iwgdjYWN588022bNnChAkTePDBBwO2bdy1axfXXnut/8Pu6NGjGTJkCEuXLj1ufE888QTPP/88M2bMYO3atdjtdlJSUgK+GXe5XNx4443cfffdFX7f3377Lc2bN+f999/nhx9+YODAgfTv35/Fixf720ycOJFXXnmFF154ga1btzJs2DCuu+46Nm7cWOFxzhZer49fDxawbtdBNvx6CKfbg91up0GDBv5XQkLCMe//8y4dBoMBr9dbbts333yTBx54gDvvvJPU1FTS09Pp378/LpfrlPsUEREREZFzm8MSROOE829JR+kSDg+NE8JwWE7f3hlVuivHn2cAzJkzh9jYWDZs2ECnTp3Izc3ltddeY/78+XTt2hWA2bNnk5yczJo1a2jXrh2DBg0K6KN+/fqsXr2aRYsWMXLkSABmzJhBUlISTz/9NADJycmsXLmS6dOnk5KSUm5sPp+PZ599lokTJ9K7d28A5s2bR1xcHB9++CG33HILAFOnTvXHXlHjx48POL733nv54osvWLRoET169ADgjTfeYMKECVxzzTUA3H333Xz55Zc8/fTTvPnmmxUe62zwyx/5pO/JwWgwUOL18Ue+C/NpSgKsWrWKyy+/nGHDhvnPHasexbEkJydTXFxMeno6rVq1AmDLli0cPny4UmMVEREREZHq48iSjh3786kTVfk7U1Q3R5ZwNIh1nLYlHEdUqxoTubm5AER
FRQGlsyHcbjfdunXzt2nUqBF16tRh9erVx+3nSB8Aq1evDugDICUl5bh97Nq1i8zMzID7wsPDufTSS49736n6c8zFxcVYrYHVTm02GytXriz3/uLiYvLy8gJeZwOfz0fGwULMJhMJ4TYSI0Nwe7y43KcnMdGwYUPWrl1LamoqP/30E+PHjz/pWSiNGzemW7du3HnnnXz33XesX7+eoUOHlvnvJSIiIiIi544jSzqi7Gay8s79JR37Dxef9iUcR1SbxITX62X06NF07NiRpk2bAqVbMprN5jK1G+Li4sjMzCy3n2+//ZZ33nmHoUOH+s9lZmYSFxdXpo+8vDyKiorK7edI/+Xdd6yxT9XChQv57rvvGDhwoP9cSkoKzzzzDD///DNer5fU1FQWLVrEvn37yu1j2rRphIeH+1+JiYmVGuPpdCbLxwwfPpxevXpx44030q5dO/Ly8rjrrrtOup958+YRGxvL5Zdfzg033OCvgyIiIiIiIucuhyWI5rUjCDIZOHD43E1O/JFfjNEILWpHnNYlHEdU6VKOo40YMYLNmzcfc0ZARWzevJnevXszefJkrrrqqgrf99ZbbwV8OF2yZAkmk+mU4zhakyZN+PXXXwG4/PLLWbJkScD15cuXM3DgQGbOnEmTJk3855977jnuvPNOGjVqhMFg4IILLmDgwIG8/vrr5Y7z4IMPMmbMGP9xXl7eWZGcMBgM1I0OIT0jh325RZR4fTww7Xna1S//Q/6f/3z89ttvZdps3rzZ/3ODBg0CKudarVbmzZt33JjKWypzdM0SKC2G+dlnnwWcu/XWW4/br4iIiIiInP3iw620qhPBht2HyC5wEWU3V3VIlSq7wIXH66VN3Sjiw8/MrPBqkZgYOXIkixcv5uuvv6Z27dr+8/Hx8bhcLnJycgJmTWRlZREfHx/Qx9atW/nb3/7G0KFDmThxYsC1+Pj4gJ08jvQRFhaGzWajV69eXHrppf5rtWrV8s9MyMrKCii8mJWVFbAbw4l89tlnuN1uoHQpxtFWrFhBz549mT59Ov379w+4FhMTw4cffojT6eTgwYPUrFmT//u//6N+/frljmOxWLBYLBWOqzqpX8NBsMlIVp4Tk9FInaiQc+4vt4iIiIiInDtqR4ZQ4vGxMeMQB/OLiXacnZ/F/iy7wIWrxEOrOpEkRoWcsXGrNDHh8/kYNWoUH3zwAWlpaSQlJQVcb9OmDcHBwSxbtoy+ffsCsH37djIyMmjfvr2/3ZYtW+jatSt33HEHjz76aJlx2rdvX+bb7dTUVH8foaGhZbZ6TEpKIj4+nmXLlvkTEXl5eaxdu/akduCoW7duuefT0tLo0aMHjz/+eMCykz+zWq3UqlULt9vN+++/z0033VThsc8WRqOButF26kbbqzoUERERERGRCqlXw47BAOkZORw47CQm9OyuOXfgcDE+n4/WdSPP+GezKk1MjBgxgvnz5/PRRx8RGhrqr90QHh6OzWYjPDycwYMHM2bMGKKioggLC2PUqFG0b9+edu3aAaXT9rt27UpKSgpjxozx92EymYiJiQFg2LBhvPjii4wbN45Bgwbx1VdfsXDhQj799NNjxmYwGBg9ejSPPPIIDRs2JCkpiUmTJlGzZk369Onjb5eRkUF2djYZGRl4PB7S09OB0iUEDoej3L6XL19Ojx49uPfee+nbt68/ZrPZ7C+AuXbtWvbu3UvLli3Zu3cvU6ZMwev1Mm7cuL/yyEVERERERKSS1I22YzIa2JiRw76cIuLCrRjPst06vP/dfSPYZKR13UhqR565mRJHGHxHL8A/04Mf4z/Y7NmzGTBgAABOp5P777+ft99+m+LiYlJSUnj55Zf9SzmmTJni37LzaHXr1mX37t3+47S0NO677z62bt1K7dq1mTRpkn+MY/H5fEyePJlXX32VnJwcLrvsMl5++WUuvPBCf5sBAwYwd+7cMvcuX76cLl26lNvvse7p3LkzaWlpQOkyj7vvvptffvkFh8P
[base64 PNG payload omitted: scatter plot, x-axis 'Current Active Cases', y-axis 'First Date Confirmed']\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from matplotlib.pyplot import figure\n", + "\n", + "figure(num=None, figsize=(12, 8), dpi=100, facecolor='w', edgecolor='k')\n", + "\n", + "plt.scatter(df_final.Confirmed, df_final.Date, s=df_final.Recovered, alpha = 0.25)\n", + "\n", + "[plt.text( x=row['Confirmed'], y=row['Date'], s=row['Country/Region']) for k,row in df_final.iterrows()]\n", + "\n", + "plt.xlabel('Current Active Cases')\n", + "plt.ylabel('First Date Confirmed')\n", + "\n", + "axes = plt.gca()\n", + "axes.set_ylim(['2020-01-20','2020-03-10'])\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #7: Plot Data with continent colors" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "continent_colors = {'Europe':'red',\n", + " 'Africa':'green',\n", + " 'Americas':'blue',\n", + " 'Asia':'cyan',\n", + " 'Australia and New Zealand':'purple'}" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from matplotlib.pyplot import figure\n", + "\n", + "figure(num=None, figsize=(12, 8), dpi=100, facecolor='w', edgecolor='k')\n", + "\n", + "plt.xlabel('Current Active Cases')\n", + "plt.ylabel('First Date Confirmed')\n", + "\n", + "for i,j in df_final.iterrows():\n", + " reg_color = continent_colors.get(j['continent'], 'black')\n", + " plt.scatter(df_final['Confirmed'][i], df_final['Date'][i], s=200, alpha = 0.25, color=reg_color)\n", + "\n", + " \n", + "[plt.text( x=row['Confirmed'], y=row['Date'], s=row['Country/Region']) for k,row in df_final.iterrows()] \n", + "axes = plt.gca()\n", + "axes.set_ylim(['2020-01-20','2020-03-10'])\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/test.py b/test.py index 2a105ea..9515637 100644 --- a/test.py +++ b/test.py @@ -1,5 +1,5 @@ import urllib.parse -f = '20. 
Pandas - value_counts - multiple columns, all columns and bad data' +f = '25 Pandas Create A Matplotlib Scatterplot From A Dataframe ' ff = urllib.parse.quote_plus(f) print(ff.replace('+', '_')) \ No newline at end of file From 411818bfe454bc673b0ede760524a3e1bfd2fe8f Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 9 Apr 2020 12:53:27 +0300 Subject: [PATCH 59/76] 26.pandas-display-all-columns-and-show-more-rows --- ...splay-all-columns-and-show-more-rows.ipynb | 2177 +++++++++++++++++ 1 file changed, 2177 insertions(+) create mode 100644 notebooks/pandas/26.pandas-display-all-columns-and-show-more-rows.ipynb diff --git a/notebooks/pandas/26.pandas-display-all-columns-and-show-more-rows.ipynb b/notebooks/pandas/26.pandas-display-all-columns-and-show-more-rows.ipynb new file mode 100644 index 0000000..b495c54 --- /dev/null +++ b/notebooks/pandas/26.pandas-display-all-columns-and-show-more-rows.ipynb @@ -0,0 +1,2177 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 26. Pandas Display All Columns and Show More Rows" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5043, 28)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
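<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>color</th>\n",
+ "      <th>director_name</th>\n",
+ "      <th>num_critic_for_reviews</th>\n",
+ "      <th>duration</th>\n",
+ "      <th>director_facebook_likes</th>\n",
+ "      <th>actor_3_facebook_likes</th>\n",
+ "      <th>actor_2_name</th>\n",
+ "      <th>actor_1_facebook_likes</th>\n",
+ "      <th>gross</th>\n",
+ "      <th>genres</th>\n",
+ "      <th>...</th>\n",
+ "      <th>num_user_for_reviews</th>\n",
+ "      <th>language</th>\n",
+ "      <th>country</th>\n",
+ "      <th>content_rating</th>\n",
+ "      <th>budget</th>\n",
+ "      <th>title_year</th>\n",
+ "      <th>actor_2_facebook_likes</th>\n",
+ "      <th>imdb_score</th>\n",
+ "      <th>aspect_ratio</th>\n",
+ "      <th>movie_facebook_likes</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>Color</td>\n",
+ "      <td>James Cameron</td>\n",
+ "      <td>723.0</td>\n",
+ "      <td>178.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>855.0</td>\n",
+ "      <td>Joel David Moore</td>\n",
+ "      <td>1000.0</td>\n",
+ "      <td>760505847.0</td>\n",
+ "      <td>Action|Adventure|Fantasy|Sci-Fi</td>\n",
+ "      <td>...</td>\n",
+ "      <td>3054.0</td>\n",
+ "      <td>English</td>\n",
+ "      <td>USA</td>\n",
+ "      <td>PG-13</td>\n",
+ "      <td>237000000.0</td>\n",
+ "      <td>2009.0</td>\n",
+ "      <td>936.0</td>\n",
+ "      <td>7.9</td>\n",
+ "      <td>1.78</td>\n",
+ "      <td>33000</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>Color</td>\n",
+ "      <td>Gore Verbinski</td>\n",
+ "      <td>302.0</td>\n",
+ "      <td>169.0</td>\n",
+ "      <td>563.0</td>\n",
+ "      <td>1000.0</td>\n",
+ "      <td>Orlando Bloom</td>\n",
+ "      <td>40000.0</td>\n",
+ "      <td>309404152.0</td>\n",
+ "      <td>Action|Adventure|Fantasy</td>\n",
+ "      <td>...</td>\n",
+ "      <td>1238.0</td>\n",
+ "      <td>English</td>\n",
+ "      <td>USA</td>\n",
+ "      <td>PG-13</td>\n",
+ "      <td>300000000.0</td>\n",
+ "      <td>2007.0</td>\n",
+ "      <td>5000.0</td>\n",
+ "      <td>7.1</td>\n",
+ "      <td>2.35</td>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>Color</td>\n",
+ "      <td>Sam Mendes</td>\n",
+ "      <td>602.0</td>\n",
+ "      <td>148.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>161.0</td>\n",
+ "      <td>Rory Kinnear</td>\n",
+ "      <td>11000.0</td>\n",
+ "      <td>200074175.0</td>\n",
+ "      <td>Action|Adventure|Thriller</td>\n",
+ "      <td>...</td>\n",
+ "      <td>994.0</td>\n",
+ "      <td>English</td>\n",
+ "      <td>UK</td>\n",
+ "      <td>PG-13</td>\n",
+ "      <td>245000000.0</td>\n",
+ "      <td>2015.0</td>\n",
+ "      <td>393.0</td>\n",
+ "      <td>6.8</td>\n",
+ "      <td>2.35</td>\n",
+ "      <td>85000</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>Color</td>\n",
+ "      <td>Christopher Nolan</td>\n",
+ "      <td>813.0</td>\n",
+ "      <td>164.0</td>\n",
+ "      <td>22000.0</td>\n",
+ "      <td>23000.0</td>\n",
+ "      <td>Christian Bale</td>\n",
+ "      <td>27000.0</td>\n",
+ "      <td>448130642.0</td>\n",
+ "      <td>Action|Thriller</td>\n",
+ "      <td>...</td>\n",
+ "      <td>2701.0</td>\n",
+ "      <td>English</td>\n",
+ "      <td>USA</td>\n",
+ "      <td>PG-13</td>\n",
+ "      <td>250000000.0</td>\n",
+ "      <td>2012.0</td>\n",
+ "      <td>23000.0</td>\n",
+ "      <td>8.5</td>\n",
+ "      <td>2.35</td>\n",
+ "      <td>164000</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>4</th>\n",
+ "      <td>NaN</td>\n",
+ "      <td>Doug Walker</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>131.0</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>Rob Walker</td>\n",
+ "      <td>131.0</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>Documentary</td>\n",
+ "      <td>...</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>12.0</td>\n",
+ "      <td>7.1</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>...</th>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "      <td>...</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>5038</th>\n",
+ "      <td>Color</td>\n",
+ "      <td>Scott Smith</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>87.0</td>\n",
+ "      <td>2.0</td>\n",
+ "      <td>318.0</td>\n",
+ "      <td>Daphne Zuniga</td>\n",
+ "      <td>637.0</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>Comedy|Drama</td>\n",
+ "      <td>...</td>\n",
+ "      <td>6.0</td>\n",
+ "      <td>English</td>\n",
+ "      <td>Canada</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>2013.0</td>\n",
+ "      <td>470.0</td>\n",
+ "      <td>7.7</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>84</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>5039</th>\n",
+ "      <td>Color</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>43.0</td>\n",
+ "      <td>43.0</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>319.0</td>\n",
+ "      <td>Valorie Curry</td>\n",
+ "      <td>841.0</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>Crime|Drama|Mystery|Thriller</td>\n",
+ "      <td>...</td>\n",
+ "      <td>359.0</td>\n",
+ "      <td>English</td>\n",
+ "      <td>USA</td>\n",
+ "      <td>TV-14</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>593.0</td>\n",
+ "      <td>7.5</td>\n",
+ "      <td>16.00</td>\n",
+ "      <td>32000</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>5040</th>\n",
+ "      <td>Color</td>\n",
+ "      <td>Benjamin Roberds</td>\n",
+ "      <td>13.0</td>\n",
+ "      <td>76.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>Maxwell Moody</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>Drama|Horror|Thriller</td>\n",
+ "      <td>...</td>\n",
+ "      <td>3.0</td>\n",
+ "      <td>English</td>\n",
+ "      <td>USA</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>1400.0</td>\n",
+ "      <td>2013.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>6.3</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>16</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>5041</th>\n",
+ "      <td>Color</td>\n",
+ "      <td>Daniel Hsia</td>\n",
+ "      <td>14.0</td>\n",
+ "      <td>100.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>489.0</td>\n",
+ "      <td>Daniel Henney</td>\n",
+ "      <td>946.0</td>\n",
+ "      <td>10443.0</td>\n",
+ "      <td>Comedy|Drama|Romance</td>\n",
+ "      <td>...</td>\n",
+ "      <td>9.0</td>\n",
+ "      <td>English</td>\n",
+ "      <td>USA</td>\n",
+ "      <td>PG-13</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>2012.0</td>\n",
+ "      <td>719.0</td>\n",
+ "      <td>6.3</td>\n",
+ "      <td>2.35</td>\n",
+ "      <td>660</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>5042</th>\n",
+ "      <td>Color</td>\n",
+ "      <td>Jon Gunn</td>\n",
+ "      <td>43.0</td>\n",
+ "      <td>90.0</td>\n",
+ "      <td>16.0</td>\n",
+ "      <td>16.0</td>\n",
+ "      <td>Brian Herzlinger</td>\n",
+ "      <td>86.0</td>\n",
+ "      <td>85222.0</td>\n",
+ "      <td>Documentary</td>\n",
+ "      <td>...</td>\n",
+ "      <td>84.0</td>\n",
+ "      <td>English</td>\n",
+ "      <td>USA</td>\n",
+ "      <td>PG</td>\n",
+ "      <td>1100.0</td>\n",
+ "      <td>2004.0</td>\n",
+ "      <td>23.0</td>\n",
+ "      <td>6.6</td>\n",
+ "      <td>1.85</td>\n",
+ "      <td>456</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "<p>5043 rows × 28 columns</p>\n",
+ "</div>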
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "2 Color Sam Mendes 602.0 148.0 \n", + "3 Color Christopher Nolan 813.0 164.0 \n", + "4 NaN Doug Walker NaN NaN \n", + "... ... ... ... ... \n", + "5038 Color Scott Smith 1.0 87.0 \n", + "5039 Color NaN 43.0 43.0 \n", + "5040 Color Benjamin Roberds 13.0 76.0 \n", + "5041 Color Daniel Hsia 14.0 100.0 \n", + "5042 Color Jon Gunn 43.0 90.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "2 0.0 161.0 Rory Kinnear \n", + "3 22000.0 23000.0 Christian Bale \n", + "4 131.0 NaN Rob Walker \n", + "... ... ... ... \n", + "5038 2.0 318.0 Daphne Zuniga \n", + "5039 NaN 319.0 Valorie Curry \n", + "5040 0.0 0.0 Maxwell Moody \n", + "5041 0.0 489.0 Daniel Henney \n", + "5042 16.0 16.0 Brian Herzlinger \n", + "\n", + " actor_1_facebook_likes gross genres \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy \n", + "2 11000.0 200074175.0 Action|Adventure|Thriller \n", + "3 27000.0 448130642.0 Action|Thriller \n", + "4 131.0 NaN Documentary \n", + "... ... ... ... \n", + "5038 637.0 NaN Comedy|Drama \n", + "5039 841.0 NaN Crime|Drama|Mystery|Thriller \n", + "5040 0.0 NaN Drama|Horror|Thriller \n", + "5041 946.0 10443.0 Comedy|Drama|Romance \n", + "5042 86.0 85222.0 Documentary \n", + "\n", + " ... num_user_for_reviews language country content_rating budget \\\n", + "0 ... 3054.0 English USA PG-13 237000000.0 \n", + "1 ... 1238.0 English USA PG-13 300000000.0 \n", + "2 ... 994.0 English UK PG-13 245000000.0 \n", + "3 ... 2701.0 English USA PG-13 250000000.0 \n", + "4 ... NaN NaN NaN NaN NaN \n", + "... ... ... ... ... ... ... \n", + "5038 ... 6.0 English Canada NaN NaN \n", + "5039 ... 359.0 English USA TV-14 NaN \n", + "5040 ... 
3.0 English USA NaN 1400.0 \n", + "5041 ... 9.0 English USA PG-13 NaN \n", + "5042 ... 84.0 English USA PG 1100.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "2 2015.0 393.0 6.8 2.35 \n", + "3 2012.0 23000.0 8.5 2.35 \n", + "4 NaN 12.0 7.1 NaN \n", + "... ... ... ... ... \n", + "5038 2013.0 470.0 7.7 NaN \n", + "5039 NaN 593.0 7.5 16.00 \n", + "5040 2013.0 0.0 6.3 NaN \n", + "5041 2012.0 719.0 6.3 2.35 \n", + "5042 2004.0 23.0 6.6 1.85 \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "2 85000 \n", + "3 164000 \n", + "4 0 \n", + "... ... \n", + "5038 84 \n", + "5039 32000 \n", + "5040 16 \n", + "5041 660 \n", + "5042 456 \n", + "\n", + "[5043 rows x 28 columns]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
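<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>actor_1_facebook_likes</th>\n",
+ "      <th>gross</th>\n",
+ "      <th>genres</th>\n",
+ "      <th>actor_1_name</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>5</th>\n",
+ "      <td>640.0</td>\n",
+ "      <td>73058679.0</td>\n",
+ "      <td>Action|Adventure|Sci-Fi</td>\n",
+ "      <td>Daryl Sabara</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>6</th>\n",
+ "      <td>24000.0</td>\n",
+ "      <td>336530303.0</td>\n",
+ "      <td>Action|Adventure|Romance</td>\n",
+ "      <td>J.K. Simmons</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>7</th>\n",
+ "      <td>799.0</td>\n",
+ "      <td>200807262.0</td>\n",
+ "      <td>Adventure|Animation|Comedy|Family|Fantasy|Musi...</td>\n",
+ "      <td>Brad Garrett</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>8</th>\n",
+ "      <td>26000.0</td>\n",
+ "      <td>458991599.0</td>\n",
+ "      <td>Action|Adventure|Sci-Fi</td>\n",
+ "      <td>Chris Hemsworth</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>9</th>\n",
+ "      <td>25000.0</td>\n",
+ "      <td>301956980.0</td>\n",
+ "      <td>Adventure|Family|Fantasy|Mystery</td>\n",
+ "      <td>Alan Rickman</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>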
" + ], + "text/plain": [ + " actor_1_facebook_likes gross \\\n", + "5 640.0 73058679.0 \n", + "6 24000.0 336530303.0 \n", + "7 799.0 200807262.0 \n", + "8 26000.0 458991599.0 \n", + "9 25000.0 301956980.0 \n", + "\n", + " genres actor_1_name \n", + "5 Action|Adventure|Sci-Fi Daryl Sabara \n", + "6 Action|Adventure|Romance J.K. Simmons \n", + "7 Adventure|Animation|Comedy|Family|Fantasy|Musi... Brad Garrett \n", + "8 Action|Adventure|Sci-Fi Chris Hemsworth \n", + "9 Adventure|Family|Fantasy|Mystery Alan Rickman " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[5:10,7:11]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #1: Display all columns and rows with Pandas options" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_rows', None)\n", + "pd.set_option('display.max_columns', None)\n", + "pd.set_option('display.width', None)\n", + "pd.set_option('display.max_colwidth', None)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. 
Instead, use None to not limit the column width.\n", + " \"\"\"Entry point for launching an IPython kernel.\n" + ] + } + ], + "source": [ + "pd.set_option('display.max_colwidth', -1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #2: Display more or all rows " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "pd.reset_option('display.max_rows')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
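<table border=\"1\" class=\"dataframe\">\n", +
"  <thead>\n", +
"    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>genres</th>\n", + "    </tr>\n", +
"  </thead>\n", +
"  <tbody>\n", +
"    <tr>\n", + "      <th>Drama</th>\n", + "      <td>236</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>Comedy</th>\n", + "      <td>209</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>Comedy|Drama</th>\n", + "      <td>191</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>Comedy|Drama|Romance</th>\n", + "      <td>187</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>Comedy|Romance</th>\n", + "      <td>158</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>...</th>\n", + "      <td>...</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>Adventure|Animation|Comedy|Fantasy|Music|Romance</th>\n", + "      <td>1</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>Family|Fantasy|Music</th>\n", + "      <td>1</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>Action|Adventure|Drama|History|Romance|War</th>\n", + "      <td>1</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>Biography|Comedy|Crime|Drama|Romance</th>\n", + "      <td>1</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>Adventure|Comedy|Musical|Romance</th>\n", + "      <td>1</td>\n", + "    </tr>\n", +
"  </tbody>\n", +
"</table>\n", +
"<p>914 rows × 1 columns</p>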
\n", + "
" + ], + "text/plain": [ + " genres\n", + "Drama 236 \n", + "Comedy 209 \n", + "Comedy|Drama 191 \n", + "Comedy|Drama|Romance 187 \n", + "Comedy|Romance 158 \n", + "... ... \n", + "Adventure|Animation|Comedy|Fantasy|Music|Romance 1 \n", + "Family|Fantasy|Music 1 \n", + "Action|Adventure|Drama|History|Romance|War 1 \n", + "Biography|Comedy|Crime|Drama|Romance 1 \n", + "Adventure|Comedy|Musical|Romance 1 \n", + "\n", + "[914 rows x 1 columns]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.genres.value_counts(dropna=False).to_frame()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_rows', 100)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_rows', None)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Drama 236\n", + "Comedy 209\n", + "Comedy|Drama 191\n", + "Comedy|Drama|Romance 187\n", + "Comedy|Romance 158\n", + "Drama|Romance 152\n", + "Crime|Drama|Thriller 101\n", + "Horror 71 \n", + "Action|Crime|Drama|Thriller 68 \n", + "Action|Crime|Thriller 65 \n", + "Drama|Thriller 64 \n", + "Crime|Drama 63 \n", + "Horror|Thriller 56 \n", + "Crime|Drama|Mystery|Thriller 55 \n", + "Action|Adventure|Sci-Fi 51 \n", + "Comedy|Crime 51 \n", + "Documentary 51 \n", + "Action|Adventure|Thriller 46 \n", + "Drama|Mystery|Thriller 37 \n", + "Biography|Drama 35 \n", + "Action|Adventure|Sci-Fi|Thriller 35 \n", + "Horror|Mystery|Thriller 35 \n", + "Action|Comedy|Crime 30 \n", + "Action|Thriller 30 \n", + "Action|Adventure|Fantasy 30 \n", + "Horror|Mystery 29 \n", + "Adventure|Animation|Comedy|Family|Fantasy 29 \n", + "Drama|Music 27 \n", + "Drama|Sport 26 \n", + "Comedy|Family 26 \n", + "Biography|Drama|History 26 \n", + "Biography|Drama|Sport 26 \n", + 
"Adventure|Animation|Comedy|Family 24 \n", + "Comedy|Crime|Drama 24 \n", + "Drama|War 23 \n", + "Action|Comedy|Crime|Thriller 23 \n", + "Action|Sci-Fi 23 \n", + "Action|Drama|Thriller 22 \n", + "Mystery|Thriller 22 \n", + "Action|Crime|Drama|Mystery|Thriller 22 \n", + "Drama|History|War 21 \n", + "Drama|Horror|Mystery|Thriller 21 \n", + "Comedy|Family|Fantasy 20 \n", + "Adventure|Family|Fantasy 20 \n", + "Thriller 20 \n", + "Drama|Music|Romance 19 \n", + "Crime|Thriller 18 \n", + "Horror|Sci-Fi|Thriller 18 \n", + "Comedy|Drama|Music 18 \n", + "Comedy|Horror 18 \n", + "Fantasy|Horror 18 \n", + "Drama|Family 18 \n", + "Biography|Drama|Romance 17 \n", + "Comedy|Music 17 \n", + "Action|Sci-Fi|Thriller 16 \n", + "Crime|Drama|Romance|Thriller 16 \n", + "Comedy|Sport 16 \n", + "Biography|Crime|Drama 16 \n", + "Comedy|Fantasy 16 \n", + "Crime|Mystery|Thriller 15 \n", + "Comedy|Drama|Family 15 \n", + "Action|Comedy 15 \n", + "Comedy|Crime|Thriller 15 \n", + "Action|Horror|Sci-Fi|Thriller 15 \n", + "Drama|Sci-Fi|Thriller 15 \n", + "Drama|Fantasy|Romance 14 \n", + "Comedy|Drama|Romance|Sport 14 \n", + "Drama|Romance|War 14 \n", + "Adventure|Comedy 14 \n", + "Action|Adventure|Fantasy|Sci-Fi 13 \n", + "Crime|Drama|Romance 13 \n", + "Comedy|Fantasy|Romance 13 \n", + "Comedy|Family|Romance 13 \n", + "Drama|Horror|Thriller 13 \n", + "Adventure|Drama 13 \n", + "Action|Adventure|Drama|Thriller 13 \n", + "Drama|Mystery|Sci-Fi|Thriller 13 \n", + "Biography|Drama|Music 13 \n", + "Drama|Mystery|Romance|Thriller 12 \n", + "Action|Adventure|Comedy 12 \n", + "Action|Adventure|Fantasy|Sci-Fi|Thriller 12 \n", + "Adventure|Comedy|Family|Fantasy 12 \n", + "Adventure|Animation|Family|Fantasy 12 \n", + "Western 12 \n", + "Action|Crime|Sci-Fi|Thriller 11 \n", + "Adventure|Comedy|Sci-Fi 11 \n", + "Biography|Drama|History|Romance 11 \n", + "Drama|Horror|Sci-Fi|Thriller 11 \n", + "Comedy|Fantasy|Horror 11 \n", + "Action|Adventure 11 \n", + "Action 11 \n", + "Comedy|Crime|Romance 11 \n", + 
"Comedy|Drama|Fantasy|Romance 11 \n", + "Documentary|Music 10 \n", + "Action|Drama 10 \n", + "Drama|Mystery 10 \n", + "Action|Mystery|Thriller 10 \n", + "Drama|History 10 \n", + "Action|Horror|Thriller 10 \n", + "Drama|Sci-Fi 10 \n", + "Action|Drama|War 10 \n", + "Crime|Drama|Mystery 10 \n", + "Drama|Romance|Sci-Fi 10 \n", + "Drama|Fantasy 10 \n", + "Fantasy|Horror|Thriller 10 \n", + "Sci-Fi|Thriller 10 \n", + "Action|Horror|Sci-Fi 9 \n", + "Animation|Comedy|Family 9 \n", + "Comedy|Music|Romance 9 \n", + "Horror|Sci-Fi 9 \n", + "Drama|Musical|Romance 9 \n", + "Action|Adventure|Crime|Thriller 9 \n", + "Action|Drama|Sci-Fi|Thriller 9 \n", + "Fantasy|Horror|Mystery|Thriller 9 \n", + "Comedy|Sci-Fi 9 \n", + "Adventure|Fantasy 9 \n", + "Comedy|Drama|Music|Romance 9 \n", + "Drama|History|Thriller 9 \n", + "Comedy|Horror|Sci-Fi 8 \n", + "Action|Comedy|Sci-Fi 8 \n", + "Drama|Western 8 \n", + "Adventure|Drama|Romance 8 \n", + "Comedy|Crime|Drama|Thriller 8 \n", + "Action|Drama|History|War 8 \n", + "Adventure|Comedy|Drama 8 \n", + "Action|Adventure|Comedy|Family|Sci-Fi 8 \n", + "Adventure|Comedy|Family 8 \n", + "Drama|Romance|Western 8 \n", + "Comedy|Fantasy|Horror|Thriller 8 \n", + "Action|Adventure|Drama 7 \n", + "Biography|Drama|Thriller 7 \n", + "Action|Adventure|Drama|Romance 7 \n", + "Adventure|Drama|Thriller 7 \n", + "Adventure|Sci-Fi|Thriller 7 \n", + "Mystery|Sci-Fi|Thriller 7 \n", + "Action|Adventure|Drama|History|War 7 \n", + "Comedy|Documentary 7 \n", + "Comedy|Musical|Romance 7 \n", + "Adventure|Animation|Comedy|Family|Sci-Fi 7 \n", + "Action|Drama|Thriller|War 7 \n", + "Action|Crime|Mystery|Thriller 7 \n", + "Comedy|Romance|Sport 7 \n", + "Adventure|Drama|History 7 \n", + "Action|Comedy|Crime|Drama|Thriller 7 \n", + "Action|Adventure|Animation|Comedy|Family 6 \n", + "Action|Adventure|Drama|Fantasy 6 \n", + "Comedy|Drama|Sport 6 \n", + "Biography|Drama|History|War 6 \n", + "Action|Comedy|Romance 6 \n", + "Crime|Horror|Thriller 6 \n", + 
"Action|Crime|Drama|Romance|Thriller 6 \n", + "Animation|Comedy|Family|Fantasy 6 \n", + "Comedy|Horror|Thriller 6 \n", + "Action|Adventure|Fantasy|Thriller 6 \n", + "Drama|Romance|Thriller 6 \n", + "Comedy|Drama|Family|Romance 6 \n", + "Action|Adventure|Western 6 \n", + "Action|Mystery|Sci-Fi|Thriller 6 \n", + "Biography|Crime|Drama|Thriller 6 \n", + "Action|Adventure|Comedy|Crime 6 \n", + "Comedy|Crime|Mystery 6 \n", + "Drama|History|Romance|War 6 \n", + "Drama|Fantasy|Horror|Thriller 6 \n", + "Horror|Mystery|Sci-Fi|Thriller 6 \n", + "Drama|Romance|Sport 6 \n", + "Crime|Drama|Horror|Thriller 5 \n", + "Action|Horror 5 \n", + "Comedy|Drama|Musical|Romance 5 \n", + "Adventure|Drama|Family|Fantasy 5 \n", + "Action|Adventure|Comedy|Thriller 5 \n", + "Comedy|Family|Sci-Fi 5 \n", + "Action|Adventure|Comedy|Sci-Fi 5 \n", + "Documentary|Sport 5 \n", + "Comedy|Romance|Sci-Fi 5 \n", + "Action|Adventure|Drama|Sci-Fi 5 \n", + "Crime|Horror|Mystery|Thriller 5 \n", + "Biography|Comedy|Drama 5 \n", + "Drama|Horror 5 \n", + "Action|Adventure|Family|Fantasy 5 \n", + "Biography|Drama|Music|Musical 5 \n", + "Action|Adventure|Romance 5 \n", + "Adventure|Drama|Romance|War 5 \n", + "Family 5 \n", + "Adventure|Animation|Family 5 \n", + "Comedy|Musical 5 \n", + "Drama|History|Thriller|War 5 \n", + "Action|Fantasy|Horror|Thriller 5 \n", + "Comedy|Drama|Family|Sport 5 \n", + "Comedy|Family|Fantasy|Romance 5 \n", + "Adventure|Family 5 \n", + "Adventure|Horror|Thriller 5 \n", + "Action|Crime|Drama 5 \n", + "Drama|Mystery|Sci-Fi 5 \n", + "Action|Drama|Sport 5 \n", + "Adventure|Family|Fantasy|Mystery 5 \n", + "Comedy|Drama|Fantasy 4 \n", + "Action|Adventure|Romance|Sci-Fi 4 \n", + "Drama|Family|Fantasy|Romance 4 \n", + "Comedy|Western 4 \n", + "Comedy|Drama|War 4 \n", + "Action|Comedy|Horror 4 \n", + "Adventure|Biography|Drama|History|War 4 \n", + "Documentary|War 4 \n", + "Adventure|Mystery|Sci-Fi 4 \n", + "Drama|Mystery|Romance 4 \n", + "Action|Adventure|Animation|Comedy|Family|Fantasy 4 \n", 
+ "Romance 4 \n", + "Drama|Musical 4 \n", + "Comedy|Drama|Family|Music|Musical|Romance 4 \n", + "Drama|Family|Romance 4 \n", + "Action|Adventure|Romance|Sci-Fi|Thriller 4 \n", + "Action|Comedy|Fantasy|Sci-Fi 4 \n", + "Biography|Drama|War 4 \n", + "Adventure|Drama|Fantasy|Romance 4 \n", + "Drama|Fantasy|Horror 4 \n", + "Drama|Family|Sport 4 \n", + "Action|Crime 4 \n", + "Adventure|Drama|Family 4 \n", + "Adventure|Drama|Western 4 \n", + "Action|Fantasy|Thriller 4 \n", + "Action|Fantasy|Horror 4 \n", + "Adventure|Drama|Sci-Fi|Thriller 4 \n", + "Drama|History|Sport 4 \n", + "Action|Comedy|Thriller 4 \n", + "Action|Adventure|History 4 \n", + "Comedy|Drama|Sci-Fi 4 \n", + "Action|Adventure|Comedy|Family|Fantasy|Sci-Fi 4 \n", + "Biography|Drama|Music|Romance 4 \n", + "Adventure|Drama|Sci-Fi 4 \n", + "Comedy|Crime|Drama|Romance 4 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi 4 \n", + "Biography|Documentary|Music 4 \n", + "Adventure|Comedy|Drama|Romance 4 \n", + "Drama|Horror|Sci-Fi 4 \n", + "Action|Adventure|Horror|Sci-Fi 4 \n", + "Adventure|Biography|Drama 3 \n", + "Adventure|Animation|Comedy|Family|Sport 3 \n", + "Crime|Drama|Horror|Mystery|Thriller 3 \n", + "Action|Adventure|Mystery|Sci-Fi|Thriller 3 \n", + "Action|Comedy|Fantasy 3 \n", + "Adventure 3 \n", + "Adventure|Comedy|Family|Romance 3 \n", + "Action|Crime|Fantasy|Thriller 3 \n", + "Drama|Family|Music 3 \n", + "Action|Crime|Romance|Thriller 3 \n", + "Musical|Romance 3 \n", + "Drama|Fantasy|Horror|Mystery|Thriller 3 \n", + "Drama|Music|Musical|Romance 3 \n", + "Action|Biography|Crime|Drama 3 \n", + "Biography|Comedy|Crime|Drama 3 \n", + "Drama|Fantasy|Thriller 3 \n", + "Comedy|Crime|Romance|Thriller 3 \n", + "Action|Animation|Comedy|Family|Sci-Fi 3 \n", + "Adventure|Animation|Comedy|Family|Romance 3 \n", + "Adventure|Family|Fantasy|Musical 3 \n", + "Adventure|Comedy|Fantasy 3 \n", + "Adventure|Animation|Comedy|Family|Musical 3 \n", + "Action|Adventure|Fantasy|Romance 3 \n", + 
"Action|Crime|Drama|Thriller|Western 3 \n", + "Fantasy|Horror|Mystery 3 \n", + "Action|Comedy|Sport 3 \n", + "Adventure|Comedy|Drama|Family|Fantasy 3 \n", + "Action|Adventure|Comedy|Fantasy 3 \n", + "Drama|Family|Musical|Romance 3 \n", + "Action|Biography|Drama|Sport 3 \n", + "Biography|Comedy|Drama|History 3 \n", + "Adventure|Drama|Fantasy 3 \n", + "Sci-Fi 3 \n", + "Horror|Mystery|Sci-Fi 3 \n", + "Comedy|Fantasy|Sci-Fi 3 \n", + "Drama|Horror|Mystery 3 \n", + "Drama|Fantasy|Mystery|Thriller 3 \n", + "Drama|Family|Fantasy 3 \n", + "Drama|Fantasy|Romance|Sci-Fi 3 \n", + "Comedy|Crime|Music 3 \n", + "Action|Adventure|Drama|History|Romance 3 \n", + "Comedy|Drama|Family|Fantasy 3 \n", + "Biography|Drama|Romance|Sport 3 \n", + "Documentary|Drama 3 \n", + "Adventure|Biography|Drama|Thriller 3 \n", + "Fantasy|Romance 3 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Musical 3 \n", + "Action|Romance|Thriller 3 \n", + "Comedy|War 3 \n", + "Action|Drama|Romance 3 \n", + "Comedy|Crime|Drama|Romance|Thriller 3 \n", + "Action|Adventure|Comedy|Romance|Sci-Fi 3 \n", + "Action|Adventure|Animation|Comedy|Family|Sci-Fi 3 \n", + "Action|Adventure|Horror|Sci-Fi|Thriller 3 \n", + "Comedy|Crime|Drama|Mystery|Romance 3 \n", + "Drama|Thriller|War 3 \n", + "Action|Comedy|Crime|Romance|Thriller 3 \n", + "Comedy|Mystery|Romance 2 \n", + "Animation|Comedy 2 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Romance 2 \n", + "Drama|Romance|Sci-Fi|Thriller 2 \n", + "Adventure|Comedy|Family|Fantasy|Sci-Fi 2 \n", + "Adventure|Animation|Comedy|Drama|Family|Musical 2 \n", + "Comedy|Horror|Musical 2 \n", + "Action|Adventure|Comedy|Western 2 \n", + "Comedy|Mystery 2 \n", + "Action|Comedy|Family 2 \n", + "Action|Drama|Western 2 \n", + "Action|Comedy|War 2 \n", + "Biography|Drama|History|Sport 2 \n", + "Action|Adventure|Crime|Mystery|Thriller 2 \n", + "Biography|Comedy|Drama|Sport 2 \n", + "Action|Drama|History|Romance|War 2 \n", + "Crime|Drama|Western 2 \n", + "Adventure|Comedy|Family|Sport 2 \n", + 
"Adventure|Mystery|Thriller 2 \n", + "Adventure|Comedy|Drama|Fantasy 2 \n", + "Comedy|Drama|Family|Music|Romance 2 \n", + "Adventure|Comedy|Mystery 2 \n", + "Animation 2 \n", + "Comedy|Horror|Mystery 2 \n", + "Biography|Drama|Romance|War 2 \n", + "Action|Comedy|Horror|Sci-Fi 2 \n", + "Adventure|Drama|Family|Fantasy|Sci-Fi 2 \n", + "Animation|Comedy|Family|Sci-Fi 2 \n", + "Comedy|Drama|Music|Musical 2 \n", + "Crime|Drama|Sport 2 \n", + "Adventure|Comedy|Romance 2 \n", + "Comedy|Drama|Romance|Sci-Fi 2 \n", + "Adventure|Fantasy|Mystery|Thriller 2 \n", + "Adventure|Family|Fantasy|Romance 2 \n", + "Animation|Comedy|Family|Fantasy|Music 2 \n", + "Biography|Drama|Sport|War 2 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Musical|Romance 2 \n", + "Action|Drama|Fantasy|War 2 \n", + "Adventure|Animation|Family|Sci-Fi 2 \n", + "Action|Adventure|Thriller|War 2 \n", + "Action|Comedy|Crime|Drama 2 \n", + "Comedy|Horror|Romance 2 \n", + "Drama|Horror|Romance|Thriller 2 \n", + "Animation|Family|Fantasy|Music 2 \n", + "Biography|Comedy|Drama|Romance 2 \n", + "Documentary|History|Music 2 \n", + "Drama|Horror|Mystery|Sci-Fi|Thriller 2 \n", + "Action|Adventure|Drama|Romance|War 2 \n", + "Comedy|Drama|Horror|Romance 2 \n", + "Biography|Comedy|Romance 2 \n", + "Action|Biography|Drama|History 2 \n", + "Adventure|Drama|Mystery 2 \n", + "Action|Adventure|Crime|Drama|Thriller 2 \n", + "Action|Adventure|Drama|Sci-Fi|Thriller 2 \n", + "Biography|Drama|Thriller|War 2 \n", + "Comedy|Family|Sport 2 \n", + "Fantasy 2 \n", + "Action|Drama|Fantasy|Romance 2 \n", + "Action|Adventure|Animation|Family|Fantasy|Sci-Fi 2 \n", + "Adventure|Comedy|Fantasy|Sci-Fi 2 \n", + "Action|Adventure|Animation|Family|Sci-Fi 2 \n", + "Animation|Comedy|Family|Mystery|Sci-Fi 2 \n", + "Action|Comedy|Crime|Romance 2 \n", + "Adventure|Animation|Fantasy 2 \n", + "Comedy|Drama|Thriller 2 \n", + "Action|Drama|Sci-Fi 2 \n", + "Action|Fantasy|Horror|Sci-Fi|Thriller 2 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Music 2 
\n", + "Fantasy|Horror|Sci-Fi 2 \n", + "Action|Comedy|Crime|Fantasy 2 \n", + "Animation|Family 2 \n", + "Action|Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi 2 \n", + "Biography|Drama|History|Thriller 2 \n", + "Action|Drama|Fantasy|Mystery|Thriller 2 \n", + "Comedy|Drama|Family|Fantasy|Romance 2 \n", + "Animation|Comedy|Drama 2 \n", + "Action|Comedy|Documentary 2 \n", + "Action|Adventure|Drama|History 2 \n", + "Crime|Drama|Music 2 \n", + "Adventure|Drama|War 2 \n", + "Action|Comedy|Romance|Thriller 2 \n", + "Comedy|Fantasy|Horror|Romance 2 \n", + "Biography|Crime|Drama|Romance 2 \n", + "Crime|Romance|Thriller 2 \n", + "Adventure|Animation|Comedy 2 \n", + "Action|Adventure|Animation|Comedy|Drama|Family|Sci-Fi 2 \n", + "Action|Fantasy 2 \n", + "Comedy|Romance|Thriller 2 \n", + "Crime|Documentary 2 \n", + "Action|Adventure|Family|Sci-Fi 2 \n", + "Adventure|Horror 2 \n", + "Comedy|Drama|Musical 2 \n", + "Action|Adventure|Comedy|Romance 2 \n", + "Action|Comedy|Family|Fantasy 2 \n", + "Action|Drama|Mystery|Sci-Fi 2 \n", + "Drama|Fantasy|Musical|Romance 2 \n", + "Comedy|Drama|Horror|Sci-Fi 2 \n", + "Action|Adventure|Mystery|Sci-Fi 2 \n", + "Action|Crime|Mystery|Romance|Thriller 2 \n", + "Comedy|Crime|Family 2 \n", + "Mystery|Romance|Thriller 2 \n", + "Drama|Fantasy|Horror|Mystery 2 \n", + "Action|Drama|Fantasy 2 \n", + "Crime|Documentary|War 2 \n", + "Action|Crime|Drama|Sci-Fi|Thriller 2 \n", + "Comedy|Drama|Romance|Thriller 2 \n", + "Documentary|History 2 \n", + "Animation|Family|Fantasy|Musical 2 \n", + "Action|Drama|Family|Sport 2 \n", + "Adventure|Comedy|Family|Musical 2 \n", + "Action|Drama|Horror|Thriller 2 \n", + "Biography|Crime|Drama|History 2 \n", + "Action|Adventure|Horror|Thriller 2 \n", + "Family|Sci-Fi 2 \n", + "Animation|Comedy|Family|Fantasy|Musical 2 \n", + "Action|Sci-Fi|Sport 2 \n", + "Action|Adventure|Drama|Horror|Sci-Fi 2 \n", + "Action|Adventure|Animation|Family|Fantasy 2 \n", + "Adventure|Animation|Comedy|Drama|Family 2 \n", + 
"Biography|Documentary 2 \n", + "Action|Comedy|Sci-Fi|Thriller 2 \n", + "Action|Crime|Sport|Thriller 2 \n", + "Action|Comedy|Drama|Thriller 2 \n", + "Drama|Mystery|Romance|War 2 \n", + "Drama|History|War|Western 2 \n", + "Drama|Romance|War|Western 2 \n", + "Adventure|Comedy|Family|Fantasy|Horror 2 \n", + "Adventure|Animation|Comedy|Drama|Family|Fantasy|Musical 1 \n", + "History 1 \n", + "Adventure|Animation|Comedy|Fantasy|Romance 1 \n", + "Animation|Comedy|Family|Musical 1 \n", + "Game-Show|Reality-TV|Romance 1 \n", + "Adventure|Comedy|Drama|Fantasy|Romance 1 \n", + "Adventure|Fantasy|Horror|Mystery|Thriller 1 \n", + "Comedy|Romance|Sci-Fi|Thriller 1 \n", + "Comedy|Horror|Mystery|Thriller 1 \n", + "Adventure|Comedy|History|Romance 1 \n", + "Biography|Comedy|Drama|Music 1 \n", + "Comedy|Drama|Music|Musical|Romance 1 \n", + "Action|Adventure|Animation|Comedy|Fantasy|Sci-Fi 1 \n", + "Adventure|Biography|Drama|Romance 1 \n", + "Adventure|Animation|Drama|Family|Musical 1 \n", + "Drama|Fantasy|Horror|Romance 1 \n", + "Biography|Crime|Drama|Western 1 \n", + "Adventure|Family|Fantasy|Horror|Mystery 1 \n", + "Comedy|Mystery|Sci-Fi|Thriller 1 \n", + "Adventure|Animation|Fantasy|Horror|Sci-Fi 1 \n", + "Comedy|Crime|Drama|Horror|Mystery|Thriller 1 \n", + "Action|Drama|Fantasy|Sci-Fi 1 \n", + "Action|Biography|Drama|History|War 1 \n", + "Comedy|Drama|Mystery|Romance|Thriller 1 \n", + "Drama|Mystery|Romance|Thriller|War 1 \n", + "Adventure|Animation|Family|Musical 1 \n", + "Action|Crime|Drama|Western 1 \n", + "Adventure|Drama|Thriller|Western 1 \n", + "Action|Animation|Comedy|Sci-Fi 1 \n", + "Adventure|Drama|Family|Romance|Western 1 \n", + "Romance|Short 1 \n", + "Adventure|Animation|Comedy|Crime|Family 1 \n", + "Adventure|Fantasy|Mystery 1 \n", + "Drama|Family|Music|Musical 1 \n", + "Romance|Sci-Fi|Thriller 1 \n", + "Drama|Music|Mystery|Romance 1 \n", + "Adventure|Drama|History|War 1 \n", + "Comedy|Fantasy|Thriller 1 \n", + "Adventure|Comedy|Family|Fantasy|Horror|Mystery 1 \n", 
+ "Action|Drama|History|Thriller 1 \n", + "Animation|Comedy|Family|Horror|Sci-Fi 1 \n", + "Biography|Crime|Documentary|History 1 \n", + "Adventure|Animation|Drama|Family|History|Musical|Romance 1 \n", + "Thriller|Western 1 \n", + "Comedy|Drama|Family|Musical 1 \n", + "Comedy|Crime|Drama|Thriller|War 1 \n", + "Animation|Comedy|Family|Romance 1 \n", + "Comedy|Family|Fantasy|Musical 1 \n", + "Comedy|Documentary|Drama 1 \n", + "Adventure|Comedy|Crime|Family|Mystery 1 \n", + "Action|Drama|History|Thriller|War 1 \n", + "Comedy|Crime|Musical 1 \n", + "Animation|Drama|Family|Fantasy 1 \n", + "Action|Adventure|Drama|Thriller|Western 1 \n", + "Crime|Drama|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Drama|Romance|Thriller 1 \n", + "Action|Comedy|Drama|Family|Thriller 1 \n", + "Action|Adventure|Drama|Romance|Western 1 \n", + "Comedy|Drama|Romance|War 1 \n", + "Biography|Crime|Drama|Romance|Thriller 1 \n", + "Adventure|Comedy|Crime|Drama|Family 1 \n", + "Comedy|Crime|Family|Sci-Fi 1 \n", + "Drama|Mystery|War 1 \n", + "Action|Adventure|Biography|Drama|History 1 \n", + "Action|Adventure|Family|Thriller 1 \n", + "Drama|Music|Musical 1 \n", + "Comedy|Crime|Musical|Romance 1 \n", + "Crime|Drama|Fantasy|Romance 1 \n", + "Action|Adventure|Crime|Fantasy|Mystery|Thriller 1 \n", + "Drama|Fantasy|War 1 \n", + "Action|Animation|Fantasy|Horror|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Fantasy|War 1 \n", + "Comedy|Drama|History|Romance 1 \n", + "Action|Adventure|Romance|War 1 \n", + "Fantasy|Mystery|Romance|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Comedy|Family|Romance|Sci-Fi 1 \n", + "Comedy|History 1 \n", + "Adventure|Comedy|Family|Romance|Sci-Fi 1 \n", + "Adventure|Animation|Family|Fantasy|Musical 1 \n", + "Comedy|Crime|Sport 1 \n", + "Thriller|War 1 \n", + "Drama|Music|Romance|War 1 \n", + "Biography|Crime|Drama|History|Romance 1 \n", + "Comedy|Mystery|Thriller 1 \n", + "Biography|Crime|Drama|Music 1 \n", + "Action|Crime|Drama|Sport 1 \n", + "Drama|Fantasy|Romance|Thriller 1 
\n", + "Drama|Film-Noir|Mystery|Thriller 1 \n", + "Action|Comedy|Drama 1 \n", + "Drama|War|Western 1 \n", + "Film-Noir|Mystery|Romance|Thriller 1 \n", + "Action|Horror|Mystery|Sci-Fi|Thriller 1 \n", + "Adventure|Crime|Drama|Romance 1 \n", + "Biography|Comedy|Drama|Music|Romance 1 \n", + "Drama|Music|Mystery|Romance|Sci-Fi 1 \n", + "Biography|Documentary|Sport 1 \n", + "Adventure|Animation|Comedy|Family|Western 1 \n", + "Action|Comedy|Crime|Music 1 \n", + "Action|Adventure|Comedy|Crime|Family|Romance|Thriller 1 \n", + "Action|Comedy|Drama|Music 1 \n", + "Animation|Biography|Documentary|Drama|History|War 1 \n", + "Fantasy|Horror|Romance|Thriller 1 \n", + "Action|Drama|Romance|Sci-Fi|Thriller 1 \n", + "Action|Comedy|Crime|Fantasy|Horror|Mystery|Sci-Fi|Thriller 1 \n", + "Comedy|Crime|Family|Mystery|Romance|Thriller 1 \n", + "Adventure|Comedy|Romance|Sci-Fi 1 \n", + "Comedy|Drama|Horror 1 \n", + "Crime|Drama|History|Romance 1 \n", + "Action|Crime|Drama|Thriller|War 1 \n", + "Action|Crime|Drama|History|Western 1 \n", + "Adventure|Biography|Drama|History|Sport|Thriller 1 \n", + "Comedy|Drama|Fantasy|Horror 1 \n", + "Adventure|Animation|Family|Sport 1 \n", + "Action|Adventure|Drama|Mystery 1 \n", + "Animation|Comedy|Fantasy 1 \n", + "Crime|Film-Noir|Thriller 1 \n", + "Documentary|Drama|War 1 \n", + "Adventure|Crime|Drama|Mystery|Western 1 \n", + "Animation|Comedy|Fantasy|Musical 1 \n", + "Action|Adventure|Comedy|Family|Fantasy|Mystery|Sci-Fi 1 \n", + "Biography|Comedy|Drama|History|Music|Musical 1 \n", + "Action|Adventure|Drama|Thriller|War 1 \n", + "Adventure|Comedy|Sport 1 \n", + "Biography|Drama|History|Music 1 \n", + "Comedy|Family|Music|Musical 1 \n", + "Animation|Comedy|Family|Music|Western 1 \n", + "Drama|Fantasy|Sci-Fi 1 \n", + "Action|Biography|Drama|History|Romance|Western 1 \n", + "Biography|Crime|Drama|History|Thriller 1 \n", + "Action|Adventure|Comedy|Music|Thriller 1 \n", + "Biography|Drama|Fantasy|History 1 \n", + "Animation|Family|Fantasy 1 \n", + 
"Drama|Fantasy|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Comedy|Family|Romance 1 \n", + "Action|Drama|Fantasy|Horror|War 1 \n", + "Comedy|Drama|Romance|Western 1 \n", + "Animation|Drama|Family|Fantasy|Musical|Romance 1 \n", + "Action|Fantasy|Romance|Sci-Fi 1 \n", + "Adventure|Drama|History|Romance 1 \n", + "Action|Biography|Drama 1 \n", + "Action|Adventure|Comedy|Drama|Thriller 1 \n", + "Comedy|Short 1 \n", + "Action|Adventure|Comedy|Crime|Mystery|Thriller 1 \n", + "Adventure|Comedy|Drama|Romance|Sci-Fi 1 \n", + "Adventure|Comedy|Family|Mystery|Sci-Fi 1 \n", + "Action|Adventure|Comedy|Sci-Fi|Thriller 1 \n", + "Action|Drama|Fantasy|Thriller|Western 1 \n", + "Biography|Comedy|Drama|Family|Sport 1 \n", + "Action|Adventure|Crime|Drama|Mystery|Thriller 1 \n", + "Action|Animation|Comedy|Family|Fantasy|Sci-Fi 1 \n", + "Action|Adventure|Comedy|Family|Mystery 1 \n", + "Adventure|Family|Romance 1 \n", + "Adventure|Comedy|Fantasy|Music|Sci-Fi 1 \n", + "Drama|Musical|Romance|Thriller 1 \n", + "Crime|Documentary|News 1 \n", + "Comedy|Drama|Reality-TV|Romance 1 \n", + "Action|Drama|Fantasy|Horror|Thriller 1 \n", + "Drama|History|Music|Romance|War 1 \n", + "Action|Crime|Horror|Sci-Fi|Thriller 1 \n", + "Comedy|Family|Musical|Romance 1 \n", + "Action|Comedy|Horror|Thriller 1 \n", + "Comedy|Family|Romance|Sci-Fi 1 \n", + "Action|Adventure|Romance|Thriller 1 \n", + "Animation|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Family|Fantasy|Musical 1 \n", + "Adventure|Crime|Drama 1 \n", + "Action|Adventure|Animation|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Comedy|Drama|Mystery|Romance|Thriller|War 1 \n", + "Drama|Horror|Romance 1 \n", + "Action|Sci-Fi|War 1 \n", + "Action|Drama|Romance|Thriller 1 \n", + "Action|Comedy|Drama|Western 1 \n", + "Crime|Horror|Music|Thriller 1 \n", + "Documentary|Drama|Sport 1 \n", + "Family|Fantasy|Musical 1 \n", + "Biography|Crime|Documentary|History|Thriller 1 \n", + "Adventure|Drama|History|Romance|War 1 \n", + "Horror|Musical 1 \n", + 
"Horror|Musical|Sci-Fi 1 \n", + "Animation|Biography|Drama|War 1 \n", + "Action|Adventure|Fantasy|Horror|Sci-Fi|Thriller 1 \n", + "Comedy|Crime|Drama|Horror|Thriller 1 \n", + "Comedy|Sci-Fi|Thriller 1 \n", + "Comedy|Drama|Music|War 1 \n", + "Crime|Drama|Horror 1 \n", + "Drama|History|Horror 1 \n", + "Crime|Drama|Mystery|Romance|Thriller 1 \n", + "Drama|Fantasy|Romance|War 1 \n", + "Adventure|Animation|Family|Thriller 1 \n", + "Adventure|Horror|Mystery 1 \n", + "Mystery|Romance|Sci-Fi|Thriller 1 \n", + "Documentary|History|Sport 1 \n", + "Crime|Documentary|Drama 1 \n", + "Comedy|Thriller 1 \n", + "Action|Fantasy|Horror|Sci-Fi 1 \n", + "Adventure|Drama|Romance|Western 1 \n", + "Action|Adventure|Fantasy|Horror|Sci-Fi 1 \n", + "Action|Animation|Comedy|Family|Fantasy 1 \n", + "Adventure|Comedy|Western 1 \n", + "Action|Thriller|Western 1 \n", + "Action|Crime|Horror|Thriller 1 \n", + "Comedy|Crime|Family|Romance 1 \n", + "Crime|Drama|Music|Romance 1 \n", + "Drama|Family|Music|Romance 1 \n", + "Adventure|Comedy|Drama|Family|Sport 1 \n", + "Adventure|Documentary 1 \n", + "Biography|Comedy|Drama|Family|Romance 1 \n", + "Action|Adventure|Animation|Comedy|Sci-Fi 1 \n", + "Horror|Romance|Sci-Fi 1 \n", + "Action|Adventure|Romance|Western 1 \n", + "Action|Adventure|Animation|Comedy|Crime|Family|Fantasy 1 \n", + "Adventure|Comedy|Family|Fantasy|Musical 1 \n", + "Comedy|Drama|Musical|Romance|War 1 \n", + "Action|Adventure|Comedy|Family 1 \n", + "Biography|Crime|Drama|History|Music 1 \n", + "Action|Adventure|Comedy|Drama|War 1 \n", + "Action|Adventure|Fantasy|Horror|Thriller 1 \n", + "Action|Adventure|Drama|Fantasy|Sci-Fi 1 \n", + "Action|Adventure|Comedy|Fantasy|Romance 1 \n", + "Adventure|Comedy|Family|Fantasy|Music|Sci-Fi 1 \n", + "Comedy|Documentary|Music 1 \n", + "Adventure|Animation|Drama|Family|Fantasy|Musical|Mystery|Romance 1 \n", + "Musical 1 \n", + "Adventure|Comedy|Family|Sci-Fi 1 \n", + "Adventure|Comedy|Music|Sci-Fi 1 \n", + "Family|Music|Romance 1 \n", + 
"Action|Crime|Fantasy|Romance|Thriller 1 \n", + "Comedy|Family|Fantasy|Sci-Fi 1 \n", + "Family|Musical 1 \n", + "Action|Comedy|Sci-Fi|Western 1 \n", + "Adventure|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Adventure|Animation|Comedy|Fantasy 1 \n", + "Comedy|Fantasy|Horror|Musical 1 \n", + "Action|Animation|Comedy|Crime|Family 1 \n", + "Comedy|Drama|Musical|Romance|Western 1 \n", + "Comedy|Crime|Mystery|Romance 1 \n", + "Action|Comedy|Crime|Family 1 \n", + "Action|Horror|Romance|Sci-Fi|Thriller 1 \n", + "Action|Western 1 \n", + "Biography|Crime|Drama|War 1 \n", + "Crime|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Adventure|Comedy|History 1 \n", + "Comedy|Family|Music 1 \n", + "Comedy|Crime|Drama|Mystery|Thriller 1 \n", + "Adventure|Crime|Thriller 1 \n", + "Crime|Horror 1 \n", + "Action|Adventure|Comedy|Fantasy|Sci-Fi 1 \n", + "Comedy|Horror|Musical|Sci-Fi 1 \n", + "Adventure|Drama|Fantasy|Thriller|Western 1 \n", + "Drama|Family|History|Musical 1 \n", + "Action|Crime|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Drama|Music|Romance 1 \n", + "Adventure|Biography 1 \n", + "Action|Adventure|Comedy|Fantasy|Thriller 1 \n", + "Adventure|Biography|Drama|Horror|Thriller 1 \n", + "Action|Adventure|Crime|Drama 1 \n", + "Action|Adventure|Crime|Drama|Romance 1 \n", + "Comedy|Crime|Horror|Thriller 1 \n", + "Action|Drama|History 1 \n", + "Action|Adventure|Comedy|Crime|Music|Mystery 1 \n", + "Fantasy|Horror|Mystery|Romance 1 \n", + "Drama|History|Romance 1 \n", + "Crime|Drama|Mystery|Thriller|Western 1 \n", + "Action|Comedy|Drama|Sci-Fi 1 \n", + "Action|Adventure|Drama|War 1 \n", + "Adventure|Comedy|Crime 1 \n", + "Crime|Drama|Film-Noir|Mystery|Thriller 1 \n", + "Action|Animation|Sci-Fi|Thriller 1 \n", + "Comedy|Crime|Drama|Sci-Fi 1 \n", + "Action|Comedy|Crime|Music|Romance|Thriller 1 \n", + "Adventure|Comedy|Drama|Family|Mystery 1 \n", + "Adventure|Animation 1 \n", + "Action|Adventure|Comedy|Romance|Thriller|Western 1 \n", + "Drama|Family|Musical 1 \n", + "Drama|Musical|Sci-Fi 1 \n", + 
"Action|Adventure|Family|Fantasy|Sci-Fi 1 \n", + "Adventure|Crime|Drama|Thriller 1 \n", + "Action|Adventure|Animation|Drama|Fantasy|Sci-Fi 1 \n", + "Comedy|Family|Fantasy|Sport 1 \n", + "Biography|Drama|History|Thriller|War 1 \n", + "Action|Fantasy|Sci-Fi|Thriller 1 \n", + "Adventure|Animation|Drama|Family|Fantasy 1 \n", + "Adventure|Animation|Comedy|Drama|Family|Fantasy|Sci-Fi 1 \n", + "Drama|Film-Noir 1 \n", + "Drama|History|Romance|Western 1 \n", + "Comedy|Crime|Sci-Fi|Thriller 1 \n", + "Comedy|Family|Musical|Romance|Short 1 \n", + "Crime|Drama|Musical|Romance|Thriller 1 \n", + "Action|Crime|Drama|Mystery 1 \n", + "Action|Adventure|Drama|Western 1 \n", + "Animation|Drama|Family 1 \n", + "Comedy|Family|Music|Romance 1 \n", + "Action|Drama|Sci-Fi|Sport 1 \n", + "Adventure|Biography|Documentary|Drama 1 \n", + "Horror|Sci-Fi|Short|Thriller 1 \n", + "Action|Adventure|Crime|Drama|Family|Fantasy|Romance|Thriller 1 \n", + "Adventure|Animation|Comedy|Drama|Family|Fantasy|Romance 1 \n", + "Comedy|Documentary|War 1 \n", + "Action|Biography|Crime|Drama|Thriller 1 \n", + "Drama|Fantasy|Music|Romance 1 \n", + "Action|Adventure|Animation|Fantasy 1 \n", + "Adventure|Biography|Drama|War 1 \n", + "Comedy|Drama|Fantasy|Music|Romance 1 \n", + "Biography|Comedy|Crime|Drama|Romance|Thriller 1 \n", + "Drama|Mystery|Romance|Sci-Fi|Thriller 1 \n", + "Comedy|Drama|Horror|Sci-Fi|Thriller 1 \n", + "Action|Drama|Mystery|Thriller|War 1 \n", + "Action|Adventure|Family|Mystery|Sci-Fi 1 \n", + "Action|Adventure|Family|Sci-Fi|Thriller 1 \n", + "Comedy|Fantasy|Musical|Sci-Fi 1 \n", + "Drama|Fantasy|Mystery|Romance 1 \n", + "Action|Adventure|Crime 1 \n", + "Animation|Family|Fantasy|Musical|Romance 1 \n", + "Action|Comedy|Drama|War 1 \n", + "Adventure|Comedy|Drama|Family|Romance 1 \n", + "Comedy|Family|Fantasy|Music|Romance 1 \n", + "Comedy|Family|Fantasy|Horror|Mystery 1 \n", + "Fantasy|Mystery|Thriller 1 \n", + "Adventure|Documentary|Short 1 \n", + "Action|Biography|Documentary|Sport 1 \n", + 
"Crime|Drama|History 1 \n", + "Comedy|Crime|Drama|Music|Romance 1 \n", + "Adventure|Comedy|Crime|Family|Musical 1 \n", + "Action|Adventure|Comedy|Romance|Thriller 1 \n", + "Action|Adventure|Comedy|Musical 1 \n", + "Adventure|Crime|Drama|Mystery|Thriller 1 \n", + "Adventure|Drama|Fantasy|Mystery 1 \n", + "Crime|Drama|Musical 1 \n", + "Crime|Drama|Film-Noir 1 \n", + "Action|Adventure|Comedy|Fantasy|Mystery 1 \n", + "Adventure|Drama|History|Thriller|War 1 \n", + "Drama|Family|Western 1 \n", + "Documentary|Family 1 \n", + "Biography|Drama|Family|Musical|Romance 1 \n", + "Action|Fantasy|Western 1 \n", + "Animation|Drama 1 \n", + "Action|Drama|Mystery|Thriller 1 \n", + "Biography|Drama|Romance|Western 1 \n", + "Action|Crime|Sci-Fi 1 \n", + "Action|Comedy|Fantasy|Horror 1 \n", + "Action|Comedy|Crime|Sci-Fi|Thriller 1 \n", + "Animation|Comedy|Crime|Drama|Family 1 \n", + "Comedy|Drama|History|Musical|Romance 1 \n", + "Adventure|Comedy|Drama|Fantasy|Musical 1 \n", + "Action|Animation|Crime|Sci-Fi|Thriller 1 \n", + "Biography|Comedy|Drama|War 1 \n", + "Adventure|Documentary|Drama|Sport 1 \n", + "Adventure|Animation|Sci-Fi 1 \n", + "Adventure|Animation|Biography|Drama|Family|Fantasy|Musical 1 \n", + "Adventure|Comedy|Crime|Drama 1 \n", + "Biography|Crime|Drama|History|Western 1 \n", + "Action|Drama|History|Romance|War|Western 1 \n", + "Adventure|Animation|Family|Western 1 \n", + "Adventure|Sci-Fi 1 \n", + "Adventure|Comedy|Drama|Family 1 \n", + "Crime|Drama|Fantasy 1 \n", + "Animation|Comedy|Drama|Fantasy|Sci-Fi 1 \n", + "Crime|Thriller|War 1 \n", + "Crime|Fantasy|Horror 1 \n", + "Action|Crime|Drama|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Comedy|Crime|Drama|Romance|Thriller 1 \n", + "Adventure|Family|Fantasy|Sci-Fi 1 \n", + "Action|Adventure|Animation|Comedy|Fantasy 1 \n", + "Animation|Comedy|Family|Sport 1 \n", + "Action|Comedy|Fantasy|Romance 1 \n", + "Adventure|Family|Fantasy|Music|Musical 1 \n", + "Adventure|Drama|Horror|Mystery|Thriller 1 \n", + 
"Action|Horror|Mystery|Thriller 1 \n", + "Action|Family|Sport 1 \n", + "Biography|Drama|Family|History|Sport 1 \n", + "Action|Biography|Drama|Thriller|War 1 \n", + "Comedy|Family|Romance|Sport 1 \n", + "Action|Adventure|Drama|Romance|Sci-Fi 1 \n", + "Adventure|Family|Sci-Fi 1 \n", + "Adventure|Horror|Sci-Fi 1 \n", + "Drama|Fantasy|Sport 1 \n", + "Biography|Documentary|History 1 \n", + "Action|Adventure|Comedy|Drama|Music|Sci-Fi 1 \n", + "Crime|Drama|Mystery|Romance 1 \n", + "Drama|Fantasy|Mystery|Sci-Fi 1 \n", + "Adventure|Drama|History|Romance|Thriller|War 1 \n", + "Action|War 1 \n", + "Action|Drama|Fantasy|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Family|Fantasy|Romance 1 \n", + "Action|Adventure|Family|Mystery 1 \n", + "Action|Adventure|Drama|Family 1 \n", + "Biography 1 \n", + "Action|Drama|Romance|War 1 \n", + "Fantasy|Thriller 1 \n", + "Action|Adventure|Fantasy|Horror 1 \n", + "Documentary|News 1 \n", + "Action|Comedy|Sci-Fi|Sport 1 \n", + "Action|Adventure|Family|Fantasy|Sci-Fi|Thriller 1 \n", + "Crime|Drama|Music|Mystery|Thriller 1 \n", + "Adventure|War|Western 1 \n", + "Drama|Horror|Mystery|Sci-Fi 1 \n", + "Drama|Music|Mystery|Romance|Thriller 1 \n", + "Action|Adventure|Drama|Fantasy|War 1 \n", + "Action|Adventure|Animation|Family 1 \n", + "Adventure|Comedy|Horror|Sci-Fi 1 \n", + "Action|Fantasy|Horror|Mystery 1 \n", + "Adventure|Comedy|Family|Fantasy|Romance|Sport 1 \n", + "Action|Adventure|Crime|Drama|Sci-Fi|Thriller 1 \n", + "Action|Biography|Drama|History|Thriller|War 1 \n", + "Adventure|Biography|Crime|Drama|Western 1 \n", + "Action|Sport 1 \n", + "Comedy|Fantasy|Mystery 1 \n", + "Biography|Drama|Family|Sport 1 \n", + "Comedy|Music|Sci-Fi 1 \n", + "Documentary|Drama|History|News 1 \n", + "Mystery|Western 1 \n", + "Action|Adventure|Mystery|Romance|Thriller 1 \n", + "Comedy|Horror|Sci-Fi|Thriller 1 \n", + "Comedy|Crime|Drama|Mystery 1 \n", + "Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi|Sport 1 \n", + "Action|Drama|Romance|Sport 1 \n", + 
"Animation|Family|Fantasy|Mystery 1 \n", + "Action|Animation|Sci-Fi 1 \n", + "Action|Adventure|History|Western 1 \n", + "Adventure|Drama|Horror|Thriller 1 \n", + "Documentary|Family|Music 1 \n", + "Biography|Documentary|Drama 1 \n", + "Adventure|Animation|Comedy|Drama|Family|Fantasy 1 \n", + "Biography|Drama|History|Musical 1 \n", + "Action|Fantasy|Horror|Mystery|Thriller 1 \n", + "Action|Adventure|Animation|Family|Sci-Fi|Thriller 1 \n", + "Action|Adventure|Animation|Fantasy|Romance|Sci-Fi 1 \n", + "Action|Biography|Drama|History|Romance|War 1 \n", + "Adventure|Animation|Comedy|Family|War 1 \n", + "Comedy|Documentary|Drama|Fantasy|Mystery|Sci-Fi 1 \n", + "Adventure|Drama|Fantasy|Mystery|Thriller 1 \n", + "Animation|Comedy|Family|Music|Romance 1 \n", + "Action|Comedy|Mystery 1 \n", + "Animation|Comedy|Drama|Family|Musical 1 \n", + "Adventure|Biography|Drama|History 1 \n", + "Drama|Fantasy|Mystery|Romance|Thriller 1 \n", + "Crime|Drama|Music|Thriller 1 \n", + "Adventure|Comedy|Crime|Romance 1 \n", + "Action|Biography|Crime|Drama|Family|Fantasy 1 \n", + "Action|Romance|Sport 1 \n", + "Biography|Comedy|Drama|History|Music 1 \n", + "Animation|Drama|Family|Musical|Romance 1 \n", + "Action|Adventure|Family|Fantasy|Thriller 1 \n", + "Biography|Drama|Family 1 \n", + "Fantasy|Horror|Romance 1 \n", + "Action|Adventure|Comedy|Family|Fantasy 1 \n", + "Horror|Romance|Thriller 1 \n", + "Comedy|Drama|Family|Fantasy|Musical 1 \n", + "Biography|Comedy|Musical|Romance|Western 1 \n", + "Animation|Comedy|Family|Fantasy|Musical|Romance 1 \n", + "Animation|Comedy|Family|Fantasy|Sci-Fi 1 \n", + "Adventure|Fantasy|Thriller 1 \n", + "Adventure|Family|Sport 1 \n", + "Adventure|Crime|Mystery|Sci-Fi|Thriller 1 \n", + "Action|Adventure|History|Romance 1 \n", + "Animation|Comedy|Family|Fantasy|Mystery 1 \n", + "Action|Animation|Comedy|Family 1 \n", + "Action|Adventure|Animation|Comedy|Drama|Family|Fantasy|Thriller 1 \n", + "Adventure|Crime|Drama|Western 1 \n", + 
"Action|Adventure|Comedy|Crime|Thriller 1 \n", + "Music 1 \n", + "Action|Comedy|Music 1 \n", + "Adventure|Drama|Family|Mystery 1 \n", + "Biography|Comedy|Musical 1 \n", + "Adventure|Comedy|Horror 1 \n", + "Adventure|Animation|Comedy|Crime 1 \n", + "Biography|Comedy|Documentary 1 \n", + "Action|Comedy|Mystery|Romance 1 \n", + "Action|Drama|Sport|Thriller 1 \n", + "Animation|Comedy|Drama|Romance 1 \n", + "Comedy|Fantasy|Horror|Mystery 1 \n", + "Crime|Drama|History|Mystery|Thriller 1 \n", + "Action|Horror|Romance 1 \n", + "Adventure|Comedy|Crime|Music 1 \n", + "Crime|Drama|Musical|Romance 1 \n", + "Adventure|Comedy|Sci-Fi|Western 1 \n", + "Crime|Drama|Fantasy|Mystery 1 \n", + "Action|Adventure|Drama|History|Thriller|War 1 \n", + "Action|Adventure|Biography|Drama|History|Thriller 1 \n", + "Comedy|Crime|Horror 1 \n", + "Adventure|Animation|Family|Fantasy|Musical|War 1 \n", + "Action|Adventure|Biography|Drama|History|Romance|War 1 \n", + "Comedy|Drama|Family|Fantasy|Sci-Fi 1 \n", + "Comedy|Crime|Musical|Mystery 1 \n", + "Adventure|Comedy|Drama|Romance|Thriller|War 1 \n", + "Adventure|Comedy|Family|Music|Romance 1 \n", + "Action|Comedy|Crime|Western 1 \n", + "Adventure|Drama|Thriller|War 1 \n", + "Biography|Crime|Drama|Mystery|Thriller 1 \n", + "Adventure|Comedy|Drama|Music 1 \n", + "Adventure|Animation|Comedy|Fantasy|Music|Romance 1 \n", + "Family|Fantasy|Music 1 \n", + "Action|Adventure|Drama|History|Romance|War 1 \n", + "Biography|Comedy|Crime|Drama|Romance 1 \n", + "Adventure|Comedy|Musical|Romance 1 \n", + "Name: genres, dtype: int64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.genres.value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #3: Show all columns and column width" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "pd.reset_option('display.width')\n", + 
"pd.reset_option('display.max_columns')\n", + "pd.reset_option('display.max_colwidth')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenres...num_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
0ColorJames Cameron723.0178.00.0855.0Joel David Moore1000.0760505847.0Action|Adventure|Fantasy|Sci-Fi...3054.0EnglishUSAPG-13237000000.02009.0936.07.91.7833000
1ColorGore Verbinski302.0169.0563.01000.0Orlando Bloom40000.0309404152.0Action|Adventure|Fantasy...1238.0EnglishUSAPG-13300000000.02007.05000.07.12.350
\n", + "

2 rows × 28 columns

\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "0 3054.0 English USA PG-13 237000000.0 \n", + "1 1238.0 English USA PG-13 300000000.0 \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "\n", + "[2 rows x 28 columns]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "# show all columns on wider area\n", + "import pandas as pd\n", + "pd.set_option('display.width', None)\n", + "pd.set_option('display.max_columns', None)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
colordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenresactor_1_namemovie_titlenum_voted_userscast_total_facebook_likesactor_3_namefacenumber_in_posterplot_keywordsmovie_imdb_linknum_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
0ColorJames Cameron723.0178.00.0855.0Joel David Moore1000.0760505847.0Action|Adventure|Fantasy|Sci-FiCCH PounderAvatar8862044834Wes Studi0.0avatar|future|marine|native|paraplegichttp://www.imdb.com/title/tt0499549/?ref_=fn_t...3054.0EnglishUSAPG-13237000000.02009.0936.07.91.7833000
1ColorGore Verbinski302.0169.0563.01000.0Orlando Bloom40000.0309404152.0Action|Adventure|FantasyJohnny DeppPirates of the Caribbean: At World's End47122048350Jack Davenport0.0goddess|marriage ceremony|marriage proposal|pi...http://www.imdb.com/title/tt0449088/?ref_=fn_t...1238.0EnglishUSAPG-13300000000.02007.05000.07.12.350
\n", + "
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "\n", + " actor_1_facebook_likes gross genres \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy \n", + "\n", + " actor_1_name movie_title num_voted_users \\\n", + "0 CCH Pounder Avatar  886204 \n", + "1 Johnny Depp Pirates of the Caribbean: At World's End  471220 \n", + "\n", + " cast_total_facebook_likes actor_3_name facenumber_in_poster \\\n", + "0 4834 Wes Studi 0.0 \n", + "1 48350 Jack Davenport 0.0 \n", + "\n", + " plot_keywords \\\n", + "0 avatar|future|marine|native|paraplegic \n", + "1 goddess|marriage ceremony|marriage proposal|pi... \n", + "\n", + " movie_imdb_link num_user_for_reviews \\\n", + "0 http://www.imdb.com/title/tt0499549/?ref_=fn_t... 3054.0 \n", + "1 http://www.imdb.com/title/tt0449088/?ref_=fn_t... 
1238.0 \n", + "\n", + " language country content_rating budget title_year \\\n", + "0 English USA PG-13 237000000.0 2009.0 \n", + "1 English USA PG-13 300000000.0 2007.0 \n", + "\n", + " actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes \n", + "0 936.0 7.9 1.78 33000 \n", + "1 5000.0 7.1 2.35 0 " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5 Action|Adventure|Sci-Fi\n", + "6 Action|Adventure|Romance\n", + "7 Adventure|Animation|Comedy|Family|Fantasy|Musi...\n", + "8 Action|Adventure|Sci-Fi\n", + "9 Adventure|Family|Fantasy|Mystery\n", + "Name: genres, dtype: object" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[5:10,9]" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "# display column values without truncation\n", + "import pandas as pd\n", + "pd.set_option('display.max_colwidth', None)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5 Action|Adventure|Sci-Fi\n", + "6 Action|Adventure|Romance\n", + "7 Adventure|Animation|Comedy|Family|Fantasy|Musical|Romance\n", + "8 Action|Adventure|Sci-Fi\n", + "9 Adventure|Family|Fantasy|Mystery\n", + "Name: genres, dtype: object" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[5:10,9]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step #5: Increase Jupyter Notebook cell width" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": 
"display_data" + } + ], + "source": [ + "from IPython.core.display import display, HTML\n", + "display(HTML(\"\"))" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from IPython.core.display import display, HTML\n", + "display(HTML(\"\"))\n", + "display(HTML(\"\"))\n", + "display(HTML(\"\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From ea5bcf5913f6b479ac0e919911434e18b4f023c1 Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 9 Apr 2020 13:41:33 +0300 Subject: [PATCH 60/76] error with table extraction. 
Change page with: https://en.wikipedia.org/wiki/New_York_City --- ...e wiki tables with pandas and python.ipynb | 1816 ++++++++--------- 1 file changed, 855 insertions(+), 961 deletions(-) diff --git a/notebooks/Scrape wiki tables with pandas and python.ipynb b/notebooks/Scrape wiki tables with pandas and python.ipynb index af93cad..0a8dd21 100644 --- a/notebooks/Scrape wiki tables with pandas and python.ipynb +++ b/notebooks/Scrape wiki tables with pandas and python.ipynb @@ -35,7 +35,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 14, "metadata": {}, "outputs": [ { @@ -58,7 +58,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 15, "metadata": {}, "outputs": [ { @@ -102,8 +102,8 @@ " 1\n", " 1\n", " Russia*\n", - " 13100000\n", - " 17,125,200 including European part\n", + " 13000000\n", + " 17,125,191 km² including European Russia[1]\n", " NaN\n", " \n", " \n", @@ -117,7 +117,7 @@ " \n", " 3\n", " 3\n", - " India\n", + " India[2]\n", " 3287263\n", " NaN\n", " NaN\n", @@ -126,8 +126,8 @@ " 4\n", " 4\n", " Kazakhstan*\n", - " 2455034\n", - " 2,724,902 km² including European part\n", + " 2544900\n", + " 2,724,900 km² including European part\n", " NaN\n", " \n", " \n", @@ -166,16 +166,16 @@ " 9\n", " 9\n", " Pakistan\n", - " 796095\n", - " 882,363 km² including Gilgit-Baltistan and AJK\n", + " 881913\n", + " NaN\n", " NaN\n", " \n", " \n", " 10\n", " 10\n", " Turkey*\n", - " 747272\n", - " 783,562 km² including European part\n", + " 759592\n", + " 783,356 km² including East Thrace\n", " NaN\n", " \n", " \n", @@ -358,8 +358,8 @@ " 33\n", " 33\n", " Azerbaijan*\n", - " 86600\n", - " Sometimes considered part of Europe\n", + " 86000\n", + " Located in the Caucasus, between Europe and Asia\n", " NaN\n", " \n", " \n", @@ -375,7 +375,7 @@ " 35\n", " Georgia*\n", " 69000\n", - " Sometimes considered part of Europe\n", + " Located in the Caucasus, between Europe and Asia\n", " NaN\n", " \n", " \n", @@ -389,128 +389,120 @@ 
" \n", " 37\n", " 37\n", - " Egypt*\n", - " 60000\n", - " 1,002,450 km² including African part\n", + " Bhutan\n", + " 38394\n", + " NaN\n", " NaN\n", " \n", " \n", " 38\n", " 38\n", - " Bhutan\n", - " 38394\n", - " NaN\n", + " Taiwan\n", + " 36193\n", + " partially recognized state/not a UN member\n", " NaN\n", " \n", " \n", " 39\n", " 39\n", - " Taiwan\n", - " 36193\n", - " excludes Hong Kong, Macau, Mainland China and ...\n", + " Armenia\n", + " 29843\n", + " Located in the Caucasus, between Europe and Asia\n", " NaN\n", " \n", " \n", " 40\n", " 40\n", - " Armenia*\n", - " 29843\n", - " Sometimes considered part of Europe\n", + " Israel\n", + " 22072\n", + " NaN\n", " NaN\n", " \n", " \n", " 41\n", " 41\n", - " Israel\n", - " 20273\n", + " Kuwait\n", + " 17818\n", " NaN\n", " NaN\n", " \n", " \n", " 42\n", " 42\n", - " Kuwait\n", - " 17818\n", + " Timor-Leste\n", + " 14874\n", " NaN\n", " NaN\n", " \n", " \n", " 43\n", " 43\n", - " Timor-Leste\n", - " 14874\n", + " Qatar\n", + " 11586\n", " NaN\n", " NaN\n", " \n", " \n", " 44\n", " 44\n", - " Qatar\n", - " 11586\n", + " Lebanon\n", + " 10452\n", " NaN\n", " NaN\n", " \n", " \n", " 45\n", " 45\n", - " Lebanon\n", - " 10452\n", + " Cyprus\n", + " 9251\n", " NaN\n", " NaN\n", " \n", " \n", " 46\n", " 46\n", - " Cyprus\n", - " 9251\n", - " 5,896 km² excluding Northern Cyprus. 
Political...\n", + " Palestine\n", + " 6220\n", + " partially recognized state/non-member observer...\n", " NaN\n", " \n", " \n", " 47\n", " 47\n", - " Palestine\n", - " 6220\n", + " Brunei\n", + " 5765\n", " NaN\n", " NaN\n", " \n", " \n", " 48\n", " 48\n", - " Brunei\n", - " 5765\n", + " Bahrain\n", + " 760\n", " NaN\n", " NaN\n", " \n", " \n", " 49\n", " 49\n", - " Bahrain\n", - " 765\n", + " Singapore\n", + " 697\n", " NaN\n", " NaN\n", " \n", " \n", " 50\n", " 50\n", - " Singapore\n", - " 716\n", - " NaN\n", - " NaN\n", - " \n", - " \n", - " 51\n", - " 51\n", " Maldives\n", " 300\n", " NaN\n", " NaN\n", " \n", " \n", - " 52\n", + " 51\n", " NaN\n", " Total\n", - " 44579000\n", + " 44528251\n", " NaN\n", " NaN\n", " \n", @@ -521,16 +513,16 @@ "text/plain": [ " 0 1 2 \\\n", "0 Rank Country Area (km²) \n", - "1 1 Russia* 13100000 \n", + "1 1 Russia* 13000000 \n", "2 2 China 9596961 \n", - "3 3 India 3287263 \n", - "4 4 Kazakhstan* 2455034 \n", + "3 3 India[2] 3287263 \n", + "4 4 Kazakhstan* 2544900 \n", "5 5 Saudi Arabia 2149690 \n", "6 6 Iran 1648195 \n", "7 7 Mongolia 1564110 \n", "8 8 Indonesia* 1472639 \n", - "9 9 Pakistan 796095 \n", - "10 10 Turkey* 747272 \n", + "9 9 Pakistan 881913 \n", + "10 10 Turkey* 759592 \n", "11 11 Myanmar 676578 \n", "12 12 Afghanistan 652230 \n", "13 13 Yemen 527968 \n", @@ -553,39 +545,38 @@ "30 30 North Korea 120538 \n", "31 31 South Korea 100210 \n", "32 32 Jordan 89342 \n", - "33 33 Azerbaijan* 86600 \n", + "33 33 Azerbaijan* 86000 \n", "34 34 United Arab Emirates 83600 \n", "35 35 Georgia* 69000 \n", "36 36 Sri Lanka 65610 \n", - "37 37 Egypt* 60000 \n", - "38 38 Bhutan 38394 \n", - "39 39 Taiwan 36193 \n", - "40 40 Armenia* 29843 \n", - "41 41 Israel 20273 \n", - "42 42 Kuwait 17818 \n", - "43 43 Timor-Leste 14874 \n", - "44 44 Qatar 11586 \n", - "45 45 Lebanon 10452 \n", - "46 46 Cyprus 9251 \n", - "47 47 Palestine 6220 \n", - "48 48 Brunei 5765 \n", - "49 49 Bahrain 765 \n", - "50 50 Singapore 716 \n", - "51 51 Maldives 
300 \n", - "52 NaN Total 44579000 \n", + "37 37 Bhutan 38394 \n", + "38 38 Taiwan 36193 \n", + "39 39 Armenia 29843 \n", + "40 40 Israel 22072 \n", + "41 41 Kuwait 17818 \n", + "42 42 Timor-Leste 14874 \n", + "43 43 Qatar 11586 \n", + "44 44 Lebanon 10452 \n", + "45 45 Cyprus 9251 \n", + "46 46 Palestine 6220 \n", + "47 47 Brunei 5765 \n", + "48 48 Bahrain 760 \n", + "49 49 Singapore 697 \n", + "50 50 Maldives 300 \n", + "51 NaN Total 44528251 \n", "\n", " 3 4 \n", "0 Notes NaN \n", - "1 17,125,200 including European part NaN \n", + "1 17,125,191 km² including European Russia[1] NaN \n", "2 excludes Hong Kong, Macau, Taiwan and disputed... NaN \n", "3 NaN NaN \n", - "4 2,724,902 km² including European part NaN \n", + "4 2,724,900 km² including European part NaN \n", "5 NaN NaN \n", "6 NaN NaN \n", "7 NaN NaN \n", "8 1,904,569 km² including Oceanian part NaN \n", - "9 882,363 km² including Gilgit-Baltistan and AJK NaN \n", - "10 783,562 km² including European part NaN \n", + "9 NaN NaN \n", + "10 783,356 km² including East Thrace NaN \n", "11 NaN NaN \n", "12 NaN NaN \n", "13 NaN NaN \n", @@ -608,29 +599,28 @@ "30 NaN NaN \n", "31 NaN NaN \n", "32 NaN NaN \n", - "33 Sometimes considered part of Europe NaN \n", + "33 Located in the Caucasus, between Europe and Asia NaN \n", "34 NaN NaN \n", - "35 Sometimes considered part of Europe NaN \n", + "35 Located in the Caucasus, between Europe and Asia NaN \n", "36 NaN NaN \n", - "37 1,002,450 km² including African part NaN \n", - "38 NaN NaN \n", - "39 excludes Hong Kong, Macau, Mainland China and ... NaN \n", - "40 Sometimes considered part of Europe NaN \n", + "37 NaN NaN \n", + "38 partially recognized state/not a UN member NaN \n", + "39 Located in the Caucasus, between Europe and Asia NaN \n", + "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "43 NaN NaN \n", "44 NaN NaN \n", "45 NaN NaN \n", - "46 5,896 km² excluding Northern Cyprus. Political... NaN \n", + "46 partially recognized state/non-member observer... 
NaN \n", "47 NaN NaN \n", "48 NaN NaN \n", "49 NaN NaN \n", "50 NaN NaN \n", - "51 NaN NaN \n", - "52 NaN NaN " + "51 NaN NaN " ] }, - "execution_count": 2, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -641,30 +631,30 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Extracted 2 wikitables\n" + "Extracted 7 wikitables\n" ] } ], "source": [ "# extract several tables from wikipedia from a single page\n", "from pandas.io.html import read_html\n", - "page = 'https://en.wikipedia.org/wiki/List_of_UFC_events'\n", + "page = 'https://en.wikipedia.org/wiki/New_York_City'\n", "\n", - "wikitables = read_html(page, index_col=0, attrs={\"class\":\"wikitable\"})\n", + "wikitables = read_html(page, index_col=0, attrs={\"class\":\"wikitable\"}, header=None)\n", "\n", "print (\"Extracted {num} wikitables\".format(num=len(wikitables)))" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 2, "metadata": {}, "outputs": [ { @@ -680,78 +670,135 @@ " vertical-align: top;\n", " }\n", "\n", - " .dataframe thead th {\n", - " text-align: right;\n", + " .dataframe thead tr th {\n", + " text-align: left;\n", " }\n", "\n", "\n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", + " \n", + " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + 
" \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", "
1234
0
EventDateVenueLocationRef.
UFC on ESPN 4Jun 29, 2019TBATBA[9]New York City's five boroughsvteNew York City's five boroughsvte
UFC on ESPN+ 11Jun 22, 2019TBATBA[9]JurisdictionJurisdictionPopulationGross Domestic ProductLand areaDensity
UFC 238Jun 8, 2019TBATBA[9]BoroughCountyEstimate (2018)[150]billions(US$)[151]per capita(US$)square milessquarekmpersons / sq. mipersons /km2
UFC on ESPN+ 10Jun 1, 2019TBATBA[9]The BronxBronx143213242.6952920042.10109.043465313231
BrooklynKings258283091.5593460070.82183.423713714649
ManhattanNew York1628701600.24436090022.8359.137203327826
QueensQueens227890693.31039600108.53281.09214608354
Staten IslandRichmond47617914.5143030058.37151.1881123132
\n", "" ], "text/plain": [ - " 1 2 3 4\n", - "0 \n", - "Event Date Venue Location Ref.\n", - "UFC on ESPN 4 Jun 29, 2019 TBA TBA [9]\n", - "UFC on ESPN+ 11 Jun 22, 2019 TBA TBA [9]\n", - "UFC 238 Jun 8, 2019 TBA TBA [9]\n", - "UFC on ESPN+ 10 Jun 1, 2019 TBA TBA [9]" + "New York City's five boroughsvte New York City's five boroughsvte \\\n", + "Jurisdiction Jurisdiction \n", + "Borough County \n", + "The Bronx Bronx \n", + "Brooklyn Kings \n", + "Manhattan New York \n", + "Queens Queens \n", + "Staten Island Richmond \n", + "\n", + "New York City's five boroughsvte \\\n", + "Jurisdiction Population Gross Domestic Product \n", + "Borough Estimate (2018)[150] billions(US$)[151] \n", + "The Bronx 1432132 42.695 \n", + "Brooklyn 2582830 91.559 \n", + "Manhattan 1628701 600.244 \n", + "Queens 2278906 93.310 \n", + "Staten Island 476179 14.514 \n", + "\n", + "New York City's five boroughsvte \\\n", + "Jurisdiction Land area \n", + "Borough per capita(US$) square miles squarekm \n", + "The Bronx 29200 42.10 109.04 \n", + "Brooklyn 34600 70.82 183.42 \n", + "Manhattan 360900 22.83 59.13 \n", + "Queens 39600 108.53 281.09 \n", + "Staten Island 30300 58.37 151.18 \n", + "\n", + "New York City's five boroughsvte \n", + "Jurisdiction Density \n", + "Borough persons / sq. 
mi persons /km2 \n", + "The Bronx 34653 13231 \n", + "Brooklyn 37137 14649 \n", + "Manhattan 72033 27826 \n", + "Queens 21460 8354 \n", + "Staten Island 8112 3132 " ] }, - "execution_count": 4, + "execution_count": 2, "metadata": {}, "output_type": "execute_result" } @@ -762,16 +809,16 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "(470, 6)" + "(17, 13)" ] }, - "execution_count": 5, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } @@ -782,7 +829,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -798,783 +845,573 @@ " vertical-align: top;\n", " }\n", "\n", - " .dataframe thead th {\n", - " text-align: right;\n", + " .dataframe thead tr th {\n", + " text-align: left;\n", " }\n", "\n", "\n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - 
" \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", 
- " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", "
123456
0vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b]vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b]
MonthJanFebMarAprMayJunJulAugSepOctNovDecYear
#EventDateVenueLocationAttendanceRef.
465UFC Fight Night: Assunção vs. Moraes 2Feb 2, 2019Centro de Formação Olímpica do NordesteFortaleza, Brazil10040[21]
UFC 233Jan 26, 2019Honda CenterAnaheim, California, U.S.Cancelled[22]
464UFC Fight Night: Cejudo vs. DillashawJan 19, 2019Barclays CenterBrooklyn, New York, U.S.12152[23]
463UFC 232: Jones vs. Gustafsson 2Dec 29, 2018The ForumInglewood, California, U.S.15862[24]
462UFC on Fox: Lee vs. Iaquinta 2Dec 15, 2018Fiserv ForumMilwaukee, Wisconsin, U.S.9010[25]
461UFC 231: Holloway vs. OrtegaDec 8, 2018Scotiabank ArenaToronto, Ontario, Canada19039[26]
460UFC Fight Night: dos Santos vs. TuivasaDec 2, 2018Adelaide Entertainment CentreAdelaide, Australia8652[27]
459The Ultimate Fighter: Heavy Hitters FinaleNov 30, 2018Pearl TheatreLas Vegas, Nevada, U.S.2020[28]
458UFC Fight Night: Blaydes vs. Ngannou 2Nov 24, 2018Cadillac ArenaBeijing, China10302[29]
457UFC Fight Night: Magny vs. PonzinibbioNov 17, 2018Estadio Mary Terán de WeissBuenos Aires, Argentina10245[30]
456UFC Fight Night: Korean Zombie vs. RodríguezNov 10, 2018Pepsi CenterDenver, Colorado, U.S.11426[31]
455UFC 230: Cormier vs. LewisNov 3, 2018Madison Square GardenNew York City, New York, U.S.17011[32]
454UFC Fight Night: Volkan vs. SmithOct 27, 2018Avenir CentreMoncton, New Brunswick, Canada6282[33]
453UFC 229: Khabib vs. McGregorOct 6, 2018T-Mobile ArenaLas Vegas, Nevada, U.S.20034[34]
452UFC Fight Night: Santos vs. AndersSep 22, 2018Ginásio do IbirapueraSão Paulo, Brazil9485[35]
451UFC Fight Night: Hunt vs. OleinikSep 15, 2018Olimpiyskiy StadiumMoscow, Russia22603[36]
450UFC 228: Woodley vs. TillSep 8, 2018American Airlines CenterDallas, Texas, U.S.14073[37]
449UFC Fight Night: Gaethje vs. VickAug 25, 2018Pinnacle Bank ArenaLincoln, Nebraska, U.S.6409[38]
448UFC 227: Dillashaw vs. Garbrandt 2Aug 4, 2018Staples CenterLos Angeles, California, U.S.17794[39]
447UFC on Fox: Alvarez vs. Poirier 2Jul 28, 2018Scotiabank SaddledomeCalgary, Alberta, Canada10603[40]
446UFC Fight Night: Shogun vs. SmithJul 22, 2018Barclaycard ArenaHamburg, Germany7798[41]
445UFC Fight Night: dos Santos vs. IvanovJul 14, 2018CenturyLink ArenaBoise, Idaho, U.S.5648[42]
444UFC 226: Miocic vs. CormierJul 7, 2018T-Mobile ArenaLas Vegas, Nevada, U.S.17464[43]
443The Ultimate Fighter: Undefeated FinaleJul 6, 2018Palms Casino ResortLas Vegas, Nevada, U.S.2123[44]
442UFC Fight Night: Cowboy vs. EdwardsJun 23, 2018Singapore Indoor StadiumKallang, Singapore6419[45]
441UFC 225: Whittaker vs. Romero 2Jun 9, 2018United CenterChicago, Illinois, U.S.18117[46]
440UFC Fight Night: Rivera vs. MoraesJun 1, 2018Adirondack Bank CenterUtica, New York, U.S.5063[47]
439UFC Fight Night: Thompson vs. TillMay 27, 2018Echo ArenaLiverpool, England, U.K.8520[48]
438UFC Fight Night: Maia vs. UsmanMay 19, 2018Movistar ArenaSantiago, Chile11082[49]
.....................
030UFC 26: Ultimate Field of DreamsJun 9, 2000Five Seasons Events CenterCedar Rapids, Iowa, U.S.1100[409]
029UFC 25: Ultimate Japan 3Apr 14, 2000Yoyogi National GymnasiumTokyo, JapanNaNNaN
028UFC 24: First DefenseMar 10, 2000Lake Charles Civic CenterLake Charles, Louisiana, U.S.NaNNaN
027UFC 23: Ultimate Japan 2Nov 19, 1999Tokyo Bay NK HallChiba, JapanNaNNaN
026UFC 22: Only One Can be ChampionSep 24, 1999Lake Charles Civic CenterLake Charles, Louisiana, U.S.NaNNaN
025UFC 21: Return of the ChampionsJul 16, 1999Five Seasons Events CenterCedar Rapids, Iowa, U.S.NaNNaN
024UFC 20: Battle for the GoldMay 7, 1999Boutwell Memorial AuditoriumBirmingham, Alabama, U.S.NaNNaN
023UFC 19: Ultimate Young GunsMar 5, 1999Casino Magic Bay St. LouisBay St. Louis, Mississippi, U.S.NaNNaN
022UFC 18: The Road to the Heavyweight TitleJan 8, 1999Pontchartrain CenterNew Orleans, Louisiana, U.S.NaNNaN
021UFC Brazil: Ultimate BrazilOct 16, 1998Ginásio da PortuguesaSão Paulo, BrazilNaNNaN
020UFC 17: RedemptionMay 15, 1998Mobile Civic CenterMobile, Alabama, U.S.NaNNaN
019UFC 16: Battle in the BayouMar 13, 1998Pontchartrain CenterNew Orleans, Louisiana, U.S.4600[410]
018UFC Japan: Ultimate JapanDec 21, 1997Yokohama ArenaYokohama, Japan5000[411]
017UFC 15: Collision CourseOct 17, 1997Casino Magic Bay St. LouisBay St. Louis, Mississippi, U.S.NaNNaN
016UFC 14: ShowdownJul 27, 1997Boutwell Memorial AuditoriumBirmingham, Alabama, U.S.5000[412]
015UFC 13: The Ultimate ForceMay 30, 1997Augusta Civic CenterAugusta, Georgia, U.S.5100[413]
014UFC 12: Judgement DayFeb 7, 1997Dothan Civic CenterDothan, Alabama, U.S.3100[414]
013UFC: The Ultimate Ultimate 2Dec 7, 1996Fair Park ArenaBirmingham, Alabama, U.S.6000[415]
012UFC 11: The Proving GroundSep 20, 1996Augusta Civic CenterAugusta, Georgia, U.S.4500[416]
011UFC 10: The TournamentJul 12, 1996Fair Park ArenaBirmingham, Alabama, U.S.4300[417]
010UFC 9: Motor City MadnessMay 17, 1996Cobo ArenaDetroit, Michigan, U.S.10000[418]
009UFC 8: David vs. GoliathFeb 16, 1996Coliseo Rubén RodríguezBayamón, Puerto Rico13000[419]
008UFC: The Ultimate UltimateDec 16, 1995Mammoth GardensDenver, Colorado, U.S.2800[420]
007UFC 7: The Brawl in BuffaloSep 8, 1995Buffalo Memorial AuditoriumBuffalo, New York, U.S.9000[421]
006UFC 6: Clash of the TitansJul 14, 1995Casper Events CenterCasper, Wyoming, U.S.2700[422]
005UFC 5: The Return of the BeastApr 7, 1995Independence ArenaCharlotte, North Carolina, U.S.6000[423]
004UFC 4: Revenge of the WarriorsDec 16, 1994Expo Square PavilionTulsa, Oklahoma, U.S.5857[424]
003UFC 3: The American DreamSep 9, 1994Grady Cole CenterCharlotte, North Carolina, U.S.NaNNaNRecord high °F (°C)72(22)78(26)86(30)96(36)99(37)101(38)106(41)104(40)102(39)94(34)84(29)75(24)106(41)
Mean maximum °F (°C)59.6(15.3)60.7(15.9)71.5(21.9)83.0(28.3)88.0(31.1)92.3(33.5)95.4(35.2)93.7(34.3)88.5(31.4)78.8(26.0)71.3(21.8)62.2(16.8)97.0(36.1)
Average high °F (°C)38.3(3.5)41.6(5.3)49.7(9.8)61.2(16.2)70.8(21.6)79.3(26.3)84.1(28.9)82.6(28.1)75.2(24.0)63.8(17.7)53.8(12.1)43.0(6.1)62.0(16.7)
Daily mean °F (°C)32.6(0.3)35.3(1.8)42.5(5.8)53.0(11.7)62.4(16.9)71.5(21.9)76.5(24.7)75.2(24.0)68.0(20.0)56.9(13.8)47.7(8.7)37.5(3.1)55.0(12.8)
Average low °F (°C)26.9(−2.8)28.9(−1.7)35.2(1.8)44.8(7.1)54.0(12.2)63.6(17.6)68.8(20.4)67.8(19.9)60.8(16.0)50.0(10.0)41.6(5.3)32.0(0.0)48.0(8.9)
Mean minimum °F (°C)9.2(−12.7)12.8(−10.7)18.5(−7.5)32.3(0.2)43.5(6.4)52.9(11.6)60.3(15.7)58.8(14.9)48.6(9.2)38.0(3.3)27.7(−2.4)15.6(−9.1)7.0(−13.9)
Record low °F (°C)−6(−21)−15(−26)3(−16)12(−11)32(0)44(7)52(11)50(10)39(4)28(−2)5(−15)−13(−25)−15(−26)
Average precipitation inches (mm)3.65(93)3.09(78)4.36(111)4.50(114)4.19(106)4.41(112)4.60(117)4.44(113)4.28(109)4.40(112)4.02(102)4.00(102)49.94(1,268)
Average snowfall inches (cm)7.0(18)9.2(23)3.9(9.9)0.6(1.5)0(0)0(0)0(0)0(0)0(0)0(0)0.3(0.76)4.8(12)25.8(66)
Average precipitation days (≥ 0.01 in)10.49.210.911.511.111.210.49.58.78.99.610.6122.0
Average snowy days (≥ 0.1 in)4.02.81.80.30000000.22.311.4
Average relative humidity (%)61.560.258.555.362.765.264.266.067.865.664.664.163.0
Mean monthly sunshine hours162.7163.1212.5225.6256.6257.3268.2268.2219.3211.2151.0139.02534.7
Percent possible sunshine54555757575759635961514857
002UFC 2: No Way OutMar 11, 1994Mammoth GardensDenver, Colorado, U.S.2000[425]Average ultraviolet index2346788864215
001UFC 1: The BeginningNov 12, 1993McNichols Sports ArenaDenver, Colorado, U.S.7800[426]Source #1: NOAA (relative humidity and sun 1961–1990)[196][210][192][211]Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...Source #1: NOAA (relative humidity and sun 196...
Source #2: Weather Atlas[212] See Geography of New York City for additional climate information from the outer boroughs.Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...Source #2: Weather Atlas[212] See Geography of...
\n", - "

470 rows × 6 columns

\n", "" ], "text/plain": [ - " 1 2 \\\n", - "0 \n", - "# Event Date \n", - "465 UFC Fight Night: Assunção vs. Moraes 2 Feb 2, 2019 \n", - "– UFC 233 Jan 26, 2019 \n", - "464 UFC Fight Night: Cejudo vs. Dillashaw Jan 19, 2019 \n", - "463 UFC 232: Jones vs. Gustafsson 2 Dec 29, 2018 \n", - "462 UFC on Fox: Lee vs. Iaquinta 2 Dec 15, 2018 \n", - "461 UFC 231: Holloway vs. Ortega Dec 8, 2018 \n", - "460 UFC Fight Night: dos Santos vs. Tuivasa Dec 2, 2018 \n", - "459 The Ultimate Fighter: Heavy Hitters Finale Nov 30, 2018 \n", - "458 UFC Fight Night: Blaydes vs. Ngannou 2 Nov 24, 2018 \n", - "457 UFC Fight Night: Magny vs. Ponzinibbio Nov 17, 2018 \n", - "456 UFC Fight Night: Korean Zombie vs. Rodríguez Nov 10, 2018 \n", - "455 UFC 230: Cormier vs. Lewis Nov 3, 2018 \n", - "454 UFC Fight Night: Volkan vs. Smith Oct 27, 2018 \n", - "453 UFC 229: Khabib vs. McGregor Oct 6, 2018 \n", - "452 UFC Fight Night: Santos vs. Anders Sep 22, 2018 \n", - "451 UFC Fight Night: Hunt vs. Oleinik Sep 15, 2018 \n", - "450 UFC 228: Woodley vs. Till Sep 8, 2018 \n", - "449 UFC Fight Night: Gaethje vs. Vick Aug 25, 2018 \n", - "448 UFC 227: Dillashaw vs. Garbrandt 2 Aug 4, 2018 \n", - "447 UFC on Fox: Alvarez vs. Poirier 2 Jul 28, 2018 \n", - "446 UFC Fight Night: Shogun vs. Smith Jul 22, 2018 \n", - "445 UFC Fight Night: dos Santos vs. Ivanov Jul 14, 2018 \n", - "444 UFC 226: Miocic vs. Cormier Jul 7, 2018 \n", - "443 The Ultimate Fighter: Undefeated Finale Jul 6, 2018 \n", - "442 UFC Fight Night: Cowboy vs. Edwards Jun 23, 2018 \n", - "441 UFC 225: Whittaker vs. Romero 2 Jun 9, 2018 \n", - "440 UFC Fight Night: Rivera vs. Moraes Jun 1, 2018 \n", - "439 UFC Fight Night: Thompson vs. Till May 27, 2018 \n", - "438 UFC Fight Night: Maia vs. Usman May 19, 2018 \n", - ".. ... ... 
\n", - "030 UFC 26: Ultimate Field of Dreams Jun 9, 2000 \n", - "029 UFC 25: Ultimate Japan 3 Apr 14, 2000 \n", - "028 UFC 24: First Defense Mar 10, 2000 \n", - "027 UFC 23: Ultimate Japan 2 Nov 19, 1999 \n", - "026 UFC 22: Only One Can be Champion Sep 24, 1999 \n", - "025 UFC 21: Return of the Champions Jul 16, 1999 \n", - "024 UFC 20: Battle for the Gold May 7, 1999 \n", - "023 UFC 19: Ultimate Young Guns Mar 5, 1999 \n", - "022 UFC 18: The Road to the Heavyweight Title Jan 8, 1999 \n", - "021 UFC Brazil: Ultimate Brazil Oct 16, 1998 \n", - "020 UFC 17: Redemption May 15, 1998 \n", - "019 UFC 16: Battle in the Bayou Mar 13, 1998 \n", - "018 UFC Japan: Ultimate Japan Dec 21, 1997 \n", - "017 UFC 15: Collision Course Oct 17, 1997 \n", - "016 UFC 14: Showdown Jul 27, 1997 \n", - "015 UFC 13: The Ultimate Force May 30, 1997 \n", - "014 UFC 12: Judgement Day Feb 7, 1997 \n", - "013 UFC: The Ultimate Ultimate 2 Dec 7, 1996 \n", - "012 UFC 11: The Proving Ground Sep 20, 1996 \n", - "011 UFC 10: The Tournament Jul 12, 1996 \n", - "010 UFC 9: Motor City Madness May 17, 1996 \n", - "009 UFC 8: David vs. 
Goliath Feb 16, 1996 \n", - "008 UFC: The Ultimate Ultimate Dec 16, 1995 \n", - "007 UFC 7: The Brawl in Buffalo Sep 8, 1995 \n", - "006 UFC 6: Clash of the Titans Jul 14, 1995 \n", - "005 UFC 5: The Return of the Beast Apr 7, 1995 \n", - "004 UFC 4: Revenge of the Warriors Dec 16, 1994 \n", - "003 UFC 3: The American Dream Sep 9, 1994 \n", - "002 UFC 2: No Way Out Mar 11, 1994 \n", - "001 UFC 1: The Beginning Nov 12, 1993 \n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Jan \n", + "Record high °F (°C) 72(22) \n", + "Mean maximum °F (°C) 59.6(15.3) \n", + "Average high °F (°C) 38.3(3.5) \n", + "Daily mean °F (°C) 32.6(0.3) \n", + "Average low °F (°C) 26.9(−2.8) \n", + "Mean minimum °F (°C) 9.2(−12.7) \n", + "Record low °F (°C) −6(−21) \n", + "Average precipitation inches (mm) 3.65(93) \n", + "Average snowfall inches (cm) 7.0(18) \n", + "Average precipitation days (≥ 0.01 in) 10.4 \n", + "Average snowy days (≥ 0.1 in) 4.0 \n", + "Average relative humidity (%) 61.5 \n", + "Mean monthly sunshine hours 162.7 \n", + "Percent possible sunshine 54 \n", + "Average ultraviolet index 2 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Feb \n", + "Record high °F (°C) 78(26) \n", + "Mean maximum °F (°C) 60.7(15.9) \n", + "Average high °F (°C) 41.6(5.3) \n", + "Daily mean °F (°C) 35.3(1.8) \n", + "Average low °F (°C) 28.9(−1.7) \n", + "Mean minimum °F (°C) 12.8(−10.7) \n", + "Record low °F (°C) −15(−26) \n", + "Average precipitation inches (mm) 3.09(78) \n", + "Average snowfall inches (cm) 9.2(23) \n", + "Average precipitation days (≥ 0.01 in) 9.2 \n", + "Average snowy days (≥ 0.1 in) 2.8 \n", + "Average relative humidity (%) 60.2 \n", + "Mean monthly sunshine hours 163.1 \n", + "Percent possible sunshine 55 \n", + "Average ultraviolet index 3 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Mar \n", + "Record high °F (°C) 86(30) \n", + "Mean maximum °F (°C) 71.5(21.9) \n", + "Average high °F (°C) 49.7(9.8) \n", + "Daily mean °F (°C) 42.5(5.8) \n", + "Average low °F (°C) 35.2(1.8) \n", + "Mean minimum °F (°C) 18.5(−7.5) \n", + "Record low °F (°C) 3(−16) \n", + "Average precipitation inches (mm) 4.36(111) \n", + "Average snowfall inches (cm) 3.9(9.9) \n", + "Average precipitation days (≥ 0.01 in) 10.9 \n", + "Average snowy days (≥ 0.1 in) 1.8 \n", + "Average relative humidity (%) 58.5 \n", + "Mean monthly sunshine hours 212.5 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 4 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Apr \n", + "Record high °F (°C) 96(36) \n", + "Mean maximum °F (°C) 83.0(28.3) \n", + "Average high °F (°C) 61.2(16.2) \n", + "Daily mean °F (°C) 53.0(11.7) \n", + "Average low °F (°C) 44.8(7.1) \n", + "Mean minimum °F (°C) 32.3(0.2) \n", + "Record low °F (°C) 12(−11) \n", + "Average precipitation inches (mm) 4.50(114) \n", + "Average snowfall inches (cm) 0.6(1.5) \n", + "Average precipitation days (≥ 0.01 in) 11.5 \n", + "Average snowy days (≥ 0.1 in) 0.3 \n", + "Average relative humidity (%) 55.3 \n", + "Mean monthly sunshine hours 225.6 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 6 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", "\n", - " 3 \\\n", - "0 \n", - "# Venue \n", - "465 Centro de Formação Olímpica do Nordeste \n", - "– Honda Center \n", - "464 Barclays Center \n", - "463 The Forum \n", - "462 Fiserv Forum \n", - "461 Scotiabank Arena \n", - "460 Adelaide Entertainment Centre \n", - "459 Pearl Theatre \n", - "458 Cadillac Arena \n", - "457 Estadio Mary Terán de Weiss \n", - "456 Pepsi Center \n", - "455 Madison Square Garden \n", - "454 Avenir Centre \n", - "453 T-Mobile Arena \n", - "452 Ginásio do Ibirapuera \n", - "451 Olimpiyskiy Stadium \n", - "450 American Airlines Center \n", - "449 Pinnacle Bank Arena \n", - "448 Staples Center \n", - "447 Scotiabank Saddledome \n", - "446 Barclaycard Arena \n", - "445 CenturyLink Arena \n", - "444 T-Mobile Arena \n", - "443 Palms Casino Resort \n", - "442 Singapore Indoor Stadium \n", - "441 United Center \n", - "440 Adirondack Bank Center \n", - "439 Echo Arena \n", - "438 Movistar Arena \n", - ".. ... 
\n", - "030 Five Seasons Events Center \n", - "029 Yoyogi National Gymnasium \n", - "028 Lake Charles Civic Center \n", - "027 Tokyo Bay NK Hall \n", - "026 Lake Charles Civic Center \n", - "025 Five Seasons Events Center \n", - "024 Boutwell Memorial Auditorium \n", - "023 Casino Magic Bay St. Louis \n", - "022 Pontchartrain Center \n", - "021 Ginásio da Portuguesa \n", - "020 Mobile Civic Center \n", - "019 Pontchartrain Center \n", - "018 Yokohama Arena \n", - "017 Casino Magic Bay St. Louis \n", - "016 Boutwell Memorial Auditorium \n", - "015 Augusta Civic Center \n", - "014 Dothan Civic Center \n", - "013 Fair Park Arena \n", - "012 Augusta Civic Center \n", - "011 Fair Park Arena \n", - "010 Cobo Arena \n", - "009 Coliseo Rubén Rodríguez \n", - "008 Mammoth Gardens \n", - "007 Buffalo Memorial Auditorium \n", - "006 Casper Events Center \n", - "005 Independence Arena \n", - "004 Expo Square Pavilion \n", - "003 Grady Cole Center \n", - "002 Mammoth Gardens \n", - "001 McNichols Sports Arena \n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month May \n", + "Record high °F (°C) 99(37) \n", + "Mean maximum °F (°C) 88.0(31.1) \n", + "Average high °F (°C) 70.8(21.6) \n", + "Daily mean °F (°C) 62.4(16.9) \n", + "Average low °F (°C) 54.0(12.2) \n", + "Mean minimum °F (°C) 43.5(6.4) \n", + "Record low °F (°C) 32(0) \n", + "Average precipitation inches (mm) 4.19(106) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 11.1 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 62.7 \n", + "Mean monthly sunshine hours 256.6 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 7 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", "\n", - " 4 5 6 \n", - "0 \n", - "# Location Attendance Ref. \n", - "465 Fortaleza, Brazil 10040 [21] \n", - "– Anaheim, California, U.S. Cancelled [22] \n", - "464 Brooklyn, New York, U.S. 12152 [23] \n", - "463 Inglewood, California, U.S. 15862 [24] \n", - "462 Milwaukee, Wisconsin, U.S. 9010 [25] \n", - "461 Toronto, Ontario, Canada 19039 [26] \n", - "460 Adelaide, Australia 8652 [27] \n", - "459 Las Vegas, Nevada, U.S. 2020 [28] \n", - "458 Beijing, China 10302 [29] \n", - "457 Buenos Aires, Argentina 10245 [30] \n", - "456 Denver, Colorado, U.S. 11426 [31] \n", - "455 New York City, New York, U.S. 17011 [32] \n", - "454 Moncton, New Brunswick, Canada 6282 [33] \n", - "453 Las Vegas, Nevada, U.S. 20034 [34] \n", - "452 São Paulo, Brazil 9485 [35] \n", - "451 Moscow, Russia 22603 [36] \n", - "450 Dallas, Texas, U.S. 14073 [37] \n", - "449 Lincoln, Nebraska, U.S. 6409 [38] \n", - "448 Los Angeles, California, U.S. 17794 [39] \n", - "447 Calgary, Alberta, Canada 10603 [40] \n", - "446 Hamburg, Germany 7798 [41] \n", - "445 Boise, Idaho, U.S. 5648 [42] \n", - "444 Las Vegas, Nevada, U.S. 17464 [43] \n", - "443 Las Vegas, Nevada, U.S. 2123 [44] \n", - "442 Kallang, Singapore 6419 [45] \n", - "441 Chicago, Illinois, U.S. 18117 [46] \n", - "440 Utica, New York, U.S. 5063 [47] \n", - "439 Liverpool, England, U.K. 8520 [48] \n", - "438 Santiago, Chile 11082 [49] \n", - ".. ... ... ... \n", - "030 Cedar Rapids, Iowa, U.S. 1100 [409] \n", - "029 Tokyo, Japan NaN NaN \n", - "028 Lake Charles, Louisiana, U.S. NaN NaN \n", - "027 Chiba, Japan NaN NaN \n", - "026 Lake Charles, Louisiana, U.S. NaN NaN \n", - "025 Cedar Rapids, Iowa, U.S. NaN NaN \n", - "024 Birmingham, Alabama, U.S. NaN NaN \n", - "023 Bay St. Louis, Mississippi, U.S. NaN NaN \n", - "022 New Orleans, Louisiana, U.S. NaN NaN \n", - "021 São Paulo, Brazil NaN NaN \n", - "020 Mobile, Alabama, U.S. NaN NaN \n", - "019 New Orleans, Louisiana, U.S. 
4600 [410] \n", - "018 Yokohama, Japan 5000 [411] \n", - "017 Bay St. Louis, Mississippi, U.S. NaN NaN \n", - "016 Birmingham, Alabama, U.S. 5000 [412] \n", - "015 Augusta, Georgia, U.S. 5100 [413] \n", - "014 Dothan, Alabama, U.S. 3100 [414] \n", - "013 Birmingham, Alabama, U.S. 6000 [415] \n", - "012 Augusta, Georgia, U.S. 4500 [416] \n", - "011 Birmingham, Alabama, U.S. 4300 [417] \n", - "010 Detroit, Michigan, U.S. 10000 [418] \n", - "009 Bayamón, Puerto Rico 13000 [419] \n", - "008 Denver, Colorado, U.S. 2800 [420] \n", - "007 Buffalo, New York, U.S. 9000 [421] \n", - "006 Casper, Wyoming, U.S. 2700 [422] \n", - "005 Charlotte, North Carolina, U.S. 6000 [423] \n", - "004 Tulsa, Oklahoma, U.S. 5857 [424] \n", - "003 Charlotte, North Carolina, U.S. NaN NaN \n", - "002 Denver, Colorado, U.S. 2000 [425] \n", - "001 Denver, Colorado, U.S. 7800 [426] \n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Jun \n", + "Record high °F (°C) 101(38) \n", + "Mean maximum °F (°C) 92.3(33.5) \n", + "Average high °F (°C) 79.3(26.3) \n", + "Daily mean °F (°C) 71.5(21.9) \n", + "Average low °F (°C) 63.6(17.6) \n", + "Mean minimum °F (°C) 52.9(11.6) \n", + "Record low °F (°C) 44(7) \n", + "Average precipitation inches (mm) 4.41(112) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 11.2 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 65.2 \n", + "Mean monthly sunshine hours 257.3 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 8 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", "\n", - "[470 rows x 6 columns]" + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Jul \n", + "Record high °F (°C) 106(41) \n", + "Mean maximum °F (°C) 95.4(35.2) \n", + "Average high °F (°C) 84.1(28.9) \n", + "Daily mean °F (°C) 76.5(24.7) \n", + "Average low °F (°C) 68.8(20.4) \n", + "Mean minimum °F (°C) 60.3(15.7) \n", + "Record low °F (°C) 52(11) \n", + "Average precipitation inches (mm) 4.60(117) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 10.4 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 64.2 \n", + "Mean monthly sunshine hours 268.2 \n", + "Percent possible sunshine 59 \n", + "Average ultraviolet index 8 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Aug \n", + "Record high °F (°C) 104(40) \n", + "Mean maximum °F (°C) 93.7(34.3) \n", + "Average high °F (°C) 82.6(28.1) \n", + "Daily mean °F (°C) 75.2(24.0) \n", + "Average low °F (°C) 67.8(19.9) \n", + "Mean minimum °F (°C) 58.8(14.9) \n", + "Record low °F (°C) 50(10) \n", + "Average precipitation inches (mm) 4.44(113) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 9.5 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 66.0 \n", + "Mean monthly sunshine hours 268.2 \n", + "Percent possible sunshine 63 \n", + "Average ultraviolet index 8 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Sep \n", + "Record high °F (°C) 102(39) \n", + "Mean maximum °F (°C) 88.5(31.4) \n", + "Average high °F (°C) 75.2(24.0) \n", + "Daily mean °F (°C) 68.0(20.0) \n", + "Average low °F (°C) 60.8(16.0) \n", + "Mean minimum °F (°C) 48.6(9.2) \n", + "Record low °F (°C) 39(4) \n", + "Average precipitation inches (mm) 4.28(109) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 8.7 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 67.8 \n", + "Mean monthly sunshine hours 219.3 \n", + "Percent possible sunshine 59 \n", + "Average ultraviolet index 6 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Oct \n", + "Record high °F (°C) 94(34) \n", + "Mean maximum °F (°C) 78.8(26.0) \n", + "Average high °F (°C) 63.8(17.7) \n", + "Daily mean °F (°C) 56.9(13.8) \n", + "Average low °F (°C) 50.0(10.0) \n", + "Mean minimum °F (°C) 38.0(3.3) \n", + "Record low °F (°C) 28(−2) \n", + "Average precipitation inches (mm) 4.40(112) \n", + "Average snowfall inches (cm) 0(0) \n", + "Average precipitation days (≥ 0.01 in) 8.9 \n", + "Average snowy days (≥ 0.1 in) 0 \n", + "Average relative humidity (%) 65.6 \n", + "Mean monthly sunshine hours 211.2 \n", + "Percent possible sunshine 61 \n", + "Average ultraviolet index 4 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Nov \n", + "Record high °F (°C) 84(29) \n", + "Mean maximum °F (°C) 71.3(21.8) \n", + "Average high °F (°C) 53.8(12.1) \n", + "Daily mean °F (°C) 47.7(8.7) \n", + "Average low °F (°C) 41.6(5.3) \n", + "Mean minimum °F (°C) 27.7(−2.4) \n", + "Record low °F (°C) 5(−15) \n", + "Average precipitation inches (mm) 4.02(102) \n", + "Average snowfall inches (cm) 0.3(0.76) \n", + "Average precipitation days (≥ 0.01 in) 9.6 \n", + "Average snowy days (≥ 0.1 in) 0.2 \n", + "Average relative humidity (%) 64.6 \n", + "Mean monthly sunshine hours 151.0 \n", + "Percent possible sunshine 51 \n", + "Average ultraviolet index 2 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... \n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \\\n", + "Month Dec \n", + "Record high °F (°C) 75(24) \n", + "Mean maximum °F (°C) 62.2(16.8) \n", + "Average high °F (°C) 43.0(6.1) \n", + "Daily mean °F (°C) 37.5(3.1) \n", + "Average low °F (°C) 32.0(0.0) \n", + "Mean minimum °F (°C) 15.6(−9.1) \n", + "Record low °F (°C) −13(−25) \n", + "Average precipitation inches (mm) 4.00(102) \n", + "Average snowfall inches (cm) 4.8(12) \n", + "Average precipitation days (≥ 0.01 in) 10.6 \n", + "Average snowy days (≥ 0.1 in) 2.3 \n", + "Average relative humidity (%) 64.1 \n", + "Mean monthly sunshine hours 139.0 \n", + "Percent possible sunshine 48 \n", + "Average ultraviolet index 1 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... 
\n", + "\n", + "vteClimate data for New York (Belvedere Castle, Central Park), 1981–2010 normals,[a] extremes 1869–present[b] \n", + "Month Year \n", + "Record high °F (°C) 106(41) \n", + "Mean maximum °F (°C) 97.0(36.1) \n", + "Average high °F (°C) 62.0(16.7) \n", + "Daily mean °F (°C) 55.0(12.8) \n", + "Average low °F (°C) 48.0(8.9) \n", + "Mean minimum °F (°C) 7.0(−13.9) \n", + "Record low °F (°C) −15(−26) \n", + "Average precipitation inches (mm) 49.94(1,268) \n", + "Average snowfall inches (cm) 25.8(66) \n", + "Average precipitation days (≥ 0.01 in) 122.0 \n", + "Average snowy days (≥ 0.1 in) 11.4 \n", + "Average relative humidity (%) 63.0 \n", + "Mean monthly sunshine hours 2534.7 \n", + "Percent possible sunshine 57 \n", + "Average ultraviolet index 5 \n", + "Source #1: NOAA (relative humidity and sun 1961... Source #1: NOAA (relative humidity and sun 196... \n", + "Source #2: Weather Atlas[212] See Geography of ... Source #2: Weather Atlas[212] See Geography of... " ] }, - "execution_count": 6, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -1585,7 +1422,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -1601,78 +1438,135 @@ " vertical-align: top;\n", " }\n", "\n", - " .dataframe thead th {\n", - " text-align: right;\n", + " .dataframe thead tr th {\n", + " text-align: left;\n", " }\n", "\n", "\n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", + " \n", + " \n", " \n", - " \n", 
- " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", "
1234
0
EventDateVenueLocationRef.New York City's five boroughsvteNew York City's five boroughsvte
UFC on ESPN 4Jun 29, 2019TBATBA[9]JurisdictionJurisdictionPopulationGross Domestic ProductLand areaDensity
UFC on ESPN+ 11Jun 22, 2019TBATBA[9]
UFC 238Jun 8, 2019TBATBA[9]BoroughCountyEstimate (2018)[150]billions(US$)[151]per capita(US$)square milessquarekmpersons / sq. mipersons /km2
UFC on ESPN+ 10Jun 1, 2019TBATBA[9]The BronxBronx143213242.6952920042.10109.043465313231
BrooklynKings258283091.5593460070.82183.423713714649
ManhattanNew York1628701600.24436090022.8359.137203327826
QueensQueens227890693.31039600108.53281.09214608354
Staten IslandRichmond47617914.5143030058.37151.1881123132
\n", "" ], "text/plain": [ - " 1 2 3 4\n", - "0 \n", - "Event Date Venue Location Ref.\n", - "UFC on ESPN 4 Jun 29, 2019 TBA TBA [9]\n", - "UFC on ESPN+ 11 Jun 22, 2019 TBA TBA [9]\n", - "UFC 238 Jun 8, 2019 TBA TBA [9]\n", - "UFC on ESPN+ 10 Jun 1, 2019 TBA TBA [9]" + "New York City's five boroughsvte New York City's five boroughsvte \\\n", + "Jurisdiction Jurisdiction \n", + "Borough County \n", + "The Bronx Bronx \n", + "Brooklyn Kings \n", + "Manhattan New York \n", + "Queens Queens \n", + "Staten Island Richmond \n", + "\n", + "New York City's five boroughsvte \\\n", + "Jurisdiction Population Gross Domestic Product \n", + "Borough Estimate (2018)[150] billions(US$)[151] \n", + "The Bronx 1432132 42.695 \n", + "Brooklyn 2582830 91.559 \n", + "Manhattan 1628701 600.244 \n", + "Queens 2278906 93.310 \n", + "Staten Island 476179 14.514 \n", + "\n", + "New York City's five boroughsvte \\\n", + "Jurisdiction Land area \n", + "Borough per capita(US$) square miles squarekm \n", + "The Bronx 29200 42.10 109.04 \n", + "Brooklyn 34600 70.82 183.42 \n", + "Manhattan 360900 22.83 59.13 \n", + "Queens 39600 108.53 281.09 \n", + "Staten Island 30300 58.37 151.18 \n", + "\n", + "New York City's five boroughsvte \n", + "Jurisdiction Density \n", + "Borough persons / sq. 
mi persons /km2 \n", + "The Bronx 34653 13231 \n", + "Brooklyn 37137 14649 \n", + "Manhattan 72033 27826 \n", + "Queens 21460 8354 \n", + "Staten Island 8112 3132 " ] }, - "execution_count": 7, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -2661,7 +2555,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.8" + "version": "3.6.9" } }, "nbformat": 4, From 609f0dc635c5721096d712a70253ad865940ac7e Mon Sep 17 00:00:00 2001 From: softhints Date: Tue, 21 Apr 2020 17:31:25 +0300 Subject: [PATCH 61/76] =?UTF-8?q?Chapter=C2=A011=C2=A0=C2=A0Dictionaries?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../Think_Python_Chapter_10__Lists.ipynb | 2 + ...hink_Python_Chapter_11__Dictionaries.ipynb | 2168 +++++++++++++++++ 2 files changed, 2170 insertions(+) create mode 100644 notebooks/Books/Think Python/Think_Python_Chapter_11__Dictionaries.ipynb diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb index 1653288..3e4ea07 100644 --- a/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb +++ b/notebooks/Books/Think Python/Think_Python_Chapter_10__Lists.ipynb @@ -6,6 +6,8 @@ "source": [ "# Chapter 10  Lists\n", "\n", + "http://greenteapress.com/thinkpython2/html/thinkpython2011.html\n", + "\n", "* A list is a sequence\n", "* Lists are mutable\n", "* Traversing a list\n", diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_11__Dictionaries.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_11__Dictionaries.ipynb new file mode 100644 index 0000000..da79412 --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_11__Dictionaries.ipynb @@ -0,0 +1,2168 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 11  Dictionaries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "http://greenteapress.com/thinkpython2/html/thinkpython2012.html\n", + "\n", + "* 11.1  A dictionary is a mapping\n", + "* 11.2  Dictionary as a collection of counters\n", + "* 11.3  Looping and dictionaries\n", + "* 11.4  Reverse lookup\n", + "* 11.5  Dictionaries and lists\n", + "* 11.6  Memos\n", + "* 11.7  Global variables\n", + "* 11.8  Debugging\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Python: List vs Tuple vs Dictionary vs Set](https://blog.softhints.com/python-list-vs-tuple-vs-dictionary-vs-set/)\n", + "\n", + "![](https://blog.softhints.com/content/images/size/w2000/2020/04/python_dict_vs_list_vs_tuple_vs_set.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.1 A dictionary is a mapping" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This chapter presents another built-in type called a dictionary.\n", + "Dictionaries are one of Python’s best features; they are the\n", + "building blocks of many efficient and elegant algorithms." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "A dictionary is like a list, but more general. In a list,\n", + "the indices have to be integers; in a dictionary they can\n", + "be (almost) any type." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A dictionary contains a collection of indices, which are called keys, and a collection of values. Each key is associated with a\n", + "single value. **The association of a key and a value is called a key-value pair** or sometimes an item. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In mathematical language, a dictionary represents a mapping\n", + "from keys to values, so you can also say that each key\n", + "“maps to” a value.\n", + "As an example, we’ll build a dictionary that maps from English\n", + "to Spanish words, so the keys and the values are all strings." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The function dict creates a new dictionary with no items.\n", + "Because dict is the name of a built-in function, you\n", + "should avoid using it as a variable name.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{}" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp = dict()\n", + "eng2sp" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{}" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp = {}\n", + "eng2sp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The squiggly-brackets, {}, represent an empty dictionary.\n", + "To add items to the dictionary, you can use square brackets:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "eng2sp['one'] = 'uno'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This line creates an item that maps from the key\n", + "'one' to the value 'uno'. 
If we print the\n", + "dictionary again, we see a key-value pair with a colon\n", + "between the key and value:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'one': 'uno'}" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "eng2sp['one'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'one': '1'}" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This output format is also an input format. For example,\n", + "you can create a new dictionary with three items:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "eng2sp = {'one': 'uno',\n", + "          'two': 'dos',\n", + "          'three': 'tres'\n", + "          }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "But if you print eng2sp, you might be surprised:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'one': 'uno', 'two': 'dos', 'three': 'tres'}" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The order of the key-value pairs might not be the same. Before\n", + "Python 3.7 the language did not guarantee any particular order;\n", + "since 3.7, dictionaries preserve insertion order. In either case,\n", + "you should not rely on an item’s position." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But that’s not a problem because\n", + "the elements of a dictionary are never indexed with integer indices.\n", + "Instead, you use the keys to look up the corresponding values:" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'dos'" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eng2sp['two']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The key 'two' always maps to the value 'dos' so the order\n", + "of the items doesn’t matter." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the key isn’t in the dictionary, you get an exception:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "ename": "KeyError", + "evalue": "'four'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0meng2sp\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'four'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m: 'four'" + ] + } + ], + "source": [ + "eng2sp['four']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The len function works on dictionaries; it returns the\n", + "number of key-value pairs:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(eng2sp)" + ] + }, + { 
+ "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The in operator works on dictionaries, too; it tells you whether\n", + "something appears as a key in the dictionary (appearing\n", + "as a value is not good enough).\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'one' in eng2sp" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'uno' in eng2sp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "To see whether something appears as a value in a dictionary, you\n", + "can use the method values, which returns a collection of\n", + "values, and then use the in operator:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "vals = eng2sp.values()\n", + "'uno' in vals" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "dict_values(['uno', 'dos', 'tres'])\n", + "dict_keys(['one', 'two', 'three'])\n", + "dict_items([('one', 'uno'), ('two', 'dos'), ('three', 'tres')])\n" + ] + } + ], + "source": [ + "print(eng2sp.values())\n", + "print(eng2sp.keys())\n", + "print(eng2sp.items())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The in operator uses different algorithms for lists and\n", + "dictionaries. 
For lists, it searches the elements of the list in\n", + "order, as in Section 8.6. As the list gets longer, the search\n", + "time gets longer in direct proportion." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Python dictionaries use a data structure\n", + "called a hashtable that has a remarkable property: the\n", + "in operator takes about the same amount of time no matter how\n", + "many items are in the dictionary. I explain how that’s possible\n", + "in Section B.4, but the explanation might not make\n", + "sense until you’ve read a few more chapters." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Bonus**:\n", + "\n", + "* [Hash function](https://en.wikipedia.org/wiki/Hash_function)\n", + "* [Hash table](https://en.wikipedia.org/wiki/Hash_table)\n", + "* [Collision (computer science)](https://en.wikipedia.org/wiki/Collision_(computer_science))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.2 Dictionary as a collection of counters" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Suppose you are given a string and you want to count how many\n", + "times each letter appears. There are several ways you could do it:\n", + "\n", + "1. You could create 26 variables, one for each letter of the alphabet. Then you could traverse the string and, for each character, increment the corresponding counter, probably using a chained conditional.\n", + "\n", + "2. You could create a list with 26 elements. Then you could convert each character to a number (using the built-in function ord), use the number as an index into the list, and increment the appropriate counter.\n", + "\n", + "3. You could create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that you would increment the value of an existing item." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each of these options performs the same computation, but each\n", + "of them implements that computation in a different way.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An implementation is a way of performing a computation;\n", + "some implementations are better than others. For example,\n", + "an advantage of the dictionary implementation is that we don’t\n", + "have to know ahead of time which letters appear in the string\n", + "and we only have to make room for the letters that do appear." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is what the code might look like:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "def histogram(s):\n", + "    d = dict()\n", + "    for c in s:\n", + "        if c not in d:\n", + "            d[c] = 1\n", + "        else:\n", + "            d[c] += 1\n", + "    return d" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The name of the function is histogram, which is a statistical\n", + "term for a collection of counters (or frequencies).\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first line of the\n", + "function creates an empty dictionary. The for loop traverses\n", + "the string. Each time through the loop, if the character c is\n", + "not in the dictionary, we create a new item with key c and the\n", + "initial value 1 (since we have seen this letter once). 
If c is\n", + "already in the dictionary we increment d[c].\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s how it works:" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h = histogram('brontosaurus')\n", + "h" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The histogram indicates that the letters 'a' and 'b'\n", + "appear once; 'o' appears twice, and so on." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "Dictionaries have a method called get that takes a key\n", + "and a default value. If the key appears in the dictionary,\n", + "get returns the corresponding value; otherwise it returns\n", + "the default value. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'a': 1}" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h = histogram('a')\n", + "h" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h.get('a', 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h.get('c', 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "ename": "KeyError", + "evalue": "'c'", + "output_type": "error", + "traceback": [ 
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mh\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'c'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mKeyError\u001b[0m: 'c'" + ] + } + ], + "source": [ + "h['c']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As an exercise, use get to write histogram more concisely. You\n", + "should be able to eliminate the if statement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "def histogram(s):\n", + "    d = dict()\n", + "    for c in s:\n", + "        d[c] = d.get(c, 0) + 1\n", + "    return d" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h = histogram('brontosaurus')\n", + "h" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.3 Looping and dictionaries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you use a dictionary in a for statement, it traverses\n", + "the keys of the dictionary. 
For example, print_hist\n", + "prints each key and the corresponding value:" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [], + "source": [ + "def print_hist(h):\n", + "    for c in h:\n", + "        print(c, h[c])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Here’s what the output looks like:" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "p 1\n", + "a 1\n", + "r 2\n", + "o 1\n", + "t 1\n" + ] + } + ], + "source": [ + "h = histogram('parrot')\n", + "print_hist(h)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The keys appear in insertion order, not alphabetical order. To traverse the keys\n", + "in sorted order, you can use the built-in function sorted:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a 1\n", + "o 1\n", + "p 1\n", + "r 2\n", + "t 1\n" + ] + } + ], + "source": [ + "for key in sorted(h):\n", + "    print(key, h[key])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Bonus Getting all keys and values" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "p 1\n", + "a 1\n", + "r 2\n", + "o 1\n", + "t 1\n" + ] + } + ], + "source": [ + "for k, v in h.items():\n", + "    print(k, v)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.4 Reverse lookup" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Given a dictionary d and a key k, it is easy to\n", + "find the corresponding value v = d[k]. This operation\n", + "is called a lookup." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But what if you have v and you want to find k?\n", + "You have two problems: first, there might be more than one\n", + "key that maps to the value v. Depending on the application,\n", + "you might be able to pick one, or you might have to make\n", + "a list that contains all of them. Second, there is no\n", + "simple syntax to do a reverse lookup; you have to search." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is a function that takes a value and returns the first\n", + "key that maps to that value:" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [], + "source": [ + "def reverse_lookup(d, v):\n", + "    for k in d:\n", + "        if d[k] == v:\n", + "            return k\n", + "    raise LookupError()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This function is yet another example of the search pattern, but it\n", + "uses a feature we haven’t seen before, raise. The \n", + "raise statement causes an exception; in this case it causes a\n", + "LookupError, which is a built-in exception used to indicate\n", + "that a lookup operation failed.\n", + "\n", + " \n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we get to the end of the loop, that means v\n", + "doesn’t appear in the dictionary as a value, so we raise an\n", + "exception." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is an example of a successful reverse lookup:" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'r'" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "h = histogram('parrot')\n", + "key = reverse_lookup(h, 2)\n", + "key" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "histogram('parrot')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "And an unsuccessful one:" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "ename": "LookupError", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mLookupError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mreverse_lookup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mreverse_lookup\u001b[0;34m(d, v)\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0md\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mreturn\u001b[0m 
\u001b[0mk\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mLookupError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mLookupError\u001b[0m: " + ] + } + ], + "source": [ + "key = reverse_lookup(h, 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The effect when you raise an exception is the same as when\n", + "Python raises one: it prints a traceback and an error message.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you raise an exception, you can provide a detailed error message as an optional argument. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "ename": "LookupError", + "evalue": "value does not appear in the dictionary", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mLookupError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mk\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mLookupError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'value does not appear in the dictionary'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mreverse_lookup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mreverse_lookup\u001b[0;34m(d, v)\u001b[0m\n\u001b[1;32m 3\u001b[0m 
\u001b[0;32mif\u001b[0m \u001b[0md\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mk\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mLookupError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'value does not appear in the dictionary'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mreverse_lookup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mh\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mLookupError\u001b[0m: value does not appear in the dictionary" + ] + } + ], + "source": [ + "def reverse_lookup(d, v):\n", + "    for k in d:\n", + "        if d[k] == v:\n", + "            return k\n", + "    raise LookupError('value does not appear in the dictionary')\n", + "key = reverse_lookup(h, 3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "raise LookupError('value does not appear in the dictionary')\n", + "# Traceback (most recent call last):\n", + "#   File \"\", line 1, in ?\n", + "# LookupError: value does not appear in the dictionary" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A reverse lookup is much slower than a forward lookup; if you\n", + "have to do it often, or if the dictionary gets big, the performance\n", + "of your program will suffer." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.5 Dictionaries and lists" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lists can appear as values in a dictionary. 
For example, if you\n", + "are given a dictionary that maps from letters to frequencies, you\n", + "might want to invert it; that is, create a dictionary that maps\n", + "from frequencies to letters. Since there might be several letters\n", + "with the same frequency, each value in the inverted dictionary\n", + "should be a list of letters.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is a function that inverts a dictionary:" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [], + "source": [ + "def invert_dict(d):\n", + "    inverse = dict()\n", + "    for key in d:\n", + "        val = d[key]\n", + "        if val not in inverse:\n", + "            inverse[val] = [key]\n", + "        else:\n", + "            inverse[val].append(key)\n", + "    return inverse" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Each time through the loop, key gets a key from d and \n", + "val gets the corresponding value. If val is not in inverse, that means we haven’t seen it before, so we create a new\n", + "item and initialize it with a singleton (a list that contains a\n", + "single element). Otherwise we have seen this value before, so we\n", + "append the corresponding key to the list. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is an example:" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hist = histogram('parrot')\n", + "hist" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{1: ['p', 'a', 'o', 't'], 2: ['r']}" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "inverse = invert_dict(hist)\n", + "inverse" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure 11.1 is a state diagram showing hist and inverse.\n", + "A dictionary is represented as a box with the type dict above it\n", + "and the key-value pairs inside. If the values are integers, floats or\n", + "strings, I draw them inside the box, but I usually draw lists\n", + "outside the box, just to keep the diagram simple.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lists can be values in a dictionary, as this example shows, but they\n", + "cannot be keys. 
Here’s what happens if you try:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "unhashable type: 'list'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mt\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0md\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0md\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'oops'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'" + ] + } + ], + "source": [ + "t = [1, 2, 3]\n", + "d = dict()\n", + "d[t] = 'oops'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "I mentioned earlier that a dictionary is implemented using\n", + "a hashtable and that means that the keys have to be hashable.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A hash is a function that takes a value (of any kind)\n", + "and returns an integer. Dictionaries use these integers,\n", + "called hash values, to store and look up key-value pairs.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This system works fine if the keys are immutable. But if the\n", + "keys are mutable, like lists, bad things happen. 
For example,\n", + "when you create a key-value pair, Python hashes the key and \n", + "stores it in the corresponding location. If you modify the\n", + "key and then hash it again, it would go to a different location.\n", + "In that case you might have two entries for the same key,\n", + "or you might not be able to find a key. Either way, the\n", + "dictionary wouldn’t work correctly." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "That’s why keys have to be hashable, and why mutable types like\n", + "lists aren’t. The simplest way to get around this limitation is to\n", + "use tuples, which we will see in the next chapter." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since dictionaries are mutable, they can’t be used as keys,\n", + "but they can be used as values." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.6 Memos" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you played with the fibonacci function from\n", + "Section 6.7, you might have noticed that the bigger\n", + "the argument you provide, the longer the function takes to run.\n", + "Furthermore, the run time increases quickly.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To understand why, consider Figure 11.2, which shows\n", + "the call graph for fibonacci with n=4:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A call graph shows a set of function frames, with lines connecting each\n", + "frame to the frames of the functions it calls. At the top of the\n", + "graph, fibonacci with n=4 calls fibonacci with n=3 and n=2. In turn, fibonacci with n=3 calls\n", + "fibonacci with n=2 and n=1. And so on.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Count how many times fibonacci(0) and fibonacci(1) are\n", + "called. 
This is an inefficient solution to the problem, and it gets\n", + "worse as the argument gets bigger.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One solution is to keep track of values that have already been\n", + "computed by storing them in a dictionary. A previously computed value\n", + "that is stored for later use is called a memo. Here is a\n", + "“memoized” version of fibonacci:" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "known = {0:0, 1:1}\n", + "\n", + "def fibonacci(n):\n", + " if n in known:\n", + " return known[n]\n", + "\n", + " res = fibonacci(n-1) + fibonacci(n-2)\n", + " known[n] = res\n", + " return res" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "known is a dictionary that keeps track of the Fibonacci\n", + "numbers we already know. It starts with\n", + "two items: 0 maps to 0 and 1 maps to 1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Whenever fibonacci is called, it checks known.\n", + "If the result is already there, it can return\n", + "immediately. Otherwise it has to \n", + "compute the new value, add it to the dictionary, and return it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you run this version of fibonacci and compare it with\n", + "the original, you will find that it is much faster." + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The slowest run took 214.19 times longer than the fastest. 
This could mean that an intermediate result is being cached.\n", + "10000000 loops, best of 3: 109 ns per loop\n" + ] + } + ], + "source": [ + "%timeit fibonacci(20)" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "def fibonacci(n):\n", + "    if n < 2:  # base cases: fibonacci(0) == 0, fibonacci(1) == 1\n", + "        return n\n", + "    res = fibonacci(n-1) + fibonacci(n-2)\n", + "    return res" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "100 loops, best of 3: 3.59 ms per loop\n" + ] + } + ], + "source": [ + "%timeit fibonacci(20)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Bonus \n", + "\n", + "* [Dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming)\n", + "* [19. Dynamic Programming I: Fibonacci, Shortest Paths](https://www.youtube.com/watch?v=OQ5jsbhAv_M)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.7 Global variables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the previous example, known is created outside the function,\n", + "so it belongs to the special frame called __main__.\n", + "Variables in __main__ are sometimes called global\n", + "because they can be accessed from any function. Unlike local\n", + "variables, which disappear when their function ends, global variables\n", + "persist from one function call to the next.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is common to use global variables for flags; that is, \n", + "boolean variables that indicate (“flag”) whether a condition\n", + "is true. 
For example, some programs use\n", + "a flag named verbose to control the level of detail in the\n", + "output:" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Running example1\n" + ] + } + ], + "source": [ + "verbose = True\n", + "\n", + "def example1():\n", + "    if verbose:\n", + "        print('Running example1')\n", + "example1()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you try to reassign a global variable, you might be surprised.\n", + "The following example is supposed to keep track of whether the\n", + "function has been called:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "been_called = False\n", + "\n", + "def example2():\n", + "    been_called = True  # WRONG\n", + "\n", + "example2()\n", + "\n", + "been_called" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "But if you run it you will see that the value of been_called\n", + "doesn’t change. The problem is that example2 creates a new local\n", + "variable named been_called. 
The local variable goes away when\n", + "the function ends, and has no effect on the global variable.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To reassign a global variable inside a function you have to\n", + "declare the global variable before you use it:" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "been_called = False\n", + "\n", + "def example2():\n", + " global been_called \n", + " been_called = True\n", + " \n", + "example2()\n", + " \n", + "been_called" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Pause the video and find why the `been_called` is False?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The global statement tells the interpreter\n", + "something like, “In this function, when I say been_called, I\n", + "mean the global variable; don’t create a local one.”\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s an example that tries to update a global variable:" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": {}, + "outputs": [ + { + "ename": "UnboundLocalError", + "evalue": "local variable 'count' referenced before assignment", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mUnboundLocalError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mcount\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcount\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;31m# WRONG\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mexample3\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mexample3\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mexample3\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mcount\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcount\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;31m# WRONG\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mexample3\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mUnboundLocalError\u001b[0m: local variable 'count' referenced before assignment" + ] + } + ], + "source": [ + "count = 0\n", + "\n", + "def example3():\n", + " count = count + 1 # WRONG\n", + " \n", + "example3()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Python assumes that count is local, and under that assumption\n", + "you are reading it before writing it. 
The solution, again,\n", + "is to declare count global.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": {}, + "outputs": [], + "source": [ + "def example3():\n", + " global count\n", + " count += 1\n", + "example3()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If a global variable refers to a mutable value, you can modify\n", + "the value without declaring the variable:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{0: 0, 1: 1, 2: 1}" + ] + }, + "execution_count": 79, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "known = {0:0, 1:1}\n", + "\n", + "def example4():\n", + " known[2] = 1\n", + "example4()\n", + "known" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "So you can add, remove and replace elements of a global list or\n", + "dictionary, but if you want to reassign the variable, you\n", + "have to declare it:" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": {}, + "outputs": [], + "source": [ + "def example5():\n", + " global known\n", + " known = dict()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Global variables can be useful, but if you have a lot of them,\n", + "and you modify them frequently, they can make programs\n", + "hard to debug." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.8 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you work with bigger datasets it can become unwieldy to\n", + "debug by printing and checking the output by hand. Here are some\n", + "suggestions for debugging large datasets:\n", + "\n", + "**1. Scale down the input:**\n", + "If possible, reduce the size of the dataset. 
For example, if the program reads a text file, start with just the first 10 lines, or with the smallest example you can find. You can either edit the files themselves, or (better) modify the program so it reads only the first n lines.\n", + "If there is an error, you can reduce n to the smallest value that manifests the error, and then increase it gradually as you find and correct errors.\n", + "\n", + "**2. Check summaries and types:**\n", + "Instead of printing and checking the entire dataset, consider printing summaries of the data: for example, the number of items in a dictionary or the total of a list of numbers.\n", + "A common cause of runtime errors is a value that is not the right type. For debugging this kind of error, it is often enough to print the type of a value.\n", + "\n", + "**3. Write self-checks:**\n", + "Sometimes you can write code to check for errors automatically. For example, if you are computing the average of a list of numbers, you could check that the result is not greater than the largest element in the list or less than the smallest. This is called a “sanity check” because it detects results that are “insane”.\n", + "Another kind of check compares the results of two different computations to see if they are consistent. This is called a “consistency check”.\n", + "\n", + "**4. Format the output:**\n", + "Formatting debugging output can make it easier to spot an error. We saw an example in Section 6.9. Another tool you might find useful is the pprint module, which provides a pprint function that displays built-in types in a more human-readable format (pprint stands for “pretty print”)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Again, time you spend building scaffolding can reduce\n", + "the time you spend debugging.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(\"../../csv/movie_metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5043, 28)" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\"><th></th><th>color</th><th>director_name</th><th>num_critic_for_reviews</th><th>duration</th><th>director_facebook_likes</th><th>actor_3_facebook_likes</th><th>actor_2_name</th><th>actor_1_facebook_likes</th><th>gross</th><th>genres</th><th>...</th><th>num_user_for_reviews</th><th>language</th><th>country</th><th>content_rating</th><th>budget</th><th>title_year</th><th>actor_2_facebook_likes</th><th>imdb_score</th><th>aspect_ratio</th><th>movie_facebook_likes</th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>0</th><td>Color</td><td>James Cameron</td><td>723.0</td><td>178.0</td><td>0.0</td><td>855.0</td><td>Joel David Moore</td><td>1000.0</td><td>760505847.0</td><td>Action|Adventure|Fantasy|Sci-Fi</td><td>...</td><td>3054.0</td><td>English</td><td>USA</td><td>PG-13</td><td>237000000.0</td><td>2009.0</td><td>936.0</td><td>7.9</td><td>1.78</td><td>33000</td></tr>\n",
+ "    <tr><th>1</th><td>Color</td><td>Gore Verbinski</td><td>302.0</td><td>169.0</td><td>563.0</td><td>1000.0</td><td>Orlando Bloom</td><td>40000.0</td><td>309404152.0</td><td>Action|Adventure|Fantasy</td><td>...</td><td>1238.0</td><td>English</td><td>USA</td><td>PG-13</td><td>300000000.0</td><td>2007.0</td><td>5000.0</td><td>7.1</td><td>2.35</td><td>0</td></tr>\n",
+ "    <tr><th>2</th><td>Color</td><td>Sam Mendes</td><td>602.0</td><td>148.0</td><td>0.0</td><td>161.0</td><td>Rory Kinnear</td><td>11000.0</td><td>200074175.0</td><td>Action|Adventure|Thriller</td><td>...</td><td>994.0</td><td>English</td><td>UK</td><td>PG-13</td><td>245000000.0</td><td>2015.0</td><td>393.0</td><td>6.8</td><td>2.35</td><td>85000</td></tr>\n",
+ "    <tr><th>3</th><td>Color</td><td>Christopher Nolan</td><td>813.0</td><td>164.0</td><td>22000.0</td><td>23000.0</td><td>Christian Bale</td><td>27000.0</td><td>448130642.0</td><td>Action|Thriller</td><td>...</td><td>2701.0</td><td>English</td><td>USA</td><td>PG-13</td><td>250000000.0</td><td>2012.0</td><td>23000.0</td><td>8.5</td><td>2.35</td><td>164000</td></tr>\n",
+ "    <tr><th>4</th><td>NaN</td><td>Doug Walker</td><td>NaN</td><td>NaN</td><td>131.0</td><td>NaN</td><td>Rob Walker</td><td>131.0</td><td>NaN</td><td>Documentary</td><td>...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>12.0</td><td>7.1</td><td>NaN</td><td>0</td></tr>\n",
+ "  </tbody>\n", + "</table>\n", + "<p>5 rows × 28 columns</p>\n", + "</div>
" + ], + "text/plain": [ + " color director_name num_critic_for_reviews duration \\\n", + "0 Color James Cameron 723.0 178.0 \n", + "1 Color Gore Verbinski 302.0 169.0 \n", + "2 Color Sam Mendes 602.0 148.0 \n", + "3 Color Christopher Nolan 813.0 164.0 \n", + "4 NaN Doug Walker NaN NaN \n", + "\n", + " director_facebook_likes actor_3_facebook_likes actor_2_name \\\n", + "0 0.0 855.0 Joel David Moore \n", + "1 563.0 1000.0 Orlando Bloom \n", + "2 0.0 161.0 Rory Kinnear \n", + "3 22000.0 23000.0 Christian Bale \n", + "4 131.0 NaN Rob Walker \n", + "\n", + " actor_1_facebook_likes gross genres ... \\\n", + "0 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... \n", + "1 40000.0 309404152.0 Action|Adventure|Fantasy ... \n", + "2 11000.0 200074175.0 Action|Adventure|Thriller ... \n", + "3 27000.0 448130642.0 Action|Thriller ... \n", + "4 131.0 NaN Documentary ... \n", + "\n", + " num_user_for_reviews language country content_rating budget \\\n", + "0 3054.0 English USA PG-13 237000000.0 \n", + "1 1238.0 English USA PG-13 300000000.0 \n", + "2 994.0 English UK PG-13 245000000.0 \n", + "3 2701.0 English USA PG-13 250000000.0 \n", + "4 NaN NaN NaN NaN NaN \n", + "\n", + " title_year actor_2_facebook_likes imdb_score aspect_ratio \\\n", + "0 2009.0 936.0 7.9 1.78 \n", + "1 2007.0 5000.0 7.1 2.35 \n", + "2 2015.0 393.0 6.8 2.35 \n", + "3 2012.0 23000.0 8.5 2.35 \n", + "4 NaN 12.0 7.1 NaN \n", + "\n", + " movie_facebook_likes \n", + "0 33000 \n", + "1 0 \n", + "2 85000 \n", + "3 164000 \n", + "4 0 \n", + "\n", + "[5 rows x 28 columns]" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3390669.0" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(df['director_facebook_likes'].fillna(0))" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 inf\n", + "1 0.300178\n", + "2 inf\n", + "3 0.007455\n", + "4 NaN\n", + " ... \n", + "5038 43.500000\n", + "5039 NaN\n", + "5040 inf\n", + "5041 inf\n", + "5042 5.625000\n", + "Length: 5043, dtype: float64" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['duration'] / df['director_facebook_likes']" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0 907\n", + "NaN 104\n", + "3.0 70\n", + "6.0 66\n", + "7.0 64\n", + " ... \n", + "104.0 1\n", + "224.0 1\n", + "220.0 1\n", + "522.0 1\n", + "764.0 1\n", + "Name: director_facebook_likes, Length: 436, dtype: int64" + ] + }, + "execution_count": 84, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['director_facebook_likes'].value_counts(dropna=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11.9 Glossary" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 52e5c40e63858db59f6ffe70adff4ba12a7e1bba Mon Sep 17 00:00:00 2001 From: softhints Date: Wed, 27 May 2020 22:21:51 +0300 Subject: [PATCH 62/76] 41._Create_a_table_in_SQL%28MySQL_Database%29_from_python_dictionary --- ...L_Database%29_from_python_dictionary.ipynb | 421 ++++++++++++++++++ 1 file changed, 421 insertions(+) create mode 100644 
notebooks/python/JSON/41._Create_a_table_in_SQL%28MySQL_Database%29_from_python_dictionary.ipynb diff --git a/notebooks/python/JSON/41._Create_a_table_in_SQL%28MySQL_Database%29_from_python_dictionary.ipynb b/notebooks/python/JSON/41._Create_a_table_in_SQL%28MySQL_Database%29_from_python_dictionary.ipynb new file mode 100644 index 0000000..3bee813 --- /dev/null +++ b/notebooks/python/JSON/41._Create_a_table_in_SQL%28MySQL_Database%29_from_python_dictionary.ipynb @@ -0,0 +1,421 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 41. Create a table in SQL(MySQL Database) from python dictionary\n", + "\n", + "\n", + "[Python convert normal JSON to JSON separated lines 3 examples](https://blog.softhints.com/python-convert-json-to-json-lines/)\n", + "\n", + "* Pandas DataFrame to MySQL\n", + "* Create table from Python Dict\n", + "* connect MySQL database and Python\n", + " * SQLAlchemy\n", + " * PyMySQL" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Python dict which is converted to a Database Table\n", + "\n", + "```json\n", + "{\"id\":1,\"label\":\"A\",\"size\":\"S\"}\n", + "{\"id\":2,\"label\":\"B\",\"size\":\"XL\"}\n", + "{\"id\":3,\"label\":\"C\",\"size\":\"XXl\"}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Read/Create a Python dict" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\"><th></th><th>id</th><th>label</th><th>size</th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>0</th><td>1</td><td>A</td><td>S</td></tr>\n",
+ "    <tr><th>1</th><td>2</td><td>B</td><td>XL</td></tr>\n",
+ "    <tr><th>2</th><td>3</td><td>C</td><td>XXl</td></tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " id label size\n", + "0 1 A S\n", + "1 2 B XL\n", + "2 3 C XXl" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "# read normal JSON with pandas\n", + "df = pd.read_json('/home/vanx/Downloads/old/normal_json.json')\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'id': {0: 1, 1: 2, 2: 3},\n", + " 'label': {0: 'A', 1: 'B', 2: 'C'},\n", + " 'size': {0: 'S', 1: 'XL', 2: 'XXl'}}" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_dict = df.to_dict()\n", + "data_dict" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\"><th></th><th>id</th><th>label</th><th>size</th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>0</th><td>1</td><td>A</td><td>S</td></tr>\n",
+ "    <tr><th>1</th><td>2</td><td>B</td><td>XL</td></tr>\n",
+ "    <tr><th>2</th><td>3</td><td>C</td><td>XXl</td></tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " id label size\n", + "0 1 A S\n", + "1 2 B XL\n", + "2 3 C XXl" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df2 = pd.DataFrame.from_dict(data_dict)\n", + "df2.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Pandas DataFrame to MySQL table with SQLAlchemy" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# connect\n", + "from sqlalchemy import create_engine\n", + "cnx = create_engine('mysql+pymysql://test:pass@localhost/test') " + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# create table from DataFrame\n", + "df.to_sql('test', cnx, if_exists='replace', index = False)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\"><th></th><th>id</th><th>label</th><th>size</th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>0</th><td>1</td><td>A</td><td>S</td></tr>\n",
+ "    <tr><th>1</th><td>2</td><td>B</td><td>XL</td></tr>\n",
+ "    <tr><th>2</th><td>3</td><td>C</td><td>XXl</td></tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " id label size\n", + "0 1 A S\n", + "1 2 B XL\n", + "2 3 C XXl" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# query table\n", + "df = pd.read_sql('SELECT * FROM test', cnx)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Python Dict Insert Records Into a MySQL Database" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "# connect\n", + "import pymysql\n", + "\n", + "connection = pymysql.connect(host='localhost',\n", + " user='test',\n", + " password='pass',\n", + " db='test')\n", + "cursor = connection.cursor()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Create table\n", + "cols = df.columns\n", + "table_name = 'test'\n", + "ddl = \"\"\n", + "for col in cols:\n", + " ddl += \"`{}` text,\".format(col)\n", + "\n", + "sql_create = \"CREATE TABLE IF NOT EXISTS `{}` ({}) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;\".format(table_name, ddl[:-1])\n", + "cursor.execute(sql_create)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# insert data\n", + "cols = \"`,`\".join([str(i) for i in df.columns.tolist()])\n", + "\n", + "# insert dict records .\n", + "for i,row in df.iterrows():\n", + " sql = \"INSERT INTO `test` (`\" +cols + \"`) VALUES (\" + \"%s,\"*(len(row)-1) + \"%s)\"\n", + " cursor.execute(sql, tuple(row))\n", + " connection.commit()" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('1', 'A', 'S')\n", + "('2', 'B', 'XL')\n", + "('3', 'C', 'XXl')\n" + ] + } + ], + "source": 
[ + "# read\n", + "sql = \"SELECT * FROM test\"\n", + "cursor.execute(sql)\n", + "result = cursor.fetchall()\n", + "for i in result:\n", + " print(i)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From eea532a7bfa000cf1b373585e48c7fd40c5a5255 Mon Sep 17 00:00:00 2001 From: softhints Date: Wed, 27 May 2020 22:23:57 +0300 Subject: [PATCH 63/76] 41._Create_a_table_in_SQL%28MySQL_Database%29_from_python_dictionary --- ...Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename notebooks/python/JSON/{41._Create_a_table_in_SQL%28MySQL_Database%29_from_python_dictionary.ipynb => 41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb} (100%) diff --git a/notebooks/python/JSON/41._Create_a_table_in_SQL%28MySQL_Database%29_from_python_dictionary.ipynb b/notebooks/python/JSON/41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb similarity index 100% rename from notebooks/python/JSON/41._Create_a_table_in_SQL%28MySQL_Database%29_from_python_dictionary.ipynb rename to notebooks/python/JSON/41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb From 6f2e85f2d4d24d5be75da2d213ae56c50f767699 Mon Sep 17 00:00:00 2001 From: softhints Date: Fri, 17 Jul 2020 10:12:50 +0300 Subject: [PATCH 64/76] add example.xlsx --- notebooks/csv/excel/example.xlsx | Bin 0 -> 16660 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 notebooks/csv/excel/example.xlsx diff --git a/notebooks/csv/excel/example.xlsx b/notebooks/csv/excel/example.xlsx new file mode 100644 index 
0000000000000000000000000000000000000000..d58d686be04b6c83457d008bfa2751eaf40053f1 GIT binary patch literal 16660 zcmbWebCjgZvIp9lcF(kJ+n%L8oJ9Xkkz1Y;9+*Z*6T(>tbm^ z`{$RA_-^T5df1=~_pl-7v_>q*oWeHze62wjA9BZZGfTu};Kn8$ct}puQg+GjYK^-e zA78yo@ARR zS&VtICDIpWiw(?5;`!4%&SjHa-|+(UbYdu}1USwq48$+(5x11BUI>i|v>e?A3IUwp<8OH3s9=w-Q z)TC9rMkWyQ!i_q-g4Px9E&nK%k(Gdew4+d^-u@H=FDMX@?Efh!w z&eY1-{*Rn0G{@~$hY&gsDGKY8iG{<#9*N`)-NW+%VoBr-xq#%nksPRqUktt&F5vi5 znb%)keJ%E5M)c^0vC21yW#VvsuFV#QmD$*oG}J4cfW|y#c{dkle~nu68keYwohD3I z-TS>QWW{hY*31l-b*!ed(LKBOpky0}MkZC=v(B7(zC5$hWm17@m#&v}2+QpSj~9lc zGJ-Z)+m&2>tYM||`+k_l7mij0EYajs*Vbb+s&^QHi(iBeCA7LUES@^v3zf_l$w!@b zIjCoLEYfAVz16yn<%|;a6zzIu)Kj%QJdX^_;PMmbu!b+vrF-qsMO2=fvE|n+GV)W0 z2-?pyYg9o0z~x}A#E0u-_>!L=%U>5<4dJn&7}<%7j9_Ra5ZW<((}a&s*6+oWMb1 zU^Gz#yCzX25Ws_+w#iI%LMebDASGE7fBgN2;5za=bhP*TL!G$b% zDE{{NFxZ7rWA_4b^Ieoo0CP0P9)T6iHJ;MST8O$Ii;m;8NV_0<)3_y!uTMl16FEEz zl6wP8Dbd3xNEb4qjs6kIbgS#*6^$`puT2VYryjr!4k!Yz;B(o^Cz01s6-{L6LzfR6 z(WVMGbn+HqimqvDW=>=#bV3VaChw|kt$9@)Xz0hu@c?e}VZr#N`wPehj8$B3mmNK- zoMF@<4{JxhvLC`A85Z)QhZQHG8V5H_+)+g}BcamAJvh$i$oB^$q8=QClAYp&ic$Ww zF{jx6N{Y)5`&O9m4%9Z&)?1~5D*6KQ>bg{KVpu0)=6&`l&2TZ8N<31#zFE%86vi1Gt!ua)#^*Onrj0kO%Qtor;YI$!XS69f zIgeEuNDm20ak|o4T4k{78wkbN@;8rep_qrp#z~3+9;HvBARkj6yPf)~8C2&WYaJoEvNc0OFKm&KS##3VDFwjwTP-BBb6uKxN_&L9zI)N1|BM-rO1)o3Fd@4xpKZi4wK6!KDcg?7%HTyk znfFD3UH!^xaRMfeXYnQ7ksvGoA`#J?)mg|?OAVsw0G<@9iX&w!YmMhC55+zx4tW;V zzO*g+NbDSy>9nq7SO7OGHkldHlXYg0y68eoIh83DG3}oE^nGO*2`ntB31!yf7*C>Z z0XwiF2y?})&`h5yO8PpxhPw*k%S66JJ<2z@4jX3_6v93B8F&O~o=z$TH0PaKR0Xp2jfqq}2p2Qsp75v&AFsM5L1sC|TcDzFzEffm~83FH}#K5bw9&EDficr|=k zsTgsAC8}~J6P*0FeUYR@L^u-ej2Wm^_u6{#)AW`ETM8xfbD;xizLE9*(o1p1AzXgB zKnd0Wh&&*EAlr|?dBtIGumv*r;zLNA5>*y3B{`+oFD@s(Mvl8#oltvYzkq+id~pxA z!nW(1gx%-eEB@Y}8WmYX-D7{w)Io>m(E@{jBF)YCoA6kzIM-)i8Cu(by&(v@qhV{V z#M2oE6NxA%5=}G_(P5wtxdpM6Y?Q{C=7%=b-1g<~qlgHRU!_8nr{DZ?4aeQktJhE= zN3C+3>!=VsPRw^|cA+d>N*~$P(NTf>Dn=s3uy$0*ju>hnb~B&{DhmdoidT57v_yVq zT#JHXTc@TZ+I>+4$K+`%9nqPx4Y_)*6a;v@yxh*3{M^eTK~5d$SqmqkBFOu$@I|>y 
znCn4Jj3M3!{6uIT;TPeA0S3|3VoVvj;8?0!9lL!|KmIt+ftPQg86--*!GWGt6LCn$ zFhm_gC}OgbKS=%#Al_86t`E$?API|^*lqNLM01A!2s>9zRV9QdAAJSkRHyleBUDnz zuP+BI4qvJhqvWafG_hIkexC$2(-+u5lYsUzo|v7gyh!opUjh@M!iUT zRFoCDZT@D7Ls%#rEADI7>UI(;X4-+wbAU&m#l~n@lVnOs%E+XgGx>E?73Gm?+VltX ze1J4ON|n7FnX#rl9Qfqeoj-*mp|m(*=LTK7z%5+eAP(8 zp6e61i!X?}Q{e4qIeO`FSqQ5UBpZ$(A|z$v(7ZA_^#TpnqmZScIPk+GGrCJ4;H#dk z0rBlNZxPo@Q+M+q30d!VT(1@NLWT0KgTMw)^Rt{vEjBpmES3yPsWCUmn%sGcDD|h+ zk3_L-NlUFKT9XeMF~+QYHa_~n7w)Sto^Z2E0V=2GU=h7jzu4{)Z3)Y~a)`2(Bs|$@ zjDw>jYkpz(1m$#ECBp}hPr;b-GKJ7}O|?Cl(YrBU@~JU=87u zp062lFXs5&+jW8$iDgnRMWGDFcWUQ5QvWKD_#K*4txuQW5f#bQTiE6tWc9A z=S3)tgu`s@Z;acHCr?Yo?OnkML92}@KJ95|9nLE5cX!`W{Yt7NT@+(-qu!<;;?G5d z!LX;-^|Jj+>64w33AG*d1lu=L%1_$&e7d)U*DKUt#YpstOD=X_I5&VnqnU6@R?H!kt2&r89p3xX_Z>5som9X&?SuP_D!`b3$ugh<^`L{C=8esy&(>7x9IMD4KWkLw z?7W_AWb&P1CRD*cnv;#kNCw}pmcWXy;;JOu0D&qwIWVH+PzIeY6vyxI5@(2rPEdWvT^@clnc!ppD(R_(PN4nh(P6&r$m5%4&lHJ)^IS`BLO72iKu#!!1IRmy zTrQrbT=Z2D%pQhoW$~tBE07?TOJ~jSfiLW`g~2dwXJlrNXBDmG1g--`hwEz=E)a)%=$t=q_??+WX8xP{|ZNO^jriJ zQ>qYmWV%X6Dy9V0uGzOY)Paanb_LX8FntXrl8FUMTi^U(@KxK|VHmR& z(}o_AqSa^0*RL1mW>Qg|dBS21>1qD2>ZDA-1)kXzo1o>_NQuL+72jHgE+;d-h`siy zw}1=34rGo*c-r#6d-URE;}O#{{&A@7O$-ek?CJjcVEW@Rc4Vx^r;A|rZ$4DX9uru9&$ZUX9?yrJ zv5?>ItHd!@Xs}ilAQ}-h@l6eVTeSA4C7O7MBrc`C-inx-Hf3j?;(ca(-@i~|b&(EZ z9A9h_wB3b;7cU_3|vGnufxPwTi-DWDB-tWH^0@x9%MKT3YegDU8`M{D`LU^wCD=K8ir0yRa5 zQOfu07{GA3_(6p}506qR&Hnc4i9{9-_Q5cr*GQ=wo`tJx&E~s>^OI9^rT=)jac9YB zNE~(9H$^yzI`!9dn(q1D?guU%?y;S*17hsMqZ_>O3tRKdqu@u4_c^e-m&Qt3o$FWg zj*@%Flbbp0ADNe=m1PT7Y(f4@R-V$0ZAhV8F1`0n{@32)VkKQ%8TBjAj~8#pE!L&c z+?#Fm7WZAGjk5$eKgk(M@wH`X|M zsMaHSv&`<4Wgd4boK;CLH`S7dH-K@})r6L?k4_x%myh~R_EV}W4NuGyz9!a_fe=nV z81XgBwqnr+DtQKBXj_B5Ktm^X@eV3^CSh>45zTG#K%((VMf(%JAFFjM87E?z{m#3D z-WbT^7|W=g6+l%5cqgd82bDc@YgzW=1?zukqdjz10yhaxYS-NFG-&+1$hbOc+33zB zT%&ZzBgYDy$qp?RsYgMY_`<9pO|+WkUK)EcZwyf25~7e1DjAD3@N!6s13>*iAIa-5 zMu`P5%EplF=KiLceuwb2Gr|NZNMi$4A_HOBhoG43>>0=R=?%1qJn$PyPjF 
zwWa@Mzg=@E-fUw`;lZQA9Fc6UTs9t`55bfa%!zQ>p{W#=bk!j{HzL_usX#OsaM+N+P9NEKM}*3IVW1)-@eY%}x_Zmn zy%o!_`Lq@gKfC%Fij<{-bV3+W7WivL?7vz3JFT;Thmow@_4wj|gpui8^$kK>A+>nJ zw+7hv+p8;tSga2yGCHv&97AKsu??EzkD5s2Bx)ytS0y5%(VD-u?5QzX&@<#VDR2k= zP$?<&13}ad%Ymm{iPL3=uoa^Z zF#w53#B9kj=fX4K9v5g+{SFk*E-a}Mnur+B2S7}4l0||Fa?rH+!5~r@E598d#$cd- z_Dd?)+6FNp11vW_d(J2i>%rCktmO-fB;L-TLgrVkl#SgbmsDm+<>gWvyaQPDSqcm;`*Yt zD5@ks(8DYsw*5H_z^|b|to{H7`Fr2M$OZ}n5d&@_v%wW-vV{DYz{!1Cc~$`_Vhmb4 zgj#f>0_@H1rr8J#B=h9=;!Xky`JuS{A7Ql!W`x6hsH{K8B4X`81Yh!Leao40oyt80 zlK`y+@U_IV41qBu978?sR0jD=m7Nm!r=J2d-;5!l0Eh)pi!DR4$Vt?N&kH&kMr!^X zaDxJ5A~YDBVbu}`QisNnwBV9PB>GZ+TTWy^rz+@!p zc*dtzYdsQ|P)jVIly4J|Cd!zwPhiL(BH#{g-J>uN5&AzMX8u6j^0)>-ELBo#=rJrv z(0|T^`@G0{Odh$8SU)M(;S+)(WuM?1y@&wSGk7>dUQO0tbE!k_rv}{0GB^aJ6rhjw zOe8=V5a$~pcMcehrKo<_O*+B)!@=eAFOKUN2y~*)iys5`R|-Ro6UOTzn-N8rS{Uqy z!BGe-5P%*AOB!il=-bfIEX8hr{)%ty9K}ehc$<>kn0$y@*yl)dlZ^1Fu7M2p+Gsl( z?6ldhlMy;Tcj-vn$-u`|jr>^8m21ww#%oXnW3Wd>x*$RR*UDTZ6p$P0opLE6N68{l zfJb=r0S^qp9hIRjgwHK}jVX8zq828QAOKK|h!lq+gR1dhXpX}&G1GeLYW1iZ3!N#- zFf2GyQUy{6#uSia<&et50-i)cBVtF}(EuG>5e=XQC7l=&asxdgfZ`NfasYKdeUglN zMMf;NO+L$Rzv9BzkOFr++F}ZW?UW%|CT`gw)igb7JdIBdDiB5jpQ|3d<%fu)GgWsy zx<5=7C9(?4giNK(y{f>hRp>+*kL)@oBd)O!s6=j`N`oMw(ATqUkW&cwTpgwml&w+a z0Z~pwojhp_I-_TCh@Kfwq@==tz&<21=+6l;3#z(U%6x94C-x295f_FMr&PeaGjCF% zPoN5*m;x<_)Xen0L_hv>JFX0(tt$+Ki#34i9(T6YFf%PfKZ!HVR>V{>_o}d4dG??~ z=~Y9q8n&k>e!omScQ4K^bP0?n5*&}X6l|Y?Mf|d)Zx|7fXA3fxaHTK*sld<3+FC`H z!cWkLp+RXF+rb_M7@g2V77}L2qBFZA>e~sxBDYSZp3Zuv;wtPn%yVOpJo!}P@N#^Z z<*-^}jLIoB`lV(N%`=|8iG?y|!x# z1EIt9ph~HoB_pz>J%=JDW9Dodj zq9d>t!}c1jxQtPqN{F}f60IC?a;n!FDdeA?>=vq#n^#NjxCK^S$1LxjQYtm?6O?>~ zdD;_x=GIZMMI||_ z^4rAD^5Vnklg?;8Cp9%|3fk2)rgk0iXs(A+m9Ei^_Mv&(S&8Ji?Zth=N6a%IegTv# zW3gGMY3y1`{b#2UN3o^mgOsShYpd##LTzOli1veN*VC>CkEd5h@%g}$N2gZW@AD#; zh9?fQcTZm24y!5>jfc$qjWhn6!y=KU7dtM_4WyT=YKyh?58ywh_hQ}p9SaN)klSYj z>;Dea{2!(_!+)9HI`%v4sGp{{f-KkADkk+4UI(#)6epmuv}m^BIJIPj4MNhUp+;?0 ztng*F3IOTr9)*;38wb6EKRlRh{zja0T;**Y)~%q-=BlI(>uJq}4cO0BKiKc>zl^rN 
z@h0u%pDo;Hq$y==Lw_Gi+pbFfNa0brdoap0Jo}|6%Zk?PxN5~4Au_I?X*n}c)~IH> zp{8_4YB@gotT$(IO0DIDw}MQ>g8L&+DlNHJ3l8tZ{-jvB%p=yb6g4`%+`>6=PA#^v z@MWBHleKr$bg^gAt$HM!wamLJsc^Nd-h}aHgVl5By!JlY^1MGEBB-e@}EoK zF>KrLkejycKCkmX@Ls2>vBtGZt$2R?NKR+E@Ecq^>p&ZL+GIQ}J5IZnwE19;>Pq;{ zUFl@xHIY@@bJWHfJsOeRW{#f8Y|Ftiz_PAmt`}>C0ik5_K7O`1>xJgMq@gFZuC)K6UCfE+RN<`koX~i@UtgI8;rLLZwc8fe z+30o2cWY4O)tgZ1*(_wg-^hW*FSS_uRQ<-oqZL(@N4DDCq2H)bOcdx{+AI*rspKIL z$fKlb!0V^bkBzcjvICAX&CkI2jw`7gC8ex2cq=!kuXintfK7p@$4E)q15kY2(bm2o z)+3{EPJ~1{B){qiH8}|ii3?@>+Oobo+yPtI3)!7tCT1`SCwkg8Re9%O7Fv8SK3yn09Gl4JjJE-u$#m7_ahii zvZ=?lmoz^VyEO646;u>GAS5p!fE@v2(9^1!*9nfxpk7n6+oH;qsSF;D*e-XPvF0}m z*)EE$_o0v=VP-MmnXp*&)r8nA#BPI!*EtJW(6BBUX6j@p7~-?`jgbghTEJrz<#&tY z;p0y8$JFQ%%>%M{9Ezjc;WF@ujBsZ7 zTXKXshbuq)5<;z|nW?AYiz7RlC!H7RPlzJTI5u~PMD=s|;U1++#quv8<3i~?$*RR3 zLxu^7xG?=P)i2V-J^SS0zNvs@>T;X?a+8PKa{&{7g3CaHlVAbU-Tf8r78KiQ&U%L+ z4Vp3}TnVd7iN=z$C&n$#DRZQNDYK-ovkq5wXaaIJ=dkEE05e~5*w#K*&tFVl;mEz_ z#CC(F`dPFb7}gBZLSQTpTt7<@lp(qrqx;k9Ot~tC-kqBDln||KQ6gvpx3e>f)1wGA87dAM`kkRVb7P@Y*{wv-3|{T!4Bm_gM~qb%J1jzAEe*&b z!(=SgD|Ok4bf0Or>$D0x!W=qga0G8fgyW3HM4u-In-HyZfRDTon`$K}CBF8{O8mNy zsfChjjz4$>or<(jzDcG+*sbZ`jxkCs!)&433-HS2o=-l|>Mav+NrV|-3Y>!?!piNA z`2kmcMp3H_Jsbp2Taz`nXHNH9g0*s4iJ&dW&PyKQC}EaG5_BRDJRXkF7-Tp|w?v4% zaN1LYCBfWA39>RLu9C<`ppYL5fa%u=e-@G+V-gYo_T5*5>GpV(v2s|6pd~0y6NO*e zr8*ZJl|KbMh29Bhel2}@l*|@v`5suoL3Z(K?D;*V-dulPwA_xUU-Bn^xe(}6CLayh zcYPz``}#;si&-@}>8C7ra&>V|dYmRmS1IPLY+E z(=2gErIV@-B$#|Eg}|o>?0T|-$_o2N3s{8WZuafwX7i#IcBz(y!-d-PZ?I;DcYJYV zZTb7 zQ<(Vm;#@z!82?5j#k9#X(M`7TDW_1gl4iCDiWvTlthTwMu$86(x~JSS!U{8?YE6Rq z%8jwD{K$`?C(l7O#++iALU}d7UDxAlZUq;kK{Q+eyu1xF@Uy!@5_BNGK(WO>&Sc5@ zNfrfMdxohHO4HulXAM)#3T|TS&#HuiN09Nd28Qrn7?(*VEveOZW;;h#0q>YoU$Nk? zL2dl3*%JTkB8F3s3S+uDo2)JHY?5N%V*Ka`+0QaF`RuImdyYZ%guHvbsJ+eI_MHo? 
znq6`O9fWdu8Rv&uqNTc~p<(*`k3&i>L&tQw%gE{T>gdhcX96}jv~>bZ13Kv)8}HhI z)6vEiC*z+>j^i(-ED!IOcI%nbSCm>xO76FA4(#%Kw%?ZuSIav`y^YqRJJa$6nUr1E z>+ek>=%UlzUu?abzk30^ar;I`d#90meJk!8&sVgSC-<8il{eDNS!`7g59HgBWN3p{ zORY|st);rc9j@LCT3*dFn}U&~z(+Z`Ik^C_DE&$eH9v~byGBcy4Wp%4lRM%Iy;lyL zlDuC}?#+jDR}SvapBBP1HEt&Ix*2sU#&9TBE8iDsJCHYNlNV*GHd{wc8#K?Ogj!xn z)-669=ilDgaXOMThAZzo`tIGP-d-AJ&o(onqTKbhaihvwUU~mmq9QodB0|tWK-VJw zKP*wkzbsL$b^9F=T#v3Uzn$%kiR>r%n7Yg`;&jb<()3m-TBV0Vmg_WVk&Kv*<+oEA zL_CponjOV{w`G>xV~mCsqPy3itNo@f-b(OEwHmF1(=t692HdZ;cUqmQmAqPT-cL{W zc(dEO8il9Ats{%e^Oq8hI9~2=O{0^pSF85~qa*yn)f?Jycq4W<<0kbT?nAhCzA0t`A>}PJV5l zP2cUCd~BbN_Sc%_C>zQG9#ncS{HzZ?l@+tkJ8AjwYHkqHwhU>wl9qsJc-j%H$ESifwm6Q`63QY*cWlEm8z1>=o zZz}VAM?wBouJ4cAv|37xUv{sCPw~>Dcc;~-oR5=VOQsD!tj?06UEei4iw|%XlS?=v z)wiV7zngD{Z{t@)@ff#l5TD=Po#*+VeWXdqcR07}vs@id2AkWjx)il8xsy-CRyAv+nUz=`*VXGi<5e|Pbs4|$)p8TTNoimZG8sO5Yc7x4`L99e%Q zMwQP3D-JTmT+j-vuZf@u1hJ989yk@7AW!@tZ{Z+ffwsN4YEOr%kpsbf-C;b?CMb6M zT!#M)2jB*s>bo+=h~vKk0;2MMk6e3d&2r9(p9Vvi_oc}gtm$3eW-4ZJgg z`T^GiLu1ArT7?FIdTSI5IMrtZ1Z2^R7N3cx@l?ZtYzg7rHy=MNV|RYCuw!i@4vP!AH8n+nX=iSWHlDBvjJV?2D|KednT#tZ@q6;hTJ??I{o5$gX|`I^a)iZ?ZrcuC@6xMXUK_t4-B>IPX$xm;g_(`_A0(T&sZtY z;AD5&me%%7cP_YcSv)c@kSGsdk_&qgIBHiNhS&#`kalaew_$Ya-}b}v`)~HHcb4sR zAZ3HUFm2vpY`}@Xve$t}wO6GKy+8|X9PVMJQNX=~m5x4l7%%b#s6=nGY*C|c0bxO8 z=z#uMP>}?90_PrP`Y4oRL{|vjDf!QSy>tIo>nmFOn2-I15TzBPl&-cMFYnoJ7^1in z1Vd4$Pc694)?Uo;d*2236gc5@E~XrQ55ZSyTYN^E4PM0IfcSXCfZ2H9L3IOx4BJn^ z(4PK;X|&Gk6m2wFB8R6yh*!l9u?WZDkdtKhv?byPct$nNXF{G*|23TrBrAME+9(P+ z{2jsx;z3`?7F9_xVkB1`-=UWbdqK7V3>>j}Jx_buqt4K~NXXW&{*?%(3__<^2o7W> zp8Wy+{a8=}e?7%xY}k{i1mBIfWWTVUglC6GD66@&r=P9@;gH;c1J zdpq||uDw1AR1h4lC>R(CY4lB5SP75xZCT_bRkb@}@vnIx)<>RIQ45GH$^;R z?XNg^7zwH6DHcL9B>Gi{g%(usYk3Ye@pfW3k`=WkE;^{qg~lAw?IR(>73?}gT2wow zkda7ezU1c5gW!mGQc_H!EQ<9vSG>1mys$2yaiD>E0)$li!kjFGCi>Hy2vcn`BSC7O z2@hwGZ#5eazxVN?{plFN-gboKvtxZ=E>^+=eJHM2Nd;sbUn6p!!L_L`kM$JAV`3F8 zG+q}Biy!TuxqXT>iR5XOKZttOrng|-EZFxk%OlC+ zLd&c80G`q<3NrJWV)NS`tL`HWg!}ryM77u`a7jZ3qZEr<^f98O4xL~*Hx;5S-6;om 
zD9OjwooDVGHT!pc;8a_&`}v~Tk1W}HEZOOcK2|!QX&=DFG;b{-FRTNHA>m3ZLw3^4 z&5!tv88&b{Pb6_rKbc7&7acT|M*3Oo?Zr@>Og29R45XXEAv{JIrnhGNm+XGAG~eRK zN;1u9`%HanLKJG`{bnYy>E79pVJ%lZWGBMhe2(9UZ4FoXU6dGUS=)E$9$IMTCqzTR zMMsD^$69U<(J0jP)=0pTT_Z~~BSEaBLy;1G>ntU_m^bcRQs)D7<-SRHxDFxJH9>_| zb%q68g_f-3Z?`Xmi&Xz&(KY8-+To_DCB)tt6SNSn>~F~tVz}~D_7!yl)<2%~UzvHx zGNUO{pCha##N6>Pwh%7v$2b;ZoL+Q84(VJX;-h|ZmDPB*PrbLyKa>og42VVPhBqEH zpdzG6@G)uA|x7I3uHpW4N<4GCWrFE9DuUD_@zg&*tCaSy6xgob56 z2G&athO$eXIv^ue%*Nu!Y5S-?-()x5b`+@%nxYz)5t0cINY4&P=c`d~k%WbQ_Ld7A zJla5;p9;2o6Ll=H=$_$-ZEVZQU`L3uQ(L$c${s$NhBz!l6a}JJXy*I%=DxfmJIz5- zyLIEHe3vWyrXv=q2401wUwThXb3TEb)NLRZ-(x@yAJ^_(D9f?qP;Jl^^>CMnOrN0A zJ0f$auGx=LPVyE&$NvbpXqGRtsB+t2*}KIx!GTbJ71!GnqV3xU)FAnyCdZS+g$8{N zk02PPA&V>uWxn2g=$Lz&Q@UO?F+2k@s5yn&iEvu~j67CSl_CMZ9J!J2Mc;o)Th8&R z^Fj2KeMWzo*^UrnN4<6_R5aXt_;V7SDo|+y*fpqX5ger*QFJ;_8)fxbOf(A(M0B|X zmS{V|g#~be$VibKhye!DFb=QJKxTCA8#NSByl7^d#A>cz8=5ciuL-es%<7iHWy8zH z5&J(e@vl1$1BWJR?@JGH45BVNL{B;9ce0>j(Y9lh^Ic=jOoh_`gl8xr9&`_W`d6u= zPBdaS9f#_ZEvN<+hc+gr{Aq`(5;|=%#7G-MW zhaGkv(kHswsc)oijDb%Y5bN41^teg+jB;!*IDbn1+% zI?slkt{+DAH5SqbJ|1r$Yn~o$nwUPkWG-|+uA41iY-DFU>Czg}o(}IOzb!v|I#wor z3paEu9mfN0#Ri}(DLY0JR$No)q$I`7Pg5_jlm&#PbmsA=709*Z z!eu({q$+$53c;!qFZ1xxUr_er>LHolc+pGA+5HpsU!e0X`I#ZjVml~YEr;~AZGdGg z1#v4*C+h`Ppim+n!&O!uRz}HJ&rXr^4CH0?fUs!huO#l_Qu?Ec;rm8Q0&Os1+hR8H z6Z%D|{Rajv`?gKIp3;j1+m+I~$W2R@=qmz3Lg!y3ScuIyUXqPxVNYU>3YoI9TE^xX zNSkwS=HHGgC%KwPXl=MsV85@@Tw0nMFhUg}Tv3nXrt$X!kag9&8)_;;1ap>#fvdG| zdV*fdqShu_XXCB1AF|E1i*wIt2S=pIP_-beToxX0Bk}|{_vQRuGCO#-p2w7&Zbk05 zCVvZ^70hE-0)txv!Dkh1`n40plxHP$sv(o{te|v7 zF^&1G`wLPzCf2fa`t9Q603W_qBWa@Yk5Qz@$LF=)?Ufoqm&PFqmp%c+i2-`{Yl2_Y z2T*!-4DIRCJXw7BDEd&j8QTn+`cRX(-*OZs!mODhP^Mrdqx1?DuZC{`8Qa*yMB2KA z?((#D`3?$WTxyUJh^FBCbEi-5Q=#~s*nLOjMl!*;1+M_!7e4GbB%EfdpTPcw2?E8> zGy&g1H9#+{sK3Og^N3W|O`gxLl%^V^W>gu5mPH^JcKBR~Yap+2a3E3?%}~%})aosx zB6DFrBm8XVR$H5+f{@qI+90U}itd}3$jk_|e+4RS7KeTs)e*Obam%~+T%B{a zIJGujJ>7f^@}K%CMfJ^7L){4Jtt#qF+32IMu6v};S0U?6FtKGnrB~^o-1tS-P_PJl 
zHJZ}i{4JWrConvTqwV$LM{3J{s-WS4(xKl%*V{OT9;602y^id_<%oxR9?jzUVE|=M zsWMmcGY2j8!bpR({qPO;Jau=rpnW?D+~f9#MR5>L={BmOJpp}qGTdx&@7!iqN#1FG zFRL5(Dhq0%CT8yJ*J)j*>66zYoyw+X3#>PgKe~iCiE)YWsYfq3|LK7K3=aeSS25#X z1&kGOlj5KZ@WB_JVKMH`jVMBZx{i>kbWu;Al(Ba`wp4lyM^ATuNcEBeh-fjdn{hV5 zv~@(IyReKOVItP0>QoU?ujuZ1KfgYlVs)6Sj1x^LixYn{7%83Wv$(~NvjVQgqF>UK z>rvI_+|ez-ZV3jg1Ka~FwRAu~07v&BCHEU98!wT{!4Cn+)g`AQ!OnZyAR-*oJ1V&` zG!Nt=bWH6CGLu3IO4*pwZ5DLSPwvu?=#TOqcY&5CsqDP4|SEx~q~yhw)umGAZ( z&5$C|@9l9?$D=0mzoyAdct6A75RgWteob4S)BgT>(EQ(FarjRTfVF|FowbcUot}-& zpFbfpV`OAL%RBooJi;hf8l-fK6&2E;0@$Ve1o_uWscoCGUln$wSDVOu>d(f;Cnukm zy*xY}8aeq!NFTG-SR)j-l%@tQ+^%`r=Wt3G#j6V7aNX?uBo}sqx(eV4a(p#W{3f}V zi3U&#)Xutx*N|7yjG>3cs2R^pQb_Z*IEt5jf4B{*N^FO0u^e$``R+wGb8dO(a2Ef3 zxgN9$3nY64qE!#YXYWW5Kx&=)=nrhh0ayx$%5|zY( zM>DUSoXJ1IkRa<@4<5`PBBKt|%7}*NFqT@2c%QwM>wy2`ZX-l)iCJo@FLUk=Z>On$ zwiat1tyT*KB7tO3gnaQA^gS6fP^8^dQt=vwsi#4Dfi8N*-^zIz;e;(5&6BAE?B&3p!T85KNmO zx_{uX|0E%GHdjp#SYvPUzK}Wur?iTIyQ8S@IP^IDB{h0gYFSGH&N!j0FMBH-q|9OY zyB$$xw(V6t;zx&F>%Az(q4}Ka*TGG zs_JiFvb=rbYtwU9Tw1q)fqDH(l7R0FVAoO+AK3HL$0G#J=FCUq{5<$dN<<>ow&EXr zu}j>Fh>oI0nT(g8@KOyfW$a)-=m(uhCR;H*P%D`4r@RIn{f%|TCLHiw?eV;KzuNVQ z*<|@d%2b9@DVxo;7)!uT|F&*1phCNJd2BS7_a?g+Kk~}RbN(ZJQ4&l;?oa6(fBB;z zz#zy#e;1Aavr76;(fI$Q|GR|z-*Nt#KmVtu`fsB6?D>xw^zRkc|Bmp_G@w7pK7W(S zr+NN~@V{o=|2xV*Lpc8vW#)5w{|m}r2}J*n@=veff1+rT{I4i~1%Lh><)3Ea9}nYi zqGS4hQU2v`{Cnen-iv=M$-gOp<*&wn+m-*`^q)7!pNsQvnqvPS*XX}D{O3vf=VJMr zjye9n*UZ1S{-=`vIc@(Y#P9!T{jan4-<$t)m;N~k{-y@5|MgsumG}bwhY9BMHTrq5 KF!B6(`hNhcCFnT- literal 0 HcmV?d00001 From 6ec44474b9c53d5221783cd6d15f45d9384ec61e Mon Sep 17 00:00:00 2001 From: softhints Date: Mon, 9 Nov 2020 12:56:43 +0200 Subject: [PATCH 65/76] How to Convert MySQL Table to Pandas DataFrame / Python Dictionary --- ...o_Pandas_DataFrame_Python_dictionary.ipynb | 222 ++++++++++++++++++ 1 file changed, 222 insertions(+) create mode 100644 notebooks/python/JSON/42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb diff --git 
a/notebooks/python/JSON/42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb b/notebooks/python/JSON/42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb new file mode 100644 index 0000000..b762e57 --- /dev/null +++ b/notebooks/python/JSON/42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb @@ -0,0 +1,222 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 42. Convert MySQL table to Pandas DataFrame(Python dictionary)\n", + "\n", + "\n", + "[How to Convert MySQL Table to Pandas DataFrame / Python Dictionary](https://blog.softhints.com/convert-mysql-table-pandas-dataframe-python-dictionary/)\n", + "\n", + "* [PyMySQL](https://pypi.org/project/PyMySQL/) + [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) - the shortest and easiest way to convert MySQL table to Python dict\n", + "* [mysql.connector](https://pypi.org/project/mysql-connector-python/)\n", + "* [pyodbc](https://pypi.org/project/pyodbc/) in order to connect to MySQL database, read table and convert it to DataFrame or Python dict." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](https://blog.softhints.com/content/images/2020/11/MySQL_table_to_Pandas_DataFrame_to_Python_dict.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "password = ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1: Convert MySQL Table to DataFrame with PyMySQL + SQLAlchemy " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},\n", + " 'name': {0: 'Emma', 1: 'Ann', 2: 'Kim', 3: 'Olivia', 4: 'Victoria'}}" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sqlalchemy import create_engine\n", + "import pymysql\n", + "import pandas as pd\n", + "\n", + "db_connection_str = 'mysql+pymysql://root:' + password + '@localhost:3306/test'\n", + "db_connection = create_engine(db_connection_str)\n", + "\n", + "df = pd.read_sql('SELECT * FROM girls', con=db_connection)\n", + "df.to_dict()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'id': 1, 'name': 'Emma'},\n", + " {'id': 2, 'name': 'Ann'},\n", + " {'id': 3, 'name': 'Kim'},\n", + " {'id': 4, 'name': 'Olivia'},\n", + " {'id': 5, 'name': 'Victoria'}]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.to_dict('records')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'id': [1, 2, 3, 4, 5], 'name': ['Emma', 'Ann', 'Kim', 'Olivia', 'Victoria']}" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.to_dict('list')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + 
"outputs": [ + { + "data": { + "text/plain": [ + "{0: {'id': 1, 'name': 'Emma'},\n", + " 1: {'id': 2, 'name': 'Ann'},\n", + " 2: {'id': 3, 'name': 'Kim'},\n", + " 3: {'id': 4, 'name': 'Olivia'},\n", + " 4: {'id': 5, 'name': 'Victoria'}}" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.to_dict('index')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2: Convert MySQL Table to DataFrame with mysql.connector" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{0: {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},\n", + " 1: {0: bytearray(b'Emma'),\n", + " 1: bytearray(b'Ann'),\n", + " 2: bytearray(b'Kim'),\n", + " 3: bytearray(b'Olivia'),\n", + " 4: bytearray(b'Victoria')}}" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "import mysql.connector\n", + "\n", + "# Setup MySQL connection\n", + "db = mysql.connector.connect(\n", + " host=\"localhost\", # your host, usually localhost\n", + " user=\"root\", # your username\n", + " password=password, # your password\n", + " database=\"test\" # name of the data base\n", + ") \n", + "\n", + "# You must create a Cursor object. 
It will let you execute all the queries you need\n", + "cur = db.cursor()\n", + "\n", + "# Use all the SQL you like\n", + "cur.execute(\"SELECT * FROM girls\")\n", + "\n", + "# Put it all to a data frame\n", + "df_sql_data = pd.DataFrame(cur.fetchall())\n", + "\n", + "# Close the session\n", + "db.close()\n", + "\n", + "# Show the data\n", + "df_sql_data.to_dict()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 5bd29636e8b1cf6bc8877c3589488ca20f0384e1 Mon Sep 17 00:00:00 2001 From: softhints Date: Mon, 21 Dec 2020 12:49:33 +0200 Subject: [PATCH 66/76] update for future pandas versions fix iteration problem --- ...ount_values_in_a_column_of_type_list.ipynb | 476 ++++++++++-------- 1 file changed, 263 insertions(+), 213 deletions(-) diff --git a/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb b/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb index b15791d..42f168a 100644 --- a/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb +++ b/notebooks/pandas/Pandas_count_values_in_a_column_of_type_list.ipynb @@ -34,7 +34,8 @@ "outputs": [], "source": [ "import pandas as pd\n", - "pd.set_option('display.max_colwidth', -1)" + "import numpy as np\n", + "pd.set_option('display.max_colwidth', None)" ] }, { @@ -161,39 +162,39 @@ ], "text/plain": [ " Respondent Hobby OpenSource Country Student Employment \\\n", - "0 1 Yes No Kenya No Employed part-time \n", - "1 3 Yes Yes United Kingdom No Employed full-time \n", + "0 1 Yes No Kenya No 
Employed part-time \n", + "1 3 Yes Yes United Kingdom No Employed full-time \n", "\n", " FormalEducation \\\n", "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", "\n", " UndergradMajor \\\n", - "0 Mathematics or statistics \n", + "0 Mathematics or statistics \n", "1 A natural science (ex. biology, chemistry, physics) \n", "\n", " CompanySize \\\n", - "0 20 to 99 employees \n", + "0 20 to 99 employees \n", "1 10,000 or more employees \n", "\n", " DevType \\\n", - "0 Full-stack developer \n", + "0 Full-stack developer \n", "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", "\n", - " ... Exercise Gender SexualOrientation \\\n", - "0 ... 3 - 4 times per week Male Straight or heterosexual \n", - "1 ... Daily or almost every day Male Straight or heterosexual \n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", "\n", " EducationParents RaceEthnicity \\\n", - "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", "1 Bachelor’s degree (BA, BS, B.Eng., etc.) 
White or of European descent \n", "\n", " Age Dependents MilitaryUS \\\n", - "0 25 - 34 years old Yes NaN \n", - "1 35 - 44 years old Yes NaN \n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", "\n", " SurveyTooLong SurveyEasy \n", - "0 The survey was an appropriate length Very easy \n", + "0 The survey was an appropriate length Very easy \n", "1 The survey was an appropriate length Somewhat easy \n", "\n", "[2 rows x 129 columns]" @@ -361,64 +362,64 @@ ], "text/plain": [ " Respondent Hobby OpenSource Country Student \\\n", - "0 1 Yes No Kenya No \n", - "1 3 Yes Yes United Kingdom No \n", - "98853 101544 Yes No Russian Federation No \n", - "98854 101548 Yes Yes Cambodia NaN \n", + "0 1 Yes No Kenya No \n", + "1 3 Yes Yes United Kingdom No \n", + "98853 101544 Yes No Russian Federation No \n", + "98854 101548 Yes Yes Cambodia NaN \n", "\n", " Employment \\\n", - "0 Employed part-time \n", - "1 Employed full-time \n", + "0 Employed part-time \n", + "1 Employed full-time \n", "98853 Independent contractor, freelancer, or self-employed \n", - "98854 NaN \n", + "98854 NaN \n", "\n", " FormalEducation \\\n", - "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", "98853 Some college/university study without earning a degree \n", - "98854 NaN \n", + "98854 NaN \n", "\n", " UndergradMajor \\\n", - "0 Mathematics or statistics \n", + "0 Mathematics or statistics \n", "1 A natural science (ex. 
biology, chemistry, physics) \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", " CompanySize \\\n", - "0 20 to 99 employees \n", + "0 20 to 99 employees \n", "1 10,000 or more employees \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", " DevType \\\n", - "0 Full-stack developer \n", + "0 Full-stack developer \n", "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", - " ... Exercise Gender \\\n", - "0 ... 3 - 4 times per week Male \n", - "1 ... Daily or almost every day Male \n", - "98853 ... NaN NaN \n", - "98854 ... NaN NaN \n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", + "98853 ... NaN NaN NaN \n", + "98854 ... NaN NaN NaN \n", "\n", - " SexualOrientation EducationParents \\\n", - "0 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "1 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "98853 NaN NaN \n", - "98854 NaN NaN \n", + " EducationParents RaceEthnicity \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) 
White or of European descent \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", "\n", - " RaceEthnicity Age Dependents MilitaryUS \\\n", - "0 Black or of African descent 25 - 34 years old Yes NaN \n", - "1 White or of European descent 35 - 44 years old Yes NaN \n", - "98853 NaN NaN NaN NaN \n", - "98854 NaN NaN NaN NaN \n", + " Age Dependents MilitaryUS \\\n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", + "98853 NaN NaN NaN \n", + "98854 NaN NaN NaN \n", "\n", " SurveyTooLong SurveyEasy \n", - "0 The survey was an appropriate length Very easy \n", + "0 The survey was an appropriate length Very easy \n", "1 The survey was an appropriate length Somewhat easy \n", - "98853 NaN NaN \n", - "98854 NaN NaN \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", "\n", "[4 rows x 129 columns]" ] @@ -587,64 +588,64 @@ ], "text/plain": [ " Respondent Hobby OpenSource Country Student \\\n", - "0 1 Yes No Kenya No \n", - "1 3 Yes Yes United Kingdom No \n", - "98853 101544 Yes No Russian Federation No \n", - "98854 101548 Yes Yes Cambodia NaN \n", + "0 1 Yes No Kenya No \n", + "1 3 Yes Yes United Kingdom No \n", + "98853 101544 Yes No Russian Federation No \n", + "98854 101548 Yes Yes Cambodia NaN \n", "\n", " Employment \\\n", - "0 Employed part-time \n", - "1 Employed full-time \n", + "0 Employed part-time \n", + "1 Employed full-time \n", "98853 Independent contractor, freelancer, or self-employed \n", - "98854 NaN \n", + "98854 NaN \n", "\n", " FormalEducation \\\n", - "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) \n", "98853 Some college/university study without earning a degree \n", - "98854 NaN \n", + "98854 NaN \n", "\n", " UndergradMajor \\\n", - "0 Mathematics or statistics \n", + "0 Mathematics or statistics \n", "1 A natural science (ex. 
biology, chemistry, physics) \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", " CompanySize \\\n", - "0 20 to 99 employees \n", + "0 20 to 99 employees \n", "1 10,000 or more employees \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", " DevType \\\n", - "0 Full-stack developer \n", + "0 Full-stack developer \n", "1 Database administrator;DevOps specialist;Full-stack developer;System administrator \n", - "98853 NaN \n", - "98854 NaN \n", + "98853 NaN \n", + "98854 NaN \n", "\n", - " ... Exercise Gender \\\n", - "0 ... 3 - 4 times per week Male \n", - "1 ... Daily or almost every day Male \n", - "98853 ... NaN NaN \n", - "98854 ... NaN NaN \n", + " ... Exercise Gender SexualOrientation \\\n", + "0 ... 3 - 4 times per week Male Straight or heterosexual \n", + "1 ... Daily or almost every day Male Straight or heterosexual \n", + "98853 ... NaN NaN NaN \n", + "98854 ... NaN NaN NaN \n", "\n", - " SexualOrientation EducationParents \\\n", - "0 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "1 Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) \n", - "98853 NaN NaN \n", - "98854 NaN NaN \n", + " EducationParents RaceEthnicity \\\n", + "0 Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent \n", + "1 Bachelor’s degree (BA, BS, B.Eng., etc.) 
White or of European descent \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", "\n", - " RaceEthnicity Age Dependents MilitaryUS \\\n", - "0 Black or of African descent 25 - 34 years old Yes NaN \n", - "1 White or of European descent 35 - 44 years old Yes NaN \n", - "98853 NaN NaN NaN NaN \n", - "98854 NaN NaN NaN NaN \n", + " Age Dependents MilitaryUS \\\n", + "0 25 - 34 years old Yes NaN \n", + "1 35 - 44 years old Yes NaN \n", + "98853 NaN NaN NaN \n", + "98854 NaN NaN NaN \n", "\n", " SurveyTooLong SurveyEasy \n", - "0 The survey was an appropriate length Very easy \n", + "0 The survey was an appropriate length Very easy \n", "1 The survey was an appropriate length Somewhat easy \n", - "98853 NaN NaN \n", - "98854 NaN NaN \n", + "98853 NaN NaN \n", + "98854 NaN NaN \n", "\n", "[4 rows x 129 columns]" ] @@ -658,7 +659,7 @@ "# combine head and tail variant 2\n", "# ranges with iloc\n", "rows = 2\n", - "df.iloc[pd.np.r_[:rows, -rows:0]]" + "df.iloc[np.r_[:rows, -rows:0]]" ] }, { @@ -669,16 +670,16 @@ { "data": { "text/plain": [ - "0 JavaScript;Python;HTML;CSS \n", - "1 JavaScript;Python;Bash/Shell \n", - "2 NaN \n", + "0 JavaScript;Python;HTML;CSS\n", + "1 JavaScript;Python;Bash/Shell\n", + "2 NaN\n", "3 C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell\n", - "4 C;C++;Java;Matlab;R;SQL;Bash/Shell \n", - "98850 NaN \n", - "98851 NaN \n", - "98852 NaN \n", - "98853 NaN \n", - "98854 NaN \n", + "4 C;C++;Java;Matlab;R;SQL;Bash/Shell\n", + "98850 NaN\n", + "98851 NaN\n", + "98852 NaN\n", + "98853 NaN\n", + "98854 NaN\n", "Name: LanguageWorkedWith, dtype: object" ] }, @@ -690,7 +691,7 @@ "source": [ "# get examples from column LanguageWorkedWith\n", "rows = 5\n", - "df.LanguageWorkedWith.iloc[pd.np.r_[:rows, -rows:0]]" + "df.LanguageWorkedWith.iloc[np.r_[:rows, -rows:0]]" ] }, { @@ -701,16 +702,16 @@ { "data": { "text/plain": [ - "C#;JavaScript;SQL;HTML;CSS 1347\n", - "JavaScript;PHP;SQL;HTML;CSS 1235\n", - "Java 1030\n", - "JavaScript;HTML;CSS 881 \n", - 
"C#;JavaScript;SQL;TypeScript;HTML;CSS 828 \n", - "C;Go;Hack;Java;JavaScript;Perl;PHP;Python;SQL;TypeScript;HTML;CSS;Bash/Shell 1 \n", - "C;C++;Java;JavaScript;PHP;SQL;VBA;Visual Basic 6;HTML;CSS 1 \n", - "Assembly;C;C++;Java;JavaScript;Matlab;PHP;Python;R;SQL;TypeScript;Visual Basic 6;HTML;CSS 1 \n", - "C;C++;Java;JavaScript;Matlab;PHP;Python;Ruby;SQL;HTML;CSS 1 \n", - "Java;JavaScript;PHP;Scala;SQL;Kotlin;HTML;CSS;Bash/Shell 1 \n", + "C#;JavaScript;SQL;HTML;CSS 1347\n", + "JavaScript;PHP;SQL;HTML;CSS 1235\n", + "Java 1030\n", + "JavaScript;HTML;CSS 881\n", + "C#;JavaScript;SQL;TypeScript;HTML;CSS 828\n", + "C;C++;C#;Java;Python;SQL;Swift;HTML;CSS;Bash/Shell 1\n", + "C;C#;Java;JavaScript;PHP;Python;SQL;VBA;VB.NET;HTML;CSS;Bash/Shell 1\n", + "C#;Objective-C;PHP;Python;Swift;HTML;CSS;Bash/Shell 1\n", + "C#;Java;JavaScript;Objective-C;Perl;PHP;Python;SQL;Swift;TypeScript;VBA;VB.NET;HTML;CSS;Bash/Shell 1\n", + "C#;CoffeeScript;F#;JavaScript;SQL;TypeScript;HTML;CSS 1\n", "Name: LanguageWorkedWith, dtype: int64" ] }, @@ -721,7 +722,7 @@ ], "source": [ "# value counts for the same column\n", - "df.LanguageWorkedWith.value_counts().iloc[pd.np.r_[:rows, -rows:0]]" + "df.LanguageWorkedWith.value_counts().iloc[np.r_[:rows, -rows:0]]" ] }, { @@ -943,19 +944,19 @@ "" ], "text/plain": [ - " 0 1 2 3 4 5 6 \\\n", - "0 JavaScript Python HTML CSS None None None \n", - "1 JavaScript Python Bash/Shell None None None None \n", - "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", - "4 C C++ Java Matlab R SQL Bash/Shell \n", - "5 Java JavaScript Python TypeScript HTML CSS None \n", + " 0 1 2 3 4 5 6 \\\n", + "0 JavaScript Python HTML CSS None None None \n", + "1 JavaScript Python Bash/Shell None None None None \n", + "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", + "4 C C++ Java Matlab R SQL Bash/Shell \n", + "5 Java JavaScript Python TypeScript HTML CSS None \n", "\n", - " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", - "0 None None None ... 
None None None None None None None None \n", - "1 None None None ... None None None None None None None None \n", - "3 None None None ... None None None None None None None None \n", - "4 None None None ... None None None None None None None None \n", - "5 None None None ... None None None None None None None None \n", + " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", + "0 None None None ... None None None None None None None None \n", + "1 None None None ... None None None None None None None None \n", + "3 None None None ... None None None None None None None None \n", + "4 None None None ... None None None None None None None None \n", + "5 None None None ... None None None None None None None None \n", "\n", " 36 37 \n", "0 None None \n", @@ -1152,26 +1153,26 @@ "" ], "text/plain": [ - " 0 1 2 3 4 5 6 7 \\\n", - "Assembly 5760.0 NaN NaN NaN NaN NaN NaN NaN \n", - "Bash/Shell 29.0 465.0 1221.0 1929.0 2882.0 4442.0 4844.0 4269.0 \n", - "C 13335.0 4707.0 NaN NaN NaN NaN NaN NaN \n", - "C# 16969.0 4321.0 3990.0 1674.0 NaN NaN NaN NaN \n", - "C++ 7042.0 9275.0 3555.0 NaN NaN NaN NaN NaN \n", + " 0 1 2 3 4 5 6 7 \\\n", + "Assembly 5760.0 NaN NaN NaN NaN NaN NaN NaN \n", + "Bash/Shell 29.0 465.0 1221.0 1929.0 2882.0 4442.0 4844.0 4269.0 \n", + "C 13335.0 4707.0 NaN NaN NaN NaN NaN NaN \n", + "C# 16969.0 4321.0 3990.0 1674.0 NaN NaN NaN NaN \n", + "C++ 7042.0 9275.0 3555.0 NaN NaN NaN NaN NaN \n", "\n", - " 8 9 ... 28 29 30 31 32 33 34 35 36 \\\n", - "Assembly NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", - "Bash/Shell 3311.0 2562.0 ... 3.0 1.0 2.0 2.0 NaN 1.0 NaN NaN 2.0 \n", - "C NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", - "C# NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", - "C++ NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + " 8 9 ... 28 29 30 31 32 33 34 35 36 \\\n", + "Assembly NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "Bash/Shell 3311.0 2562.0 ... 3.0 1.0 2.0 2.0 NaN 1.0 NaN NaN 2.0 \n", + "C NaN NaN ... 
NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C# NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C++ NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", " 37 \n", - "Assembly NaN \n", + "Assembly NaN \n", "Bash/Shell 35.0 \n", - "C NaN \n", - "C# NaN \n", - "C++ NaN \n", + "C NaN \n", + "C# NaN \n", + "C++ NaN \n", "\n", "[5 rows x 38 columns]" ] @@ -1364,33 +1365,26 @@ "" ], "text/plain": [ - " 0 1 2 3 4 5 \\\n", - "Assembly 0.073531 NaN NaN NaN NaN NaN \n", + " 0 1 2 3 4 5 \\\n", + "Assembly 0.073531 NaN NaN NaN NaN NaN \n", "Bash/Shell 0.000370 0.005936 0.015587 0.024625 0.036791 0.056706 \n", - "C 0.170233 0.060089 NaN NaN NaN NaN \n", - "C# 0.216624 0.055161 0.050936 0.021370 NaN NaN \n", - "C++ 0.089897 0.118403 0.045383 NaN NaN NaN \n", + "C 0.170233 0.060089 NaN NaN NaN NaN \n", + "C# 0.216624 0.055161 0.050936 0.021370 NaN NaN \n", + "C++ 0.089897 0.118403 0.045383 NaN NaN NaN \n", "\n", - " 6 7 8 9 ... 28 \\\n", - "Assembly NaN NaN NaN NaN ... NaN \n", - "Bash/Shell 0.061838 0.054497 0.042268 0.032706 ... 0.000038 \n", - "C NaN NaN NaN NaN ... NaN \n", - "C# NaN NaN NaN NaN ... NaN \n", - "C++ NaN NaN NaN NaN ... NaN \n", + " 6 7 8 9 ... 28 29 \\\n", + "Assembly NaN NaN NaN NaN ... NaN NaN \n", + "Bash/Shell 0.061838 0.054497 0.042268 0.032706 ... 0.000038 0.000013 \n", + "C NaN NaN NaN NaN ... NaN NaN \n", + "C# NaN NaN NaN NaN ... NaN NaN \n", + "C++ NaN NaN NaN NaN ... 
NaN NaN \n", "\n", - " 29 30 31 32 33 34 35 36 \\\n", - "Assembly NaN NaN NaN NaN NaN NaN NaN NaN \n", - "Bash/Shell 0.000013 0.000026 0.000026 NaN 0.000013 NaN NaN 0.000026 \n", - "C NaN NaN NaN NaN NaN NaN NaN NaN \n", - "C# NaN NaN NaN NaN NaN NaN NaN NaN \n", - "C++ NaN NaN NaN NaN NaN NaN NaN NaN \n", - "\n", - " 37 \n", - "Assembly NaN \n", - "Bash/Shell 0.000447 \n", - "C NaN \n", - "C# NaN \n", - "C++ NaN \n", + " 30 31 32 33 34 35 36 37 \n", + "Assembly NaN NaN NaN NaN NaN NaN NaN NaN \n", + "Bash/Shell 0.000026 0.000026 NaN 0.000013 NaN NaN 0.000026 0.000447 \n", + "C NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C# NaN NaN NaN NaN NaN NaN NaN NaN \n", + "C++ NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", "[5 rows x 38 columns]" ] @@ -1419,7 +1413,7 @@ "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# why for value counts and parameters you need lambda\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf_lang_per\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf_lang\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfillna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSeries\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# why for value counts and parameters you need 
lambda\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdf_lang_per\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf_lang\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfillna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSeries\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: value_counts() missing 1 required positional argument: 'self'" ] } @@ -1438,15 +1432,15 @@ "data": { "text/plain": [ "0 31.800036\n", - "JavaScript 0.698113 \n", - "HTML 0.684607 \n", - "CSS 0.650790 \n", - "SQL 0.570250 \n", - "Java 0.453456 \n", - "Bash/Shell 0.397937 \n", - "Python 0.387558 \n", - "C# 0.344091 \n", - "PHP 0.307287 \n", + "JavaScript 0.698113\n", + "HTML 0.684607\n", + "CSS 0.650790\n", + "SQL 0.570250\n", + "Java 0.453456\n", + "Bash/Shell 0.397937\n", + "Python 0.387558\n", + "C# 0.344091\n", + "PHP 0.307287\n", "Name: total, dtype: float64" ] }, @@ -1470,10 +1464,10 @@ "data": { "text/plain": [ "0 2491024.0\n", - "JavaScript 54686.0 \n", - "HTML 53628.0 \n", - "CSS 50979.0 \n", - "SQL 44670.0 \n", + "JavaScript 54686.0\n", + "HTML 53628.0\n", + "CSS 50979.0\n", + "SQL 44670.0\n", "Name: total, dtype: float64" ] }, @@ -1664,19 +1658,19 @@ "" ], "text/plain": [ - " 0 1 2 3 4 5 6 \\\n", - "0 JavaScript Python HTML CSS None None None \n", - "1 JavaScript Python Bash/Shell None None None None \n", - "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", - "4 C C++ Java Matlab R SQL Bash/Shell \n", - "5 Java JavaScript Python TypeScript HTML CSS None \n", + " 0 1 2 3 4 5 6 \\\n", + "0 JavaScript Python HTML CSS None None None 
\n", + "1 JavaScript Python Bash/Shell None None None None \n", + "3 C# JavaScript SQL TypeScript HTML CSS Bash/Shell \n", + "4 C C++ Java Matlab R SQL Bash/Shell \n", + "5 Java JavaScript Python TypeScript HTML CSS None \n", "\n", - " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", - "0 None None None ... None None None None None None None None \n", - "1 None None None ... None None None None None None None None \n", - "3 None None None ... None None None None None None None None \n", - "4 None None None ... None None None None None None None None \n", - "5 None None None ... None None None None None None None None \n", + " 7 8 9 ... 28 29 30 31 32 33 34 35 \\\n", + "0 None None None ... None None None None None None None None \n", + "1 None None None ... None None None None None None None None \n", + "3 None None None ... None None None None None None None None \n", + "4 None None None ... None None None None None None None None \n", + "5 None None None ... None None None None None None None None \n", "\n", " 36 37 \n", "0 None None \n", @@ -1709,7 +1703,7 @@ "C 13335\n", "JavaScript 12150\n", "Java 12087\n", - "C++ 7042 \n", + "C++ 7042\n", "Name: 0, dtype: int64" ] }, @@ -1733,9 +1727,9 @@ "text/plain": [ "JavaScript 19532\n", "Java 10175\n", - "C++ 9275 \n", - "PHP 6450 \n", - "C 4707 \n", + "C++ 9275\n", + "PHP 6450\n", + "C 4707\n", "Name: 1, dtype: int64" ] }, @@ -1754,16 +1748,6 @@ "execution_count": 19, "metadata": {}, "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py:7441: RuntimeWarning: '<' not supported between instances of 'str' and 'float', sort order is undefined for incomparable objects\n", - " return_indexers=True)\n", - "/home/vanx/Software/Tensorflow/environments/venv36/lib/python3.6/site-packages/pandas/core/generic.py:7441: RuntimeWarning: '<' not supported between instances of 'float' and 'str', sort order is undefined 
for incomparable objects\n", - " return_indexers=True)\n" - ] - }, { "data": { "text/plain": [ @@ -1878,10 +1862,10 @@ "CSS 50979\n", "SQL 44670\n", "Java 35521\n", - "Rust 1857 \n", - "Kotlin 3508 \n", - "Cobol 590 \n", - "Ocaml 470 \n", + "Rust 1857\n", + "Kotlin 3508\n", + "Cobol 590\n", + "Ocaml 470\n", "CSS 50979" ] }, @@ -1984,11 +1968,11 @@ "CSS 50979\n", "SQL 44670\n", "Java 35521\n", - "Erlang 886 \n", - "Cobol 590 \n", - "Ocaml 470 \n", - "Julia 430 \n", - "Hack 254 " + "Erlang 886\n", + "Cobol 590\n", + "Ocaml 470\n", + "Julia 430\n", + "Hack 254" ] }, "execution_count": 22, @@ -2001,12 +1985,65 @@ "df_comb.head(rows).append(df_comb.tail(rows))" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: In some cases the iteration example is not working properly - when the first column doesn't contain all values. It can be replaced with the example below:" + ] + }, { "cell_type": "code", - "execution_count": null, + "execution_count": 24, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "JavaScript 54686.0\n", + "HTML 53628.0\n", + "CSS 50979.0\n", + "SQL 44670.0\n", + "Java 35521.0\n", + "Bash/Shell 31172.0\n", + "Python 30359.0\n", + "C# 26954.0\n", + "PHP 24071.0\n", + "C++ 19872.0\n", + "Delphi/Object Pascal 2025.0\n", + "Haskell 1961.0\n", + "Rust 1857.0\n", + "F# 1115.0\n", + "Clojure 1032.0\n", + "Erlang 886.0\n", + "Cobol 590.0\n", + "Ocaml 470.0\n", + "Julia 430.0\n", + "Hack 254.0\n", + "dtype: float64" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_comb = pd.DataFrame()\n", + "temp = []\n", + "val_count_tmp = pd.Series(dtype=float)\n", + "\n", + "# sum all columns in dataframe with iteration\n", + "for col in df_lang.columns:\n", + " temp.append(df_lang[col].fillna(0).value_counts())\n", + "\n", + "for val_count in temp:\n", + " val_count_tmp = val_count_tmp.add(val_count,fill_value=0)\n", + "\n", + "y = 
val_count_tmp.dropna().drop(0) \n", + "y.sort_values(ascending=False, inplace=True)\n", + "y.head(10).append(y.tail(10))" + ] }, { "cell_type": "code", @@ -2033,6 +2070,19 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false } }, "nbformat": 4, From 34cb4e65bdae975ab2db4edbd5b21c2be3b707a0 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Fri, 10 Dec 2021 10:15:54 +0200 Subject: [PATCH 67/76] Update README.md add https://datascientyst.com/ --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 607f136..07a9848 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,12 @@ Jupyter notebooks and datasets for the interesting pandas/python/data science vi # Who is this repo for? -For people who are interested in data science, data analysis and finding interesting relation for data. This repository is related to site: https://blog.softhints.com/tag/pandas/ where you can find more interesting videos. The youtube channel is: +For people who are interested in data science, data analysis and finding interesting relation for data. This repository is related to site: https://blog.softhints.com/tag/pandas/ where you can find more interesting videos. + +New website dedicated to Pandas and Data Science was started: https://datascientyst.com/. It has better organization and covers topics in many areas. 
+ + +The youtube channel is: https://www.youtube.com/channel/UCg5rvP_D735oSBatdcH5ZFA From 5d52124d1cae1af968f97733af8324b163240e49 Mon Sep 17 00:00:00 2001 From: softhints Date: Thu, 10 Mar 2022 13:18:12 +0200 Subject: [PATCH 68/76] add Think_Python_Chapter_12__Tuples.ipynb --- .../Think_Python_Chapter_12__Tuples.ipynb | 1497 +++++++++++++++++ 1 file changed, 1497 insertions(+) create mode 100644 notebooks/Books/Think Python/Think_Python_Chapter_12__Tuples.ipynb diff --git a/notebooks/Books/Think Python/Think_Python_Chapter_12__Tuples.ipynb b/notebooks/Books/Think Python/Think_Python_Chapter_12__Tuples.ipynb new file mode 100644 index 0000000..fc2e3da --- /dev/null +++ b/notebooks/Books/Think Python/Think_Python_Chapter_12__Tuples.ipynb @@ -0,0 +1,1497 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 12  Tuples\n", + "\n", + "\n", + "* 12.1  Tuples are immutable\n", + "* 12.2  Tuple assignment\n", + "* 12.3  Tuples as return values\n", + "* 12.4  Variable-length argument tuples\n", + "* 12.5  Lists and tuples\n", + "* 12.6  Dictionaries and tuples\n", + "* 12.7  Sequences of sequences" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.1 Tuples are immutable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This chapter presents one more built-in type, the tuple, and then\n", + "shows how lists, dictionaries, and tuples work together.\n", + "I also present a useful feature for variable-length argument lists,\n", + "the gather and scatter operators." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One note: there is no consensus on how to pronounce “tuple”. Some people say **“tuh-ple”**, which rhymes with “supple”. But in the context of programming, most people say **“too-ple”**, which rhymes with “quadruple”." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " A tuple is a sequence of values. The values can be any type, and\n", + "they are indexed by integers, so in that respect tuples are a lot\n", + "like lists. The important difference is that tuples are immutable.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Syntactically, a tuple is a comma-separated list of values:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = 'a', 'b', 'c', 'd', 'e'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Although it is not necessary, it is common to enclose tuples in\n", + "parentheses:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ('a', 'b', 'c', 'd', 'e')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "To create a tuple with a single element, you have to include a final\n", + "comma:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t1 = 'a',\n", + "type(t1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A value in parentheses is not a tuple:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t2 = ('a')\n", + "type(t2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Another way to create a tuple is the built-in function tuple.\n", + "With no argument, it creates an empty tuple:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = tuple()\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the argument is a sequence (string, list or tuple), the result\n", + "is a tuple with the elements of the sequence:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = tuple('lupins')\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Because tuple is the name of a built-in 
function, you should\n", + "avoid using it as a variable name." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Most list operators also work on tuples. The bracket operator\n", + "indexes an element:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ('a', 'b', 'c', 'd', 'e')\n", + "t[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "And the slice operator selects a range of elements.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[1:3]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " But if you try to modify one of the elements of the tuple, you get\n", + "an error:\n", + "
\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t[0] = 'A'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Because tuples are immutable, you can’t modify the elements. But you\n", + "can replace one tuple with another:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = ('A',) + t[1:]\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This statement makes a new tuple and then makes t refer to it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The relational operators work with tuples and other sequences;\n", + "Python starts by comparing the first element from each\n", + "sequence. If they are equal, it goes on to the next elements,\n", + "and so on, until it finds elements that differ. Subsequent\n", + "elements are not considered (even if they are really big).\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(0, 1, 2) < (0, 3, 4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(0, 1, 2000000) < (0, 3, 4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.2 Tuple assignment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is often useful to swap the values of two variables.\n", + "With conventional assignments, you have to use a temporary\n", + "variable. 
For example, to swap a and b:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 4\n", + "b = 3\n", + "print(f'a: {a}, b: {b}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "temp = a\n", + "a = b\n", + "b = temp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(f'a: {a}, b: {b}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "

Bonus: Tower of Hanoi

\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This solution is cumbersome; tuple assignment is more elegant:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a, b = b, a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The left side is a tuple of variables; the right side is a tuple of\n", + "expressions. Each value is assigned to its respective variable. \n", + "All the expressions on the right side are evaluated before any\n", + "of the assignments." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
\n", + " The number of variables on the left and the number of\n", + "values on the right have to be the same:\n", + "
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a, b = 1, 2, 3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "More generally, the right side can be any kind of sequence\n", + "(string, list or tuple). For example, to split an email address\n", + "into a user name and a domain, you could write:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "addr = 'monty@python.org'\n", + "uname, domain = addr.split('@')\n", + "uname, domain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = ['Everest', 8849, 27.9881, 86.9250]\n", + "name, height, latitude, longitude = data\n", + "\n", + "print(name, height, latitude, longitude)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The return value from split is a list with two elements;\n", + "the first element is assigned to uname, the second to\n", + "domain." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "uname #'monty'\n", + "domain #'python.org'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.3 Tuples as return values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Strictly speaking, a function can only return one value, but\n", + "if the value is a tuple, the effect is the same as returning\n", + "multiple values. For example, if you want to divide two integers\n", + "and compute the quotient and remainder, it is inefficient to\n", + "compute x//y and then x%y. 
It is better to compute\n", + "them both at the same time.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The built-in function divmod takes two arguments and\n", + "returns a tuple of two values, the quotient and remainder.\n", + "You can store the result as a tuple:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = divmod(7, 3)\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Or use tuple assignment to store the elements separately:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "quot, rem = divmod(7, 3)\n", + "quot" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rem" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Here is an example of a function that returns a tuple:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def min_max(t):\n", + " return min(t), max(t)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "max and min are built-in functions that find\n", + "the largest and smallest elements of a sequence. min_max\n", + "computes both and returns a tuple of two values.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.4 Variable-length argument tuples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Functions can take a variable number of arguments. A parameter\n", + "name that begins with * gathers arguments into\n", + "a tuple. 
For example, printall\n", + "takes any number of arguments and prints them:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def printall(*args):\n", + " print(args)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The gather parameter can have any name you like, but args is\n", + "conventional. Here’s how the function works:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "printall(1, 2.0, '3','x')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "
\n", + " The complement of gather is scatter. If you have a\n", + "sequence of values and you want to pass it to a function\n", + "as multiple arguments, you can use the * operator.\n", + "For example, divmod takes exactly two arguments; it\n", + "doesn’t work with a tuple:\n", + "
\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = (7, 3)\n", + "divmod(t)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " But if you scatter the tuple, it works:\n", + "
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "divmod(*t)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Many of the built-in functions use\n", + "variable-length argument tuples. For example, max\n", + "and min can take any number of arguments:\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "max(1, 2, 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "But sum does not.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sum(1, 2, 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "As an exercise, write a function called sum_all that takes any number\n", + "of arguments and returns their sum." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.5 Lists and tuples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "zip is a built-in function that takes two or more sequences and\n", + "interleaves them. The name of the function refers to\n", + "a zipper, which interleaves two rows of teeth." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This example zips a string and a list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'abc'\n", + "t = [0, 1, 2]\n", + "zip(s, t)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result is a zip object that knows how to iterate through\n", + "the pairs. 
The most common use of zip is in a for loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for pair in zip(s, t):\n", + " print(pair)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A zip object is a kind of iterator, which is any object\n", + "that iterates through a sequence. Iterators are similar to lists in some\n", + "ways, but unlike lists, you can’t use an index to select an element from\n", + "an iterator.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you want to use list operators and methods, you can\n", + "use a zip object to make a list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "zip(s, t)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(zip(s, t))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result is a list of tuples; in this example, each tuple contains\n", + "a character from the string and the corresponding element from\n", + "the list.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the sequences are not the same length, the result has the\n", + "length of the shorter one." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(zip('Anne', 'Elk'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "You can use tuple assignment in a for loop to traverse a list of\n", + "tuples:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = [('a', 0), ('b', 1), ('c', 2)]\n", + "for letter, number in t:\n", + " print(number, letter)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " Each time through the loop, Python selects the next tuple in\n", + "the list and assigns the elements to letter and \n", + "number. The output of this loop is:\n", + "
\n", + "\n", + "0 a\n", + "\n", + "1 b\n", + "\n", + "2 c" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you combine zip, for and tuple assignment, you get a\n", + "useful idiom for traversing two (or more) sequences at the same\n", + "time. For example, has_match takes two sequences, t1 and\n", + "t2, and returns True if there is an index i\n", + "such that t1[i] == t2[i]:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def has_match(t1, t2):\n", + "    for x, y in zip(t1, t2):\n", + "        if x == y:\n", + "            return True\n", + "    return False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you need to traverse the elements of a sequence and their\n", + "indices, you can use the built-in function enumerate:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for index, element in enumerate('abc'):\n", + "    print(index, element)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result from enumerate is an enumerate object, which\n", + "iterates over a sequence of pairs; each pair contains an index (starting\n", + "from 0) and an element from the given sequence.\n", + "In this example, the output is" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 0 a\n", + "# 1 b\n", + "# 2 c" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This is the same output as the previous loop.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.6 Dictionaries and tuples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Dictionaries have a method called items that returns a sequence of\n", + "tuples, where each tuple is a key-value pair.\n", + "
\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = {'a':0, 'b':1, 'c':2}\n", + "t = d.items()\n", + "t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The result is a dict_items object, which is a view that you can\n", + "iterate over to get the key-value pairs. You can use it in a for loop\n", + "like this:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for key, value in d.items():\n", + "    print(key, value)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Since Python 3.7, dictionaries preserve insertion order, so the items\n", + "appear in the order in which they were added." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "
\n", + "Going in the other direction, you can use a list of tuples to\n", + "initialize a new dictionary: \n", + "
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = [('a', 0), ('c', 2), ('b', 1)]\n", + "d = dict(t)\n", + "d" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Combining dict with zip yields a concise way\n", + "to create a dictionary:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = dict(zip('abc', range(3)))\n", + "d" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The dictionary method update also takes a list of tuples\n", + "and adds them, as key-value pairs, to an existing dictionary.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is common to use tuples as keys in dictionaries (primarily because\n", + "you can’t use lists). For example, a telephone directory might map\n", + "from last-name, first-name pairs to telephone numbers. Assuming\n", + "that we have defined last, first and number, we\n", + "could write:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "directory[last, first] = number" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The expression in brackets is a tuple. We could use tuple\n", + "assignment to traverse this dictionary.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for last, first in directory:\n", + " print(first, last, directory[last,first])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "This loop traverses the keys in directory, which are tuples. It\n", + "assigns the elements of each tuple to last and first, then\n", + "prints the name and corresponding telephone number." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are two ways to represent tuples in a state diagram. The more\n", + "detailed version shows the indices and elements just as they appear in\n", + "a list. For example, the tuple ('Cleese', 'John') would appear\n", + "as in Figure 12.1.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But in a larger diagram you might want to leave out the\n", + "details. For example, a diagram of the telephone directory might\n", + "appear as in Figure 12.2." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here the tuples are shown using Python syntax as a graphical\n", + "shorthand. The telephone number in the diagram is the complaints line\n", + "for the BBC, so please don’t call it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.7 Sequences of sequences" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I have focused on lists of tuples, but almost all of the examples in\n", + "this chapter also work with lists of lists, tuples of tuples, and\n", + "tuples of lists. To avoid enumerating the possible combinations, it\n", + "is sometimes easier to talk about sequences of sequences." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In many contexts, the different kinds of sequences (strings, lists and\n", + "tuples) can be used interchangeably. So how should you choose one\n", + "over the others?\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "To start with the obvious, strings are more limited than other\n", + "sequences because the elements have to be characters. They are\n", + "also immutable. If you need the ability to change the characters\n", + "in a string (as opposed to creating a new string), you might\n", + "want to use a list of characters instead.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
\n", + "Lists are more common than tuples, mostly because they are mutable.\n", + "But there are a few cases where you might prefer tuples:\n", + "\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Because tuples are immutable, they don’t provide methods like sort and reverse, which modify existing lists. But Python\n", + "provides the built-in function sorted, which takes any sequence\n", + "and returns a new list with the same elements in sorted order, and\n", + "reversed, which takes a sequence and returns an iterator that\n", + "traverses the sequence in reverse order.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.8 Debugging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lists, dictionaries and tuples are examples of data\n", + "structures; in this chapter we are starting to see compound data\n", + "structures, like lists of tuples, or dictionaries that contain tuples\n", + "as keys and lists as values. Compound data structures are useful, but\n", + "they are prone to what I call shape errors; that is, errors\n", + "caused when a data structure has the wrong type, size, or structure.\n", + "For example, if you are expecting a list with one integer and I\n", + "give you a plain old integer (not in a list), it won’t work.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To help debug these kinds of errors, I have written a module\n", + "called structshape that provides a function, also called\n", + "structshape, that takes any kind of data structure as\n", + "an argument and returns a string that summarizes its shape.\n", + "You can download it from http://thinkpython2.com/code/structshape.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here’s the result for a simple list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from structshape import structshape\n", + "t = [1, 2, 3]\n", + "structshape(t)\n", + "# 'list of 3 int'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "A fancier program might write “list of 3 ints”, but it\n", + "was easier not to deal with plurals. 
Here’s a list of lists:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t2 = [[1,2], [3,4], [5,6]]\n", + "structshape(t2)\n", + "# 'list of 3 list of 2 int'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If the elements of the list are not the same type,\n", + "structshape groups them, in order, by type:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t3 = [1, 2, 3, 4.0, '5', '6', [7], [8], 9]\n", + "structshape(t3)\n", + "# 'list of (3 int, float, 2 str, 2 list of int, int)'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Here’s a list of tuples:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'abc'\n", + "lt = list(zip(t, s))\n", + "structshape(lt)\n", + "# 'list of 3 tuple of (int, str)'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "And here’s a dictionary with 3 items that map integers to strings." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = dict(lt)\n", + "structshape(d)\n", + "# 'dict of 3 int->str'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you are having trouble keeping track of your data structures,\n", + "structshape can help." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 12.9 Glossary" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ipytracer\n", + "from IPython.core.display import display\n", + "\n", + "def bubble_sort(unsorted_list):\n", + " x = ipytracer.ChartTracer(unsorted_list)\n", + " display(x)\n", + " length = len(x)-1\n", + " for i in range(length):\n", + " for j in range(length-i):\n", + " if x[j] > x[j+1]:\n", + " x[j], x[j+1] = x[j+1], x[j]\n", + " return x.tolist()\n", + "\n", + "bubble_sort([6,4,7,9])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ipytracer\n", + "from IPython.core.display import display\n", + "\n", + "def bubble_sort(unsorted_list):\n", + " x = ipytracer.List1DTracer(unsorted_list)\n", + " display(x)\n", + " length = len(x)-1\n", + " for i in range(length):\n", + " for j in range(length-i):\n", + " if x[j] > x[j+1]:\n", + " x[j], x[j+1] = x[j+1], x[j]\n", + " print(unsorted_list) \n", + " return x.tolist()\n", + "\n", + "bubble_sort([6,4,7,9])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ipytracer\n", + "from IPython.core.display import display\n", + "import re\n", + "\n", + " \n", + "def quick_sort(arr): \n", + " input_list = ipytracer.ChartTracer(arr)\n", + " display(input_list)\n", + "\n", + " def alphanum_key(key):\n", + " return [int(s) if s.isdigit() else s.lower() for s in re.split(\"([0-9]+)\", key)]\n", + "\n", + " return sorted(input_list, key=alphanum_key)\n", + "\n", + "\n", + "quick_sort(['6','4','7','9','3','5','1','8'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import random\n", + "def merge_sort(collectionx: list) -> list:\n", + " collectionx = ipytracer.List1DTracer(collectionx)\n", + " 
display(collectionx)\n", + " \n", + " for i in range(0, 8):\n", + " collectionx[i] = i\n", + " collectionx[i-1] = i-1\n", + " collectionx[i-2] = i*2\n", + "\n", + "\n", + "merge_sort([6,4,7,9,3,5,1,8,2])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def merge_sort(collection: list) -> list:\n", + "\n", + "\n", + " def merge(left: list, right: list) -> list:\n", + " \"\"\"merge left and right\n", + " :param left: left collection\n", + " :param right: right collection\n", + " :return: merge result\n", + " \"\"\"\n", + "\n", + " def _merge():\n", + " while left and right:\n", + " yield (left if left[0] <= right[0] else right).pop(0)\n", + " yield from left\n", + " yield from right\n", + "\n", + " return list(_merge())\n", + "\n", + " if len(collection) <= 1:\n", + " return collection\n", + " mid = len(collection) // 2\n", + " display(ipytracer.List1DTracer(collection))\n", + " left = merge_sort(collection[:mid])\n", + " right = merge_sort(collection[mid:])\n", + " x = merge(left, right)\n", + " display(x)\n", + " return merge(left, right)\n", + "\n", + "merge_sort([6,4,7,9,3,5,1,8,2])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def shell_sort(collection):\n", + " collection = ipytracer.List1DTracer(collection)\n", + " display(collection)\n", + " gaps = [701, 301, 132, 57, 23, 10, 4, 1]\n", + "\n", + " for gap in gaps:\n", + " for i in range(gap, len(collection)):\n", + " j = i\n", + " while j >= gap and collection[j] < collection[j - gap]:\n", + " collection[j], collection[j - gap] = collection[j - gap], collection[j]\n", + " j -= gap\n", + " return collection\n", + "\n", + "shell_sort([6,4,7,9,3,5,1,8,2])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 56981a890a33f5e517d3a9f3f9019dfa7cd196c1 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Tue, 20 Jun 2023 18:30:57 +0300 Subject: [PATCH 69/76] Update README.md --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 07a9848..e05f20c 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,10 @@ # python Jupyter notebooks and datasets for the interesting pandas/python/data science video series. +# Contribution + +Feel free to contribute or suggest new ideas. [Improvements](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python) + # Who is this repo for? For people who are interested in data science, data analysis and finding interesting relation for data. This repository is related to site: https://blog.softhints.com/tag/pandas/ where you can find more interesting videos. From d7d07588e40723b514657b319584c7c6733e17da Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 24 Jun 2023 10:23:43 +0300 Subject: [PATCH 70/76] Update README.md --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index e05f20c..c033865 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,9 @@ Jupyter notebooks and datasets for the interesting pandas/python/data science vi # Contribution -Feel free to contribute or suggest new ideas. [Improvements](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python) +Feel free to contribute or suggest new ideas. +For getting in touch you can write us on [mail](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python). 
+You can find nice guide about GitHub contribution: [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/0 # Who is this repo for? From 8900fba3b00e199e51f87f9dd478c52dbb559f49 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 24 Jun 2023 10:24:42 +0300 Subject: [PATCH 71/76] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index c033865..b634c13 100644 --- a/README.md +++ b/README.md @@ -3,8 +3,8 @@ Jupyter notebooks and datasets for the interesting pandas/python/data science vi # Contribution -Feel free to contribute or suggest new ideas. -For getting in touch you can write us on [mail](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python). +Feel free to contribute or suggest new ideas. To get in touch write on [mail](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python). + You can find nice guide about GitHub contribution: [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/0 # Who is this repo for? From fde14019e007af186cbaa1ea793a41e71bfcf41b Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 24 Jun 2023 10:25:13 +0300 Subject: [PATCH 72/76] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b634c13..c872b72 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ Jupyter notebooks and datasets for the interesting pandas/python/data science vi Feel free to contribute or suggest new ideas. To get in touch write on [mail](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python). 
-You can find nice guide about GitHub contribution: [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/0 +You can find nice guide about GitHub contribution: [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/) # Who is this repo for? From 6a37136b10c1e01da8fd0234034631afb6cf9e65 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 24 Jun 2023 10:27:21 +0300 Subject: [PATCH 73/76] Update README.md --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index c872b72..9e6b9bd 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,11 @@ You can find nice guide about GitHub contribution: [Step-by-step guide to contri # Who is this repo for? -For people who are interested in data science, data analysis and finding interesting relation for data. This repository is related to site: https://blog.softhints.com/tag/pandas/ where you can find more interesting videos. +For people who are interested in data science, data analysis and finding interesting relation for data. This repository is related to sites: +* [DataScientYst.com - Data Science Tutorials, Exercises, Guides, Videos with Python and Pandas](https://datascientyst.com/) +* [SoftHints.com - Python, Pandas, Linux, SQL Tutorials and Guides](https://softhints.com/) + +where you can find more interesting articles. New website dedicated to Pandas and Data Science was started: https://datascientyst.com/. It has better organization and covers topics in many areas. 
From 51da9fd52869f6e4bbbea62785fc122f70cea212 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 24 Jun 2023 10:29:18 +0300 Subject: [PATCH 74/76] Update README.md --- README.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 9e6b9bd..24dacdd 100644 --- a/README.md +++ b/README.md @@ -5,11 +5,13 @@ Jupyter notebooks and datasets for the interesting pandas/python/data science vi Feel free to contribute or suggest new ideas. To get in touch write on [mail](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python). -You can find nice guide about GitHub contribution: [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/) +You can find nice guide about GitHub contribution: +* [Contributing to projects](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) +* [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/) # Who is this repo for? -For people who are interested in data science, data analysis and finding interesting relation for data. This repository is related to sites: +For people who are interested in data science, data analysis and finding interesting insights for data. 
This repository is related to sites: * [DataScientYst.com - Data Science Tutorials, Exercises, Guides, Videos with Python and Pandas](https://datascientyst.com/) * [SoftHints.com - Python, Pandas, Linux, SQL Tutorials and Guides](https://softhints.com/) From e3922a24bca5e23023d8832a0adf1ad2af0397f9 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Sat, 24 Jun 2023 10:32:10 +0300 Subject: [PATCH 75/76] Update README.md --- README.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 24dacdd..32615a6 100644 --- a/README.md +++ b/README.md @@ -22,11 +22,8 @@ New website dedicated to Pandas and Data Science was started: https://datascient The youtube channel is: -https://www.youtube.com/channel/UCg5rvP_D735oSBatdcH5ZFA - -# Popular Videos - -https://softhints.com/youtube-videos.html +* [SoftHints Youtube](https://www.youtube.com/@softhints/) +* [Popular Videos](https://www.youtube.com/@softhints/videos) # Latest Videos From a256a054d74ca397f41874b3e26f1c4b84214432 Mon Sep 17 00:00:00 2001 From: Softhints <44205770+softhints@users.noreply.github.com> Date: Mon, 10 Feb 2025 16:02:39 +0200 Subject: [PATCH 76/76] Add files via upload --- notebooks/csv/data.csv.zip | Bin 0 -> 255 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 notebooks/csv/data.csv.zip diff --git a/notebooks/csv/data.csv.zip b/notebooks/csv/data.csv.zip new file mode 100644 index 0000000000000000000000000000000000000000..1acfcc2ff36b9ff3d92943578be8d3b05cc97b1a GIT binary patch literal 255 zcmWIWW@Zs#U|`^2xL)lQ<-0X7I1|Xr1&c5+q$HLk>LnMKu^u$!J7mDadg1Q!6O8`? zwuU6>vL0!Om?7c2A*lPvv`I{lzSW-J!C}FU0%kJR>m`0#64v%O)-jwj?&62DZ({J9)vev2W%n~}+$8JB}p zfX)H|g@!GSAR6X8E(Qe