From a2a6a3f1694c5eab8ed81eda6e15b61a05fa40e9 Mon Sep 17 00:00:00 2001 From: HenryGuo Date: Wed, 14 Mar 2018 15:21:24 +0800 Subject: [PATCH] Add files via upload --- intro_to_pandas.ipynb | 2164 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 2164 insertions(+) create mode 100644 intro_to_pandas.ipynb diff --git a/intro_to_pandas.ipynb b/intro_to_pandas.ipynb new file mode 100644 index 00000000..fb20c1c3 --- /dev/null +++ b/intro_to_pandas.ipynb @@ -0,0 +1,2164 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "intro_to_pandas.ipynb", + "version": "0.3.2", + "views": {}, + "default_view": {}, + "provenance": [], + "collapsed_sections": [ + "JndnmDMp66FL", + "YHIWvc9Ms-Ll", + "TJffr5_Jwqvd" + ] + } + }, + "cells": [ + { + "metadata": { + "id": "JndnmDMp66FL", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "#### Copyright 2017 Google LLC." + ] + }, + { + "metadata": { + "id": "hMqWDc_m6rUC", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [], + "base_uri": "https://localhost:8080/", + "height": 10 + }, + "cellView": "both", + "outputId": "05f2545d-b3d6-48dd-ffdd-af3910b78f45", + "executionInfo": { + "status": "ok", + "timestamp": 1521010393734, + "user_tz": -480, + "elapsed": 1017, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific 
language governing permissions and\n", + "# limitations under the License." + ], + "execution_count": 1, + "outputs": [] + }, + { + "metadata": { + "id": "rHLcriKWLRe4", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " # Introduction to pandas" + ] + }, + { + "metadata": { + "id": "QvJBqX8_Bctk", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "**Learning Objectives:**\n", + " * Gain an introduction to the `DataFrame` and `Series` data structures of the *pandas* library\n", + " * Access and manipulate data within a `DataFrame` and `Series`\n", + " * Import CSV data into a *pandas* `DataFrame`\n", + " * Reindex a `DataFrame` to shuffle data" + ] + }, + { + "metadata": { + "id": "TIFJ83ZTBctl", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " [*pandas*](http://pandas.pydata.org/) is a column-oriented data analysis API. It is a powerful tool for handling and analyzing input data, and many machine learning frameworks accept *pandas* data structures as inputs.\n", + "Although a comprehensive introduction to the *pandas* API would span many pages, the core concepts are fairly simple, and we present them below. For a more complete reference, see the [*pandas* docs site](http://pandas.pydata.org/pandas-docs/stable/index.html), which contains extensive documentation and many tutorials." + ] + }, + { + "metadata": { + "id": "s_JOISVgmn9v", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " ## Basic Concepts\n", + "\n", + "The following lines import the *pandas* API and print the API version:" + ] + }, + { + "metadata": { + "id": "aSRYu62xUi3g", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "outputId": "9a9a623b-d844-4589-82b6-3d77576392b0", + "executionInfo": { + "status": "ok", + "timestamp": 1521010399674, + "user_tz": -480, + "elapsed": 680, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "import pandas as pd\n", + "pd.__version__" + ], + "execution_count": 2, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "u'0.22.0'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 2 + } + ] + }, + { + "metadata": { + "id": "daQreKXIUslr", + "colab_type": 
"text" + }, + "cell_type": "markdown", + "source": [ + " *pandas* 中的主要数据结构被实现为以下两类:\n", + "\n", + " * **`DataFrame`**,您可以将它想象成一个关系型数据表格,其中包含多个行和已命名的列。\n", + " * **`Series`**,它是单一列。`DataFrame` 中包含一个或多个 `Series`,每个 `Series` 均有一个名称。\n", + "\n", + "数据框架是用于数据操控的一种常用抽象实现形式。[Spark](https://spark.apache.org/) 和 [R](https://www.r-project.org/about.html) 中也有类似的实现。" + ] + }, + { + "metadata": { + "id": "fjnAk1xcU0yc", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 创建 `Series` 的一种方法是构建 `Series` 对象。例如:" + ] + }, + { + "metadata": { + "id": "DFZ42Uq7UFDj", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 86 + }, + "outputId": "adfba658-149f-49a9-fdea-f2b9731415d0", + "executionInfo": { + "status": "ok", + "timestamp": 1521010401070, + "user_tz": -480, + "elapsed": 1136, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "pd.Series(['San Francisco', 'San Jose', 'Sacramento'])" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 San Francisco\n", + "1 San Jose\n", + "2 Sacramento\n", + "dtype: object" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 3 + } + ] + }, + { + "metadata": { + "id": "U5ouUp1cU6pC", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 您可以将映射 `string` 列名称的 `dict` 传递到它们各自的 `Series`,从而创建`DataFrame`对象。如果 `Series` 在长度上不一致,系统会用特殊的 [NA/NaN](http://pandas.pydata.org/pandas-docs/stable/missing_data.html) 值填充缺失的值。例如:" + ] + }, + { + "metadata": { + "id": "avgr6GfiUh8t", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 142 + }, + "outputId": 
"8d24a6a0-3389-43c8-d82e-fbf4380b14a5", + "executionInfo": { + "status": "ok", + "timestamp": 1521010401875, + "user_tz": -480, + "elapsed": 638, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])\n", + "population = pd.Series([852469, 1015785, 485199])\n", + "\n", + "pd.DataFrame({ 'City name': city_names, 'Population': population })" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr><th></th><th>City name</th><th>Population</th></tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr><th>0</th><td>San Francisco</td><td>852469</td></tr>\n", + " <tr><th>1</th><td>San Jose</td><td>1015785</td></tr>\n", + " <tr><th>2</th><td>Sacramento</td><td>485199</td></tr>\n", + " </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " City name Population\n", + "0 San Francisco 852469\n", + "1 San Jose 1015785\n", + "2 Sacramento 485199" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 4 + } + ] + }, + { + "metadata": { + "id": "oa5wfZT7VHJl", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 但是在大多数情况下,您需要将整个文件加载到 `DataFrame` 中。下面的示例加载了一个包含加利福尼亚州住房数据的文件。请运行以下单元格以加载数据,并创建特征定义:" + ] + }, + { + "metadata": { + "id": "av6RYOraVG1V", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 317 + }, + "outputId": "f2133493-5b8f-46c1-cabf-8781faac8b76", + "executionInfo": { + "status": "ok", + "timestamp": 1521010403215, + "user_tz": -480, + "elapsed": 1145, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "california_housing_dataframe = pd.read_csv(\"https://storage.googleapis.com/mledu-datasets/california_housing_train.csv\", sep=\",\")\n", + "california_housing_dataframe.describe()" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr><th></th><th>longitude</th><th>latitude</th><th>housing_median_age</th><th>total_rooms</th><th>total_bedrooms</th><th>population</th><th>households</th><th>median_income</th><th>median_house_value</th></tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr><th>count</th><td>17000.000000</td><td>17000.000000</td><td>17000.000000</td><td>17000.000000</td><td>17000.000000</td><td>17000.000000</td><td>17000.000000</td><td>17000.000000</td><td>17000.000000</td></tr>\n", + " <tr><th>mean</th><td>-119.562108</td><td>35.625225</td><td>28.589353</td><td>2643.664412</td><td>539.410824</td><td>1429.573941</td><td>501.221941</td><td>3.883578</td><td>207300.912353</td></tr>\n", + " <tr><th>std</th><td>2.005166</td><td>2.137340</td><td>12.586937</td><td>2179.947071</td><td>421.499452</td><td>1147.852959</td><td>384.520841</td><td>1.908157</td><td>115983.764387</td></tr>\n", + " <tr><th>min</th><td>-124.350000</td><td>32.540000</td><td>1.000000</td><td>2.000000</td><td>1.000000</td><td>3.000000</td><td>1.000000</td><td>0.499900</td><td>14999.000000</td></tr>\n", + " <tr><th>25%</th><td>-121.790000</td><td>33.930000</td><td>18.000000</td><td>1462.000000</td><td>297.000000</td><td>790.000000</td><td>282.000000</td><td>2.566375</td><td>119400.000000</td></tr>\n", + " <tr><th>50%</th><td>-118.490000</td><td>34.250000</td><td>29.000000</td><td>2127.000000</td><td>434.000000</td><td>1167.000000</td><td>409.000000</td><td>3.544600</td><td>180400.000000</td></tr>\n", + " <tr><th>75%</th><td>-118.000000</td><td>37.720000</td><td>37.000000</td><td>3151.250000</td><td>648.250000</td><td>1721.000000</td><td>605.250000</td><td>4.767000</td><td>265000.000000</td></tr>\n", + " <tr><th>max</th><td>-114.310000</td><td>41.950000</td><td>52.000000</td><td>37937.000000</td><td>6445.000000</td><td>35682.000000</td><td>6082.000000</td><td>15.000100</td><td>500001.000000</td></tr>\n", + " </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " longitude latitude housing_median_age total_rooms \\\n", + "count 17000.000000 17000.000000 17000.000000 17000.000000 \n", + "mean -119.562108 35.625225 28.589353 2643.664412 \n", + "std 2.005166 2.137340 12.586937 2179.947071 \n", + "min -124.350000 32.540000 1.000000 2.000000 \n", + "25% -121.790000 33.930000 18.000000 1462.000000 \n", + "50% -118.490000 34.250000 29.000000 2127.000000 \n", + "75% -118.000000 37.720000 37.000000 3151.250000 \n", + "max -114.310000 41.950000 52.000000 37937.000000 \n", + "\n", + " total_bedrooms population households median_income \\\n", + "count 17000.000000 17000.000000 17000.000000 17000.000000 \n", + "mean 539.410824 1429.573941 501.221941 3.883578 \n", + "std 421.499452 1147.852959 384.520841 1.908157 \n", + "min 1.000000 3.000000 1.000000 0.499900 \n", + "25% 297.000000 790.000000 282.000000 2.566375 \n", + "50% 434.000000 1167.000000 409.000000 3.544600 \n", + "75% 648.250000 1721.000000 605.250000 4.767000 \n", + "max 6445.000000 35682.000000 6082.000000 15.000100 \n", + "\n", + " median_house_value \n", + "count 17000.000000 \n", + "mean 207300.912353 \n", + "std 115983.764387 \n", + "min 14999.000000 \n", + "25% 119400.000000 \n", + "50% 180400.000000 \n", + "75% 265000.000000 \n", + "max 500001.000000 " + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 5 + } + ] + }, + { + "metadata": { + "id": "WrkBjfz5kEQu", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 上面的示例使用 `DataFrame.describe` 来显示关于 `DataFrame` 的有趣统计信息。另一个实用函数是 `DataFrame.head`,它显示 `DataFrame` 的前几个记录:" + ] + }, + { + "metadata": { + "id": "s3ND3bgOkB5k", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 204 + }, + "outputId": "ce0e27d5-aa8d-4fae-98eb-4cb068d73b5f", + "executionInfo": { + "status": "ok", + "timestamp": 1521010404215, + "user_tz": 
-480, + "elapsed": 801, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "california_housing_dataframe.head()" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr><th></th><th>longitude</th><th>latitude</th><th>housing_median_age</th><th>total_rooms</th><th>total_bedrooms</th><th>population</th><th>households</th><th>median_income</th><th>median_house_value</th></tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr><th>0</th><td>-114.31</td><td>34.19</td><td>15.0</td><td>5612.0</td><td>1283.0</td><td>1015.0</td><td>472.0</td><td>1.4936</td><td>66900.0</td></tr>\n", + " <tr><th>1</th><td>-114.47</td><td>34.40</td><td>19.0</td><td>7650.0</td><td>1901.0</td><td>1129.0</td><td>463.0</td><td>1.8200</td><td>80100.0</td></tr>\n", + " <tr><th>2</th><td>-114.56</td><td>33.69</td><td>17.0</td><td>720.0</td><td>174.0</td><td>333.0</td><td>117.0</td><td>1.6509</td><td>85700.0</td></tr>\n", + " <tr><th>3</th><td>-114.57</td><td>33.64</td><td>14.0</td><td>1501.0</td><td>337.0</td><td>515.0</td><td>226.0</td><td>3.1917</td><td>73400.0</td></tr>\n", + " <tr><th>4</th><td>-114.57</td><td>33.57</td><td>20.0</td><td>1454.0</td><td>326.0</td><td>624.0</td><td>262.0</td><td>1.9250</td><td>65500.0</td></tr>\n", + " </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " longitude latitude housing_median_age total_rooms total_bedrooms \\\n", + "0 -114.31 34.19 15.0 5612.0 1283.0 \n", + "1 -114.47 34.40 19.0 7650.0 1901.0 \n", + "2 -114.56 33.69 17.0 720.0 174.0 \n", + "3 -114.57 33.64 14.0 1501.0 337.0 \n", + "4 -114.57 33.57 20.0 1454.0 326.0 \n", + "\n", + " population households median_income median_house_value \n", + "0 1015.0 472.0 1.4936 66900.0 \n", + "1 1129.0 463.0 1.8200 80100.0 \n", + "2 333.0 117.0 1.6509 85700.0 \n", + "3 515.0 226.0 3.1917 73400.0 \n", + "4 624.0 262.0 1.9250 65500.0 " + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 6 + } + ] + }, + { + "metadata": { + "id": "w9-Es5Y6laGd", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " *pandas* 的另一个强大功能是绘制图表。例如,借助 `DataFrame.hist`,您可以快速了解一个列中值的分布:" + ] + }, + { + "metadata": { + "id": "nqndFVXVlbPN", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + }, + { + "item_id": 2 + } + ], + "base_uri": "https://localhost:8080/", + "height": 397 + }, + "outputId": "87ada848-2007-4a35-ddbd-7c94cfd807be", + "executionInfo": { + "status": "ok", + "timestamp": 1521010405311, + "user_tz": -480, + "elapsed": 932, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "california_housing_dataframe.hist('housing_median_age')" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[]],\n", + " dtype=object)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 7 + }, + { + "output_type": "display_data", + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAeoAAAFZCAYAAABXM2zhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X1UlHX+//HXMDAH0UEEGTfLarf0\naEmaa5l4U0Iokp7IVRPWdU3q6Iqtlql499WTlajRmmZZmunRU7GNtofcAjJxyyRanT0uuu0p2VOr\neTejKCqgSPP7o9Os/FRguP1Az8dfcTEz1+d6H+3pdQ1zYfF6vV4BAAAjBTT3AgAAwPURagAADEao\nAQAwGKEGAMBghBoAAIMRagAADEaogVo6cuSI7rjjjkbdxz//+U+lpKQ06j4a0h133KEjR47o448/\n1ty5c5t7OUCrZOFz1EDtHDlyREOHDtW//vWv5l6KMe644w7l5ubqpptuau6lAK0WZ9SAn5xOp0aO\nHKn7779f27dv1w8//KA//elPio+PV3x8vNLS0lRaWipJiomJ0d69e33P/enry5cva/78+Ro2bJji\n4uI0bdo0nT9/XgUFBYqLi5MkrV69Ws8++6xSU1MVGxur0aNH6+TJk5KkgwcPaujQoRo6dKheeeUV\njRw5UgUFBdWue/Xq1Vq0aJEmT56sgQMHatasWcrLy9OoUaM0cOBA5eXlSZIuXbqk5557TsOGDVNM\nTIzWrl3re42//e1viouL0/Dhw7V+/Xrf9m3btmnixImSJI/Ho5SUFMXHxysmJkZvvfVWleN/9913\nNXr0aA0cOFDp6ek1zrusrEwzZszwrWfZsmW+71U3hx07dmjkyJGKjY3VpEmTdPr06Rr3BZiIUAN+\n+OGHH1RRUaEPPvhAc+fO1cqVK/XRRx/p008/1bZt2/TXv/5VJSUl2rhxY7Wvs3v3bh05ckTZ2dnK\nzc3V7bffrn/84x9XPS47O1vz5s3Tjh07FBERoa1bt0qSFi5cqIkTJyo3N1ft2rXTt99+W6v179q1\nSy+88II++OADZWdn+9Y9ZcoUrVu3TpK0bt06HTp0SB988IG2b9+unJwc5eXlqbKyUvPnz9eiRYv0\n0UcfKSAgQJWVlVft47XXXtNNN92k7Oxsbdq0SRkZGTp27Jjv+3//+9+VmZmprVu3asuWLTp+/Hi1\na37nnXd04cIFZWdn6/3339e2bdt8//i53hwOHz6s2bNnKyMjQ5988on69eunxYsX12pGgGkINeAH\nr9erxMREST9e9j1+/Lh27dqlxMREhYSEyGq1atSoUfr888+rfZ3w8HAVFRXp448/9p0xDho06KrH\n9e3bVzfeeKMsFot69OihY8eOqby8XAcPHtSIESMkSb/97W9V23ew7r77bkVERKhDhw6KjIzU4MGD\nJUndunXzna3n5eUpOTlZNptNISEhevjhh5Wbm6tvv/1Wly5d0sCBAyVJjzzyyDX3sWDBAi1cuFCS\n1KVLF0VGRurIkSO+748cOVJWq1WdOnVSRERElYhfy6RJk/Tqq6/KYrGoffv26tq1q44cOVLtHD79\n9FPde++96tatmyRp3Lhx2rlz5zX/YQGYLrC5FwC0JFarVW3atJEkBQQE6IcfftDp06fVvn1732Pa\nt2+vU6dOVfs6d911lxYsWKDNmzdrzpw5iomJ0aJFi656nN1ur7LvyspKnT17VhaLRaGhoZKkoKAg\nRURE1Gr9bdu2rfJ6ISEhVY5Fks6dO6elS5fqpZdekvTjpfC77rpLZ8+eVbt27aoc57UUFhb6zqID\nAgLkdrt9ry2pymv8dEzV+fbbb5Wenq7//Oc/CggI0PHjxzVq1Khq53Du3Dnt3btX8fHxVfZ75syZ\nWs8KMAWhBuqpY8eOOnPmjO/rM2fOqGPHjpKqBlCSzp496/vvn
97TPnPmjObNm6c333xT0dHRNe6v\nXbt28nq9KisrU5s2bXT58uUGff/V4XBo0qRJGjJkSJXtRUVFOn/+vO/r6+1z1qxZ+v3vf6+kpCRZ\nLJZrXinwx7PPPqs777xTa9askdVq1bhx4yRVPweHw6Ho6GitWrWqXvsGTMClb6CeHnjgAWVlZams\nrEyXL1+W0+nU/fffL0mKjIzUv//9b0nShx9+qIsXL0qStm7dqjVr1kiSwsLC9Ktf/arW+2vbtq1u\nu+02ffTRR5KkzMxMWSyWBjue2NhYvffee6qsrJTX69Wrr76qTz/9VDfffLOsVqvvh7W2bdt2zf2e\nOnVKPXv2lMVi0fvvv6+ysjLfD9fVxalTp9SjRw9ZrVZ9/vnn+u6771RaWlrtHAYOHKi9e/fq8OHD\nkn782Ntzzz1X5zUAzYlQA/UUHx+vwYMHa9SoURoxYoR+8YtfaMKECZKkqVOnauPGjRoxYoSKiop0\n++23S/oxhj/9xPLw4cN16NAhPfbYY7Xe56JFi7R27Vo99NBDKi0tVadOnRos1snJyercubMeeugh\nxcfHq6ioSL/+9a8VFBSkJUuWaN68eRo+fLgsFovv0vmVpk+frtTUVI0cOVKlpaV69NFHtXDhQv33\nv/+t03r+8Ic/aNmyZRoxYoS+/PJLTZs2TatXr9a+ffuuOweHw6ElS5YoNTVVw4cP17PPPquEhIT6\njgZoFnyOGmihvF6vL8733XefNm7cqO7duzfzqpoec0Brxxk10AL98Y9/9H2cKj8/X16vV7feemvz\nLqoZMAf8HHBGDbRARUVFmjt3rs6ePaugoCDNmjVLN910k1JTU6/5+Ntuu833nrhpioqK6rzua83h\np58PAFoLQg0AgMG49A0AgMEINQAABjPyhidu9zm/Ht+hQ4iKi+v+Oc2fO+ZXd8yufphf3TG7+jFt\nfpGR9ut+r1WcUQcGWpt7CS0a86s7Zlc/zK/umF39tKT5tYpQAwDQWhFqAAAMRqgBADBYjT9MVlZW\nprS0NJ06dUoXL17U1KlT1b17d82ePVuVlZWKjIzUihUrZLPZlJWVpU2bNikgIEBjx47VmDFjVFFR\nobS0NB09elRWq1VLly5Vly5dmuLYAABo8Wo8o87Ly1PPnj21ZcsWrVy5Uunp6Vq1apWSk5P19ttv\n65ZbbpHT6VRpaanWrFmjjRs3avPmzdq0aZPOnDmj7du3KzQ0VO+8846mTJmijIyMpjguAABahRpD\nnZCQoCeeeEKSdOzYMXXq1EkFBQWKjY2VJA0ZMkT5+fnav3+/oqKiZLfbFRwcrD59+sjlcik/P19x\ncXGSpOjoaLlcrkY8HAAAWpdaf4563LhxOn78uNauXavHHntMNptNkhQRESG32y2Px6Pw8HDf48PD\nw6/aHhAQIIvFokuXLvmeDwAArq/WoX733Xf11VdfadasWbry9uDXu1W4v9uv1KFDiN+fcavuw+Ko\nGfOrO2ZXP8yv7phd/bSU+dUY6gMHDigiIkI33HCDevToocrKSrVt21bl5eUKDg7WiRMn5HA45HA4\n5PF4fM87efKkevfuLYfDIbfbre7du6uiokJer7fGs2l/7xYTGWn3+25m+B/mV3fMrn6YX90xu/ox\nbX71ujPZ3r17tWHDBkmSx+NRaWmpoqOjlZOTI0nKzc3VoEGD1KtXLxUWFqqkpEQXLlyQy+VS3759\nNWDAAGVnZ0v68QfT+vXr1xDHBADAz0KNZ9Tjxo3T/PnzlZycrPLycv3f//2fevbsqTlz5igzM1Od\nO3dWYmKigoKCNHPmTKWkpMhisSg1NVV2u10JCQnas2ePkpKSZLPZlJ6e3hTHBQBAq2Dk76P293KE\naZcwWhrmV3fMrn6YX90xu/oxbX7VXfo28rdnAcC1TErf2dxLqNGGtJjmXgJaGW4hCgCAwQg1AAAG\nI9QAABiMUAMAYDBCDQCAw
Qg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCA\nwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMA\nYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QA\nABiMUAMAYDBCDQCAwQg1AAAGC6zNg5YvX659+/bp8uXLmjx5snbu3KmDBw8qLCxMkpSSkqIHHnhA\nWVlZ2rRpkwICAjR27FiNGTNGFRUVSktL09GjR2W1WrV06VJ16dKlUQ8KAIDWosZQf/HFF/rmm2+U\nmZmp4uJiPfLII7rvvvv09NNPa8iQIb7HlZaWas2aNXI6nQoKCtLo0aMVFxenvLw8hYaGKiMjQ7t3\n71ZGRoZWrlzZqAcFAEBrUeOl73vuuUcvv/yyJCk0NFRlZWWqrKy86nH79+9XVFSU7Ha7goOD1adP\nH7lcLuXn5ysuLk6SFB0dLZfL1cCHAABA61VjqK1Wq0JCQiRJTqdTgwcPltVq1ZYtWzRhwgQ99dRT\nOn36tDwej8LDw33PCw8Pl9vtrrI9ICBAFotFly5daqTDAQCgdanVe9SStGPHDjmdTm3YsEEHDhxQ\nWFiYevTooTfeeEOvvPKK7r777iqP93q913yd622/UocOIQoMtNZ2aZKkyEi7X49HVcyv7phd/bS2\n+TXl8bS22TW1ljK/WoX6s88+09q1a7V+/XrZ7Xb179/f972YmBgtXrxYw4YNk8fj8W0/efKkevfu\nLYfDIbfbre7du6uiokJer1c2m63a/RUXl/p1EJGRdrnd5/x6Dv6H+dUds6uf1ji/pjqe1ji7pmTa\n/Kr7R0ONl77PnTun5cuX6/XXX/f9lPeTTz6pw4cPS5IKCgrUtWtX9erVS4WFhSopKdGFCxfkcrnU\nt29fDRgwQNnZ2ZKkvLw89evXryGOCQCAn4Uaz6g//PBDFRcXa8aMGb5to0aN0owZM9SmTRuFhIRo\n6dKlCg4O1syZM5WSkiKLxaLU1FTZ7XYlJCRoz549SkpKks1mU3p6eqMeEAAArYnFW5s3jZuYv5cj\nTLuE0dIwv7pjdvXj7/wmpe9sxNU0jA1pMU2yH/7s1Y9p86vXpW8AANB8CDUAAAYj1AAAGIxQAwBg\nMEINAIDBCDUAAAYj1AAAGIxQAwBgMEINAIDBCDUAAAYj1AAAGIxQAwBgMEINAIDBCDUAAAYj1AAA\nGIxQAwBgMEINAIDBCDUAAAYj1AAAGIxQAwBgMEINAIDBCDUAAAYLbO4FAA1lUvrO5l5CtTakxTT3\nEgC0QJxRAwBgMEINAIDBCDUAAAYj1AAAGIxQAwBgMEINAIDBCDUAAAYj1AAAGIxQAwBgMEINAIDB\nCDUAAAYj1AAAGIxQAwBgMEINAIDBCDUAAAbj91EDTcT035ct8TuzARNxRg0AgMFqdUa9fPly7du3\nT5cvX9bkyZMVFRWl2bNnq7KyUpGRkVqxYoVsNpuysrK0adMmBQQEaOzYsRozZowqKiqUlpamo0eP\nymq1aunSperSpUtjHxcAAK1CjaH+4osv9M033ygzM1PFxcV65JFH1L9/fyUnJ2v48OF66aWX5HQ6\nlZiYqDVr1sjpdCooKEijR49WXFyc8vLyFBoaqoyMDO3evVsZGRlauXJlUxwbAAAtXo2Xvu+55x69\n/PLLkqTQ0FCVlZWpoKBAsbGxkqQhQ4YoPz9f+/fvV1RUlOx2u4KDg9WnTx+5XC7l5+crLi5OkhQd\nHS2Xy9WIhwMAQOtS4xm11WpVSEiIJMnpdGrw4MHavXu3bDabJCkiIkJut1sej0fh4eG+54WHh1+1\nPSAgQBaLRZcuXfI9/1o6dAhRYKDVrwOJjLT79XhUxfwgNc+fg9b2Z68pj6e1za6ptZT51fq
nvnfs\n2CGn06kNGzZo6NChvu1er/eaj/d3+5WKi0truyxJPw7b7T7n13PwP8wPP2nqPwet8c9eUx1Pa5xd\nUzJtftX9o6FWP/X92Wefae3atVq3bp3sdrtCQkJUXl4uSTpx4oQcDoccDoc8Ho/vOSdPnvRtd7vd\nkqSKigp5vd5qz6YBAMD/1Bjqc+fOafny5Xr99dcVFhYm6cf3mnNyciRJubm5GjRokHr16qXCwkKV\nlJTowoULcrlc6tu3rwYMGKDs7GxJUl5envr169eIhwMAQOtS46XvDz/8UMXFxZoxY4ZvW3p6uhYs\nWKDMzEx17txZiYmJCgoK0syZM5WSkiKLxaLU1FTZ7XYlJCRoz549SkpKks1mU3p6eqMeEAAArUmN\noX700Uf16KOPXrX9rbfeumpbfHy84uPjq2z76bPTAADAf9xCFIBPS7jNKfBzwy1EAQAwGKEGAMBg\nhBoAAIMRagAADEaoAQAwGKEGAMBghBoAAIMRagAADEaoAQAwGHcmQ61wxyoAaB6cUQMAYDBCDQCA\nwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMA\nYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABgssLkXAADAlSal72zuJdRoQ1pM\nk+2LM2oAAAxGqAEAMBihBgDAYIQaAACDEWoAAAxGqAEAMBihBgDAYLX6HPXXX3+tqVOnauLEiRo/\nfrzS0tJ08OBBhYWFSZJSUlL0wAMPKCsrS5s2bVJAQIDGjh2rMWPGqKKiQmlpaTp69KisVquWLl2q\nLl26NOpBAUBz4TPAaGg1hrq0tFRLlixR//79q2x/+umnNWTIkCqPW7NmjZxOp4KCgjR69GjFxcUp\nLy9PoaGhysjI0O7du5WRkaGVK1c2/JEAANAK1Xjp22azad26dXI4HNU+bv/+/YqKipLdbldwcLD6\n9Okjl8ul/Px8xcXFSZKio6PlcrkaZuUAAPwM1BjqwMBABQcHX7V9y5YtmjBhgp566imdPn1aHo9H\n4eHhvu+Hh4fL7XZX2R4QECCLxaJLly414CEAANB61ele3w8//LDCwsLUo0cPvfHGG3rllVd09913\nV3mM1+u95nOvt/1KHTqEKDDQ6teaIiPtfj0eVTE/4OeDv+/115QzrFOor3y/OiYmRosXL9awYcPk\n8Xh820+ePKnevXvL4XDI7Xare/fuqqiokNfrlc1mq/b1i4tL/VpPZKRdbvc5/w4CPswP+Hnh73v9\nNfQMqwt/nT6e9eSTT+rw4cOSpIKCAnXt2lW9evVSYWGhSkpKdOHCBblcLvXt21cDBgxQdna2JCkv\nL0/9+vWryy4BAPhZqvGM+sCBA1q2bJm+//57BQYGKicnR+PHj9eMGTPUpk0bhYSEaOnSpQoODtbM\nmTOVkpIii8Wi1NRU2e12JSQkaM+ePUpKSpLNZlN6enpTHBcAAK1CjaHu2bOnNm/efNX2YcOGXbUt\nPj5e8fHxVbb99NlpAADgP+5MBgCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMA\nYDBCDQCAwQg1AAAGI9QAABiMUAMAYLA6/T5qAEDLNSl9Z3MvAX7gjBoAAIMRagAADEaoAQAwGKEG\nAMBghBoAAIMRagAADEaoAQAwGKEGAMBghBoAAIMRagAADEaoAQAwGKEGAMBghBoAAIMRagAADEao\nAQAwGKEGAMBghBoAAIMRagAADEaoAQAwGKEGAMBghBoAAIMRagAADEaoAQAwGKEGAMBghBoAAIMR\nagAADFarUH/99dd68MEHtWXLFknSsWPH9Lvf/U7JycmaPn26Ll26JEnKysrSb37zG40ZM0bvvfee\nJKmiokIzZ85UUlKSxo8fr8OHDzfSoQAA0PrUGOrS0lI
tWbJE/fv3921btWqVkpOT9fbbb+uWW26R\n0+lUaWmp1qxZo40bN2rz5s3atGmTzpw5o+3btys0NFTvvPOOpkyZooyMjEY9IAAAWpMaQ22z2bRu\n3To5HA7ftoKCAsXGxkqShgwZovz8fO3fv19RUVGy2+0KDg5Wnz595HK5lJ+fr7i4OElSdHS0XC5X\nIx0KAACtT42hDgwMVHBwcJVtZWVlstlskqSIiAi53W55PB6Fh4f7HhMeHn7V9oCAAFksFt+lcgAA\nUL3A+r6A1+ttkO1X6tAhRIGBVr/WERlp9+vxqIr5AUDtNeX/M+sU6pCQEJWXlys4OFgnTpyQw+GQ\nw+GQx+PxPebkyZPq3bu3HA6H3G63unfvroqKCnm9Xt/Z+PUUF5f6tZ7ISLvc7nN1ORSI+QGAvxr6\n/5nVhb9OH8+Kjo5WTk6OJCk3N1eDBg1Sr169VFhYqJKSEl24cEEul0t9+/bVgAEDlJ2dLUnKy8tT\nv3796rJLAAB+lmo8oz5w4ICWLVum77//XoGBgcrJydGLL76otLQ0ZWZmqnPnzkpMTFRQUJBmzpyp\nlJQUWSwWpaamym63KyEhQXv27FFSUpJsNpvS09Ob4rgAAGgVLN7avGncxPy9pMCl2/qpzfwmpe9s\notUAgPk2pMU06Os1+KVvAADQNOr9U99oGJyxAgCuhTNqAAAMRqgBADAYoQYAwGCEGgAAgxFqAAAM\nRqgBADAYoQYAwGCEGgAAgxFqAAAMRqgBADAYoQYAwGCEGgAAgxFqAAAMRqgBADAYoQYAwGCEGgAA\ngxFqAAAMRqgBADAYoQYAwGCEGgAAgxFqAAAMRqgBADAYoQYAwGCEGgAAgxFqAAAMRqgBADAYoQYA\nwGCEGgAAgxFqAAAMRqgBADAYoQYAwGCEGgAAgxFqAAAMFtjcC2gKk9J3NvcSAACoE86oAQAwGKEG\nAMBghBoAAIMRagAADFanHyYrKCjQ9OnT1bVrV0lSt27d9Pjjj2v27NmqrKxUZGSkVqxYIZvNpqys\nLG3atEkBAQEaO3asxowZ06AHAABAa1bnn/q+9957tWrVKt/Xc+fOVXJysoYPH66XXnpJTqdTiYmJ\nWrNmjZxOp4KCgjR69GjFxcUpLCysQRYPAEBr12CXvgsKChQbGytJGjJkiPLz87V//35FRUXJbrcr\nODhYffr0kcvlaqhdAgDQ6tX5jPrQoUOaMmWKzp49q2nTpqmsrEw2m02SFBERIbfbLY/Ho/DwcN9z\nwsPD5Xa7a3ztDh1CFBho9Ws9kZF2/w4AAIA6asrm1CnUt956q6ZNm6bhw4fr8OHDmjBhgiorK33f\n93q913ze9bb//4qLS/1aT2SkXW73Ob+eAwBAXTV0c6oLf50ufXfq1EkJCQmyWCy6+eab1bFjR509\ne1bl5eWSpBMnTsjhcMjhcMjj8fied/LkSTkcjrrsEgCAn6U6hTorK0tvvvmmJMntduvUqVMaNWqU\ncnJyJEm5ubkaNGiQevXqpcLCQpWUlOjChQtyuVzq27dvw60eAIBWrk6XvmNiYvTMM8/ok08+UUVF\nhRYvXqwePXpozpw5yszMVOfOnZWYmKigoCDNnDlTKSkpslgsSk1Nld3Oe8kAANSWxVvbN46bkL/X\n/mt6j5pfygEAaEgb0mIa9PUa/D1qAADQNAg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiM\nUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAG\nI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCA\nwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABiMUAMAYDBCDQCAwQg1AAAGI9QAABgssCl2\n8sILL2j//v2yWCy
aN2+e7rrrrqbYLQAALV6jh/rLL7/Ud999p8zMTBUVFWnevHnKzMxs7N0CANAq\nNPql7/z8fD344IOSpNtuu01nz57V+fPnG3u3AAC0Co0eao/How4dOvi+Dg8Pl9vtbuzdAgDQKjTJ\ne9RX8nq9NT4mMtLu9+tW95wPMh72+/UAADBBo59ROxwOeTwe39cnT55UZGRkY+8WAIBWodFDPWDA\nAOXk5EiSDh48KIfDoXbt2jX2bgEAaBUa/dJ3nz59dOedd2rcuHGyWCxatGhRY+8SAIBWw+KtzZvG\nAACgWXBnMgAADEaoAQAwWJN/PKuhcXtS/3399deaOnWqJk6cqPHjx+vYsWOaPXu2KisrFRkZqRUr\nVshmszX3Mo20fPly7du3T5cvX9bkyZMVFRXF7GqhrKxMaWlpOnXqlC5evKipU6eqe/fuzM5P5eXl\nGjFihKZOnar+/fszv1oqKCjQ9OnT1bVrV0lSt27d9Pjjj7eY+bXoM+orb0/6/PPP6/nnn2/uJRmv\ntLRUS5YsUf/+/X3bVq1apeTkZL399tu65ZZb5HQ6m3GF5vriiy/0zTffKDMzU+vXr9cLL7zA7Gop\nLy9PPXv21JYtW7Ry5Uqlp6czuzp47bXX1L59e0n8vfXXvffeq82bN2vz5s1auHBhi5pfiw41tyf1\nn81m07p16+RwOHzbCgoKFBsbK0kaMmSI8vPzm2t5Rrvnnnv08ssvS5JCQ0NVVlbG7GopISFBTzzx\nhCTp2LFj6tSpE7PzU1FRkQ4dOqQHHnhAEn9v66slza9Fh5rbk/ovMDBQwcHBVbaVlZX5LvlEREQw\nw+uwWq0KCQmRJDmdTg0ePJjZ+WncuHF65plnNG/ePGbnp2XLliktLc33NfPzz6FDhzRlyhQlJSXp\n888/b1Hza/HvUV+JT5rVHzOs2Y4dO+R0OrVhwwYNHTrUt53Z1ezdd9/VV199pVmzZlWZF7Or3l/+\n8hf17t1bXbp0ueb3mV/1br31Vk2bNk3Dhw/X4cOHNWHCBFVWVvq+b/r8WnSouT1pwwgJCVF5ebmC\ng4N14sSJKpfFUdVnn32mtWvXav369bLb7cyulg4cOKCIiAjdcMMN6tGjhyorK9W2bVtmV0u7du3S\n4cOHtWvXLh0/flw2m40/e37o1KmTEhISJEk333yzOnbsqMLCwhYzvxZ96ZvbkzaM6Oho3xxzc3M1\naNCgZl6Rmc6dO6fly5fr9ddfV1hYmCRmV1t79+7Vhg0bJP34llVpaSmz88PKlSu1detW/fnPf9aY\nMWM0depU5ueHrKwsvfnmm5Ikt9utU6dOadSoUS1mfi3+zmQvvvii9u7d67s9affu3Zt7SUY7cOCA\nli1bpu+//16BgYHq1KmTXnytKYqYAAAArElEQVTxRaWlpenixYvq3Lmzli5dqqCgoOZeqnEyMzO1\nevVq/fKXv/RtS09P14IFC5hdDcrLyzV//nwdO3ZM5eXlmjZtmnr27Kk5c+YwOz+tXr1aN954owYO\nHMj8aun8+fN65plnVFJSooqKCk2bNk09evRoMfNr8aEGAKA1a9GXvgEAaO0INQAABiPUAAAYjFAD\nAGAwQg0AgMEINQAABiPUAAAYjFADAGCw/wdkB5RjykY3PgAAAABJRU5ErkJggg==\n", + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "metadata": { + "id": "XtYZ7114n3b-", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " ## 访问数据\n", + "\n", + "您可以使用熟悉的 Python dict/list 指令访问 `DataFrame` 数据:" + ] + }, + { + "metadata": { + "id": 
"_TFm7-looBFF", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + }, + { + "item_id": 2 + } + ], + "base_uri": "https://localhost:8080/", + "height": 103 + }, + "outputId": "bbe17a44-79e3-461f-c0a7-e8a223ddfd51", + "executionInfo": { + "status": "ok", + "timestamp": 1521010406315, + "user_tz": -480, + "elapsed": 862, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "cities = pd.DataFrame({ 'City name': city_names, 'Population': population })\n", + "print type(cities['City name'])\n", + "cities['City name']" + ], + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "text": [ + "\n" + ], + "name": "stdout" + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 San Francisco\n", + "1 San Jose\n", + "2 Sacramento\n", + "Name: City name, dtype: object" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 8 + } + ] + }, + { + "metadata": { + "id": "V5L6xacLoxyv", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + }, + { + "item_id": 2 + } + ], + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "outputId": "a1e39b8f-2cb2-4d78-9db0-def04b01601b", + "executionInfo": { + "status": "ok", + "timestamp": 1521010407120, + "user_tz": -480, + "elapsed": 674, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "print type(cities['City name'][1])\n", + "cities['City name'][1]" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "text": [ + "\n" + ], + "name": "stdout" + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'San Jose'" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 9 + } + ] + }, + { + "metadata": { + "id": "gcYX1tBPugZl", + 
"colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + }, + { + "item_id": 2 + } + ], + "base_uri": "https://localhost:8080/", + "height": 128 + }, + "outputId": "a3b3266a-5ec3-4845-a783-22efb1ce0442", + "executionInfo": { + "status": "ok", + "timestamp": 1521010408033, + "user_tz": -480, + "elapsed": 731, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "print type(cities[0:2])\n", + "cities[0:2]" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "text": [ + "\n" + ], + "name": "stdout" + }, + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr><th></th><th>City name</th><th>Population</th></tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr><th>0</th><td>San Francisco</td><td>852469</td></tr>\n", + " <tr><th>1</th><td>San Jose</td><td>1015785</td></tr>\n", + " </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " City name Population\n", + "0 San Francisco 852469\n", + "1 San Jose 1015785" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 10 + } + ] + }, + { + "metadata": { + "id": "65g1ZdGVjXsQ", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 此外,*pandas* 针对高级[索引和选择](http://pandas.pydata.org/pandas-docs/stable/indexing.html)提供了极其丰富的 API(数量过多,此处无法逐一列出)。" + ] + }, + { + "metadata": { + "id": "RM1iaD-ka3Y1", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " ## 操控数据\n", + "\n", + "您可以向 `Series` 应用 Python 的基本运算指令。例如:" + ] + }, + { + "metadata": { + "id": "XWmyCFJ5bOv-", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 86 + }, + "outputId": "d589778a-9330-43a7-9e1c-168a1683753f", + "executionInfo": { + "status": "ok", + "timestamp": 1521010408873, + "user_tz": -480, + "elapsed": 706, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "population / 1000." 
+ ], + "execution_count": 11, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 852.469\n", + "1 1015.785\n", + "2 485.199\n", + "dtype: float64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 11 + } + ] + }, + { + "metadata": { + "id": "TQzIVnbnmWGM", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " [NumPy](http://www.numpy.org/) 是一种用于进行科学计算的常用工具包。*pandas* `Series` 可用作大多数 NumPy 函数的参数:" + ] + }, + { + "metadata": { + "id": "ko6pLK6JmkYP", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 86 + }, + "outputId": "b5167c98-2a6d-450d-d1bf-bf25b9391553", + "executionInfo": { + "status": "ok", + "timestamp": 1521010409849, + "user_tz": -480, + "elapsed": 729, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "import numpy as np\n", + "\n", + "np.log(population)" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 13.655892\n", + "1 13.831172\n", + "2 13.092314\n", + "dtype: float64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 12 + } + ] + }, + { + "metadata": { + "id": "xmxFuQmurr6d", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 对于更复杂的单列转换,您可以使用 `Series.apply`。像 Python [映射函数](https://docs.python.org/2/library/functions.html#map)一样,`Series.apply` 将以参数形式接受 [lambda 函数](https://docs.python.org/2/tutorial/controlflow.html#lambda-expressions),而该函数会应用于每个值。\n", + "\n", + "下面的示例创建了一个指明 `population` 是否超过 100 万的新 `Series`:" + ] + }, + { + "metadata": { + "id": "Fc1DvPAbstjI", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 86 
+ }, + "outputId": "c08ade5b-0c08-4894-dc98-fbc948489e4e", + "executionInfo": { + "status": "ok", + "timestamp": 1521010410894, + "user_tz": -480, + "elapsed": 892, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "population.apply(lambda val: val > 1000000)" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 False\n", + "1 True\n", + "2 False\n", + "dtype: bool" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 13 + } + ] + }, + { + "metadata": { + "id": "ZeYYLoV9b9fB", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " \n", + "Modifying `DataFrames` is also straightforward. For example, the following code adds two `Series` to an existing `DataFrame`:" + ] + }, + { + "metadata": { + "id": "0gCEX99Hb8LR", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 142 + }, + "outputId": "428e6abf-fa8b-4f28-c1c6-0aecaacaf5f7", + "executionInfo": { + "status": "ok", + "timestamp": 1521010412133, + "user_tz": -480, + "elapsed": 1101, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "cities['Area square miles'] = pd.Series([46.87, 176.53, 97.92])\n", + "cities['Population density'] = cities['Population'] / cities['Area square miles']\n", + "cities" + ], + "execution_count": 14, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City name Population Area square miles Population density\n", + "0 San Francisco 852469 46.87 18187.945381\n", + "1 San Jose 1015785 176.53 5754.177760\n", + "2 Sacramento 485199 97.92 4955.055147" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 14 + } + ] + }, + { + "metadata": { + "id": "6qh63m-ayb-c", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " ## 练习 1\n", + "\n", + "通过添加一个新的布尔值列(当且仅当以下*两项*均为 True 时为 True)修改 `cities` 表格:\n", + "\n", + " * 城市以圣人命名。\n", + " * 城市面积大于 50 平方英里。\n", + "\n", + "**注意:**布尔值 `Series` 是使用“按位”而非传统布尔值“运算符”组合的。例如,执行*逻辑与*时,应使用 `&`,而不是 `and`。\n", + "\n", + "**提示:**\"San\" 在西班牙语中意为 \"saint\"。" + ] + }, + { + "metadata": { + "id": "zCOn8ftSyddH", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [], + "base_uri": "https://localhost:8080/", + "height": 17 + }, + "outputId": "bf6b298a-27f7-4c00-dc77-1e0017ed5a47", + "executionInfo": { + "status": "ok", + "timestamp": 1521010412986, + "user_tz": -480, + "elapsed": 684, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "# Your code here" + ], + "execution_count": 15, + "outputs": [] + }, + { + "metadata": { + "id": "YHIWvc9Ms-Ll", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " ### 解决方案\n", + "\n", + "点击下方,查看解决方案。" + ] + }, + { + "metadata": { + "id": "T5OlrqtdtCIb", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 10 + }, + "outputId": "a31b153b-b64b-4ba3-e1e5-fae5deda18a3", + "executionInfo": { + "status": "ok", + "timestamp": 1521010414046, + "user_tz": -480, + "elapsed": 930, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "cities['Is wide and has 
saint name'] = (cities['Area square miles'] > 50) & cities['City name'].apply(lambda name: name.startswith('San'))\n", + "cities" + ], + "execution_count": 16, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City name Population Area square miles Population density \\\n", + "0 San Francisco 852469 46.87 18187.945381 \n", + "1 San Jose 1015785 176.53 5754.177760 \n", + "2 Sacramento 485199 97.92 4955.055147 \n", + "\n", + " Is wide and has saint name \n", + "0 False \n", + "1 True \n", + "2 False " + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 16 + } + ] + }, + { + "metadata": { + "id": "f-xAOJeMiXFB", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " ## 索引\n", + "`Series` 和 `DataFrame` 对象也定义了 `index` 属性,该属性会向每个 `Series` 项或 `DataFrame` 行赋一个标识符值。\n", + "\n", + "默认情况下,在构造时,*pandas* 会赋可反映源数据顺序的索引值。索引值在创建后是稳定的;也就是说,它们不会因为数据重新排序而发生改变。" + ] + }, + { + "metadata": { + "id": "2684gsWNinq9", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "outputId": "75170199-ca02-4ca6-a8b5-bdd159f90475", + "executionInfo": { + "status": "ok", + "timestamp": 1521010415086, + "user_tz": -480, + "elapsed": 931, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "city_names.index" + ], + "execution_count": 17, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "RangeIndex(start=0, stop=3, step=1)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 17 + } + ] + }, + { + "metadata": { + "id": "F_qPe2TBjfWd", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "outputId": "41b493bf-81e5-4474-b195-d7ee456ddf22", + "executionInfo": { + "status": "ok", + "timestamp": 1521010416219, + "user_tz": -480, + "elapsed": 1016, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + 
"cell_type": "code", + "source": [ + "cities.index" + ], + "execution_count": 18, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "RangeIndex(start=0, stop=3, step=1)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 18 + } + ] + }, + { + "metadata": { + "id": "hp2oWY9Slo_h", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 调用 `DataFrame.reindex` 以手动重新排列各行的顺序。例如,以下方式与按城市名称排序具有相同的效果:" + ] + }, + { + "metadata": { + "id": "sN0zUzSAj-U1", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 142 + }, + "outputId": "5096cd4a-7348-4365-9994-5ee21ecd37b5", + "executionInfo": { + "status": "ok", + "timestamp": 1521010417285, + "user_tz": -480, + "elapsed": 937, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "cities.reindex([2, 0, 1])" + ], + "execution_count": 19, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City name Population Area square miles Population density \\\n", + "2 Sacramento 485199 97.92 4955.055147 \n", + "0 San Francisco 852469 46.87 18187.945381 \n", + "1 San Jose 1015785 176.53 5754.177760 \n", + "\n", + " Is wide and has saint name \n", + "2 False \n", + "0 False \n", + "1 True " + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 19 + } + ] + }, + { + "metadata": { + "id": "-GQFz8NZuS06", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 重建索引是一种随机排列 `DataFrame` 的绝佳方式。在下面的示例中,我们会取用类似数组的索引,然后将其传递至 NumPy 的 `random.permutation` 函数,该函数会随机排列其值的位置。如果使用此重新随机排列的数组调用 `reindex`,会导致 `DataFrame` 行以同样的方式随机排列。\n", + "尝试多次运行以下单元格!" + ] + }, + { + "metadata": { + "id": "mF8GC0k8uYhz", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 142 + }, + "outputId": "98adbf2b-3793-49e1-f280-f133ef3e7c96", + "executionInfo": { + "status": "ok", + "timestamp": 1521010418890, + "user_tz": -480, + "elapsed": 1426, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "cities.reindex(np.random.permutation(cities.index))" + ], + "execution_count": 20, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City name Population Area square miles Population density \\\n", + "0 San Francisco 852469 46.87 18187.945381 \n", + "2 Sacramento 485199 97.92 4955.055147 \n", + "1 San Jose 1015785 176.53 5754.177760 \n", + "\n", + " Is wide and has saint name \n", + "0 False \n", + "2 False \n", + "1 True " + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 20 + } + ] + }, + { + "metadata": { + "id": "fSso35fQmGKb", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 有关详情,请参阅[索引文档](http://pandas.pydata.org/pandas-docs/stable/indexing.html#index-objects)。" + ] + }, + { + "metadata": { + "id": "8UngIdVhz8C0", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " ## 练习 2\n", + "\n", + "`reindex` 方法允许使用未包含在原始 `DataFrame` 索引值中的索引值。请试一下,看看如果使用此类值会发生什么!您认为允许此类值的原因是什么?" + ] + }, + { + "metadata": { + "id": "PN55GrDX0jzO", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "outputId": "d8b8e1cc-7704-4dc0-e39b-3d26e5b412f7", + "executionInfo": { + "status": "ok", + "timestamp": 1521010457101, + "user_tz": -480, + "elapsed": 1118, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "# Your code here\n", + "print 'hello'" + ], + "execution_count": 24, + "outputs": [ + { + "output_type": "stream", + "text": [ + "hello\n" + ], + "name": "stdout" + } + ] + }, + { + "metadata": { + "id": "SQ6zmohATKVV", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + } + } + }, + "cell_type": "code", + "source": [ + "" + ], + "execution_count": 0, + "outputs": [] + }, + { + "metadata": { + "id": "TJffr5_Jwqvd", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " ### 解决方案\n", + "\n", + "点击下方,查看解决方案。" + ] + }, + { + "metadata": { + "id": 
"8oSvi2QWwuDH", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 如果您的 `reindex` 输入数组包含原始 `DataFrame` 索引值中没有的值,`reindex` 会为此类“丢失的”索引添加新行,并在所有对应列中填充 `NaN` 值:" + ] + }, + { + "metadata": { + "id": "yBdkucKCwy4x", + "colab_type": "code", + "colab": { + "autoexec": { + "startup": false, + "wait_interval": 0 + }, + "output_extras": [ + { + "item_id": 1 + } + ], + "base_uri": "https://localhost:8080/", + "height": 173 + }, + "outputId": "b6524051-6112-4a88-a269-8370d6ae015a", + "executionInfo": { + "status": "ok", + "timestamp": 1521010420678, + "user_tz": -480, + "elapsed": 896, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + } + } + }, + "cell_type": "code", + "source": [ + "cities.reindex([0, 4, 5, 2])" + ], + "execution_count": 22, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City name Population Area square miles Population density \\\n", + "0 San Francisco 852469.0 46.87 18187.945381 \n", + "4 NaN NaN NaN NaN \n", + "5 NaN NaN NaN NaN \n", + "2 Sacramento 485199.0 97.92 4955.055147 \n", + "\n", + " Is wide and has saint name \n", + "0 False \n", + "4 NaN \n", + "5 NaN \n", + "2 False " + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 22 + } + ] + }, + { + "metadata": { + "id": "2l82PhPbwz7g", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + " 这种行为是可取的,因为索引通常是从实际数据中提取的字符串(请参阅 [*pandas* reindex 文档](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html),查看索引值是浏览器名称的示例)。\n", + "\n", + "在这种情况下,如果允许出现“丢失的”索引,您将可以轻松使用外部列表重建索引,因为您不必担心会将输入清理掉。" + ] + } + ] +} \ No newline at end of file