|
19 | 19 | "cell_type": "markdown",
|
20 | 20 | "metadata": {},
|
21 | 21 | "source": [
|
22 |
| - "Now that you understand about `list` and `dict` as well as how to write your own functions with loops and conditional statements, you can already write simply programs that perform quite useful operations. \n", |
| 22 | + "You have learned about lists as well as how to write your own functions with loops and conditional statements. As such, you can already write programs performing a variety of tasks. \n", |
23 | 23 | "\n",
|
24 |
| - "However, as a research assistant, you will likely need to access data that are (locally or remotely) stored in files. A file provides a mechanism for **permanently store** information so that they can be retrieved when your program and/or your machine are restarted.\n", |
| 24 | + "However, something that is currently missing is a mechanism to access the data that you want to analyze. A very common way to access these data is through (locally or remotely stored) [files](https://en.wikipedia.org/wiki/Computer_file)." |
| 25 | + ] |
| 26 | + }, |
| 27 | + { |
| 28 | + "cell_type": "markdown", |
| 29 | + "metadata": {}, |
| 30 | + "source": [ |
| 31 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
25 | 32 | "\n",
|
26 |
| - "There are two main types of files:" |
| 33 | + "A **file** provides a mechanism for **permanently storing information** so that it can be retrieved after your program or your machine is restarted." |
| 34 | + ] |
| 35 | + }, |
| 36 | + { |
| 37 | + "cell_type": "markdown", |
| 38 | + "metadata": {}, |
| 39 | + "source": [ |
| 40 | + "There are two main types of files: text files and binary files." |
27 | 41 | ]
|
28 | 42 | },
|
29 | 43 | {
|
|
57 | 71 | "source": [
|
58 | 72 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n",
|
59 | 73 | "\n",
|
60 |
| - "A very simple test to evaluate whether a given file is or not a text file is to try opening it using a text editor. If you can understand the visualized content, then that is file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*" |
| 74 | + "A very simple test to evaluate whether a given file is a text file is to open it in a text editor. If you can understand the visualized content of an opened file, then the file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*" |
61 | 75 | ]
|
62 | 76 | },
|
63 | 77 | {
|
|
105 | 119 | "source": [
|
106 | 120 | "In particular, we will explore the `os.path` sub-module to retrieve some data files that are stored on the server's hard disk.\n",
|
107 | 121 | "\n",
|
108 |
| - "The first required operation is to **import** the `os` module. Then, we will use some of the `os.path` sub-module functionalities and variables to write a function that returns the full path of the server's folder where this notebook is located:\n", |
| 122 | + "The first required operation is to **import** the `os` module. Then, we will use some of the `os.path` sub-module functionalities and variables to write a function that returns the full path of the folder where this notebook is located:\n", |
109 | 123 | "\n",
|
110 | 124 | "- `curdir`: The constant string used by the operating system to refer to the current directory. E.g., `.` for Windows and Linux.\n",
|
111 | 125 | "- `abspath()`: A function that returns the full, absolute version of a path."
|
112 | 126 | ]
|
113 | 127 | },
|
114 | 128 | {
|
115 | 129 | "cell_type": "code",
|
116 |
| - "execution_count": null, |
117 |
| - "metadata": {}, |
118 |
| - "outputs": [], |
| 130 | + "execution_count": 1, |
| 131 | + "metadata": {}, |
| 132 | + "outputs": [ |
| 133 | + { |
| 134 | + "name": "stdout", |
| 135 | + "output_type": "stream", |
| 136 | + "text": [ |
| 137 | + "The current folder is: C:\\code\\hyo2\\epom\\python_basics\n" |
| 138 | + ] |
| 139 | + } |
| 140 | + ], |
119 | 141 | "source": [
|
120 | 142 | "import os\n",
|
121 | 143 | "\n",
|
|
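The `curdir` and `abspath()` items listed above can be combined as in the following minimal sketch. The function name `get_current_folder()` is a hypothetical choice for illustration, since the body of the notebook's own cell is not fully shown in this diff.

```python
import os


def get_current_folder():
    # os.path.curdir is the string "."; abspath() expands it against
    # the current working directory to obtain a full path
    return os.path.abspath(os.path.curdir)


cur_folder = get_current_folder()
print("The current folder is: " + cur_folder)
```

The printed path depends on where the notebook (or script) is executed, so it will differ from machine to machine.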
130 | 152 | "cell_type": "markdown",
|
131 | 153 | "metadata": {},
|
132 | 154 | "source": [
|
133 |
| - "The data are inside a `data` sub-folder. We now extend the previous code using `os.path.join()` and `os.path.exist()` functions to:\n", |
| 155 | + "As shown in the figure below, the data are inside a `data` sub-folder: \n", |
| 156 | + "\n", |
| 157 | + "" |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "cell_type": "markdown", |
| 162 | + "metadata": {}, |
| 163 | + "source": [ |
| 164 | + "As such, we extend the previous code using the `os.path.join()` and `os.path.exists()` functions to:\n", |
134 | 165 | "\n",
|
135 | 166 | "- Create the full path to the `data` sub-folder.\n",
|
136 | 167 | "- Check whether the resulting path actually exists."
|
|
140 | 171 | "cell_type": "markdown",
|
141 | 172 | "metadata": {},
|
142 | 173 | "source": [
|
143 |
| - "In case that the `data` sub-folder does not exist, we raise an error using the `raise` keyword." |
| 174 | + "If the `data` sub-folder does not exist, we raise an error using the [`raise`](https://docs.python.org/3.6/tutorial/errors.html#raising-exceptions) keyword." |
144 | 175 | ]
|
145 | 176 | },
|
146 | 177 | {
|
147 | 178 | "cell_type": "code",
|
148 |
| - "execution_count": null, |
149 |
| - "metadata": {}, |
150 |
| - "outputs": [], |
| 179 | + "execution_count": 2, |
| 180 | + "metadata": {}, |
| 181 | + "outputs": [ |
| 182 | + { |
| 183 | + "name": "stdout", |
| 184 | + "output_type": "stream", |
| 185 | + "text": [ |
| 186 | + "The data folder is: C:\\code\\hyo2\\epom\\python_basics\\data\n" |
| 187 | + ] |
| 188 | + } |
| 189 | + ], |
151 | 190 | "source": [
|
152 | 191 | "def get_data_folder():\n",
|
153 | 192 | " cur_folder = os.path.abspath(os.path.curdir)\n",
|
|
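The join-check-raise pattern described above can be sketched as follows. This is not the notebook's `get_data_folder()`: the helper name `find_sub_folder()`, the `RuntimeError` type, and the error message are assumptions made for illustration, and a temporary folder is used so that the example runs anywhere.

```python
import os
import tempfile


def find_sub_folder(parent_folder, name):
    # hypothetical helper: build the full path, then verify that it exists
    sub_folder = os.path.join(parent_folder, name)
    if not os.path.exists(sub_folder):
        raise RuntimeError("Unable to locate the folder: " + sub_folder)
    return sub_folder


with tempfile.TemporaryDirectory() as parent:
    os.mkdir(os.path.join(parent, "data"))

    found = find_sub_folder(parent, "data")
    print("The data folder is: " + found)

    try:
        find_sub_folder(parent, "missing")
        missing_raised = False
    except RuntimeError as error:
        missing_raised = True
        print("Error: " + str(error))
```

Raising an error early makes a wrong folder layout visible immediately, instead of failing later when a file is opened.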
182 | 221 | "cell_type": "markdown",
|
183 | 222 | "metadata": {},
|
184 | 223 | "source": [
|
185 |
| - "We will now retrieve all the paths to the files in the `data` sub-folder. Specifically, we will create a function `get_data_paths()` that will returns a list containing all the files in that folder, using the `os.listdir()` function." |
| 224 | + "We will now retrieve all the paths to the files in the `data` sub-folder. Specifically, we will create a function `get_data_paths()` that returns a list containing the paths of all the files in that folder, using the `os.listdir()` function." |
186 | 225 | ]
|
187 | 226 | },
|
188 | 227 | {
|
189 | 228 | "cell_type": "code",
|
190 |
| - "execution_count": null, |
191 |
| - "metadata": {}, |
192 |
| - "outputs": [], |
| 229 | + "execution_count": 3, |
| 230 | + "metadata": {}, |
| 231 | + "outputs": [ |
| 232 | + { |
| 233 | + "name": "stdout", |
| 234 | + "output_type": "stream", |
| 235 | + "text": [ |
| 236 | + "The data paths are: ['C:\\\\code\\\\hyo2\\\\epom\\\\python_basics\\\\data\\\\ctd.txt', 'C:\\\\code\\\\hyo2\\\\epom\\\\python_basics\\\\data\\\\sal.txt', 'C:\\\\code\\\\hyo2\\\\epom\\\\python_basics\\\\data\\\\temp.txt']\n" |
| 237 | + ] |
| 238 | + } |
| 239 | + ], |
193 | 240 | "source": [
|
194 | 241 | "def get_data_paths():\n",
|
195 | 242 | " data_paths = list() # create an empty list that will be populated and returned\n",
|
|
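As a rough sketch of the same idea, the function below lists a folder and joins each name with the folder path. The name `list_file_paths()` and the call to `sorted()` are assumptions, not part of the notebook: `os.listdir()` returns names in arbitrary order, so sorting is one way to make index-based access (such as index `1` for `sal.txt`) predictable.

```python
import os
import tempfile


def list_file_paths(folder):
    paths = list()  # an empty list that is populated and returned
    # os.listdir() returns names in arbitrary order; sort for stable indices
    for name in sorted(os.listdir(folder)):
        paths.append(os.path.join(folder, name))
    return paths


with tempfile.TemporaryDirectory() as folder:
    # create three empty files, deliberately out of alphabetical order
    for name in ["temp.txt", "ctd.txt", "sal.txt"]:
        open(os.path.join(folder, name), "w").close()

    paths = list_file_paths(folder)
    names = [os.path.basename(path) for path in paths]
    print(names)
```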
215 | 262 | "source": [
|
216 | 263 | "In the above code, we wrote a function in which:\n",
|
217 | 264 | "\n",
|
218 |
| - "- We created and populated a list: `data_paths`\n", |
219 |
| - "- We reused a function that we previously created: `get_data_folder()`.\n", |
220 |
| - "- We used several Python functions from the `os` module: e.g., `listdir()`, `join()`.\n", |
221 |
| - "- We executed a `for` loop to populate the `data_paths` list.\n", |
222 |
| - "- We returned the populated list." |
| 265 | + "- We create and populate a list: `data_paths`\n", |
| 266 | + "- We reuse a function that was previously created: `get_data_folder()`.\n", |
| 267 | + "- We use several Python functions from the `os` module: e.g., `listdir()`, `join()`.\n", |
| 268 | + "- We execute a `for` loop to populate the `data_paths` list.\n", |
| 269 | + "- We return the populated list." |
223 | 270 | ]
|
224 | 271 | },
|
225 | 272 | {
|
|
228 | 275 | "source": [
|
229 | 276 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n",
|
230 | 277 | "\n",
|
231 |
| - "You don't need to remember all the names of the available Python functions. But you need to learn how to search for them. The [official Python documentation](https://docs.python.org/3.6/index.html) is usually a good place to start with." |
| 278 | + "You don't need to remember all the names of the available Python functions. But you need to learn how to search for them. The [official Python documentation](https://docs.python.org/3.6/index.html) is a good place to start." |
232 | 279 | ]
|
233 | 280 | },
|
234 | 281 | {
|
235 | 282 | "cell_type": "markdown",
|
236 | 283 | "metadata": {},
|
237 | 284 | "source": [
|
238 |
| - "From the [Lists of Variables notebook](002_Lists_of_Variables.ipynb), you should remember how to access a value in a list by its index. \n", |
| 285 | + "From the [Lists of Variables notebook](002_Lists_of_Variables.ipynb), you know how to access a value in a list by its index. \n", |
239 | 286 | "\n",
|
240 | 287 | "Thus, to access the file named `sal.txt`, we can use `1` as index since it is the **second** element in the list."
|
241 | 288 | ]
|
242 | 289 | },
|
243 | 290 | {
|
244 | 291 | "cell_type": "code",
|
245 |
| - "execution_count": null, |
246 |
| - "metadata": {}, |
247 |
| - "outputs": [], |
| 292 | + "execution_count": 4, |
| 293 | + "metadata": {}, |
| 294 | + "outputs": [ |
| 295 | + { |
| 296 | + "name": "stdout", |
| 297 | + "output_type": "stream", |
| 298 | + "text": [ |
| 299 | + "The file path with index 1 is: C:\\code\\hyo2\\epom\\python_basics\\data\\sal.txt\n" |
| 300 | + ] |
| 301 | + } |
| 302 | + ], |
248 | 303 | "source": [
|
249 | 304 | "sal_path = retrieved_paths[1]\n",
|
250 | 305 | "print(\"The file path with index 1 is: \" + sal_path)"
|
|
254 | 309 | "cell_type": "markdown",
|
255 | 310 | "metadata": {},
|
256 | 311 | "source": [
|
257 |
| - "In the next section, you will learn how to open and read the content of these text files." |
| 312 | + "In the next section, you will learn how to open and read the content of the file located at `sal_path`." |
258 | 313 | ]
|
259 | 314 | },
|
260 | 315 | {
|
|
282 | 337 | "cell_type": "markdown",
|
283 | 338 | "metadata": {},
|
284 | 339 | "source": [
|
285 |
| - "The Python `open()` function takes the name of the file (as a parameter) and returns a file object. \n", |
| 340 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
286 | 341 | "\n",
|
287 |
| - "This object can be used to read the sequence of characters in a few ways:" |
| 342 | + "The Python `open()` function takes the name of the file (as a parameter) and returns a [file object](https://docs.python.org/3.6/glossary.html#term-file-object). " |
| 343 | + ] |
| 344 | + }, |
| 345 | + { |
| 346 | + "cell_type": "markdown", |
| 347 | + "metadata": {}, |
| 348 | + "source": [ |
| 349 | + "This file object can be used to read the sequence of characters in a few ways:" |
288 | 350 | ]
|
289 | 351 | },
|
290 | 352 | {
|
|
317 | 379 | },
|
318 | 380 | {
|
319 | 381 | "cell_type": "code",
|
320 |
| - "execution_count": null, |
321 |
| - "metadata": {}, |
322 |
| - "outputs": [], |
| 382 | + "execution_count": 5, |
| 383 | + "metadata": {}, |
| 384 | + "outputs": [ |
| 385 | + { |
| 386 | + "name": "stdout", |
| 387 | + "output_type": "stream", |
| 388 | + "text": [ |
| 389 | + "31.4\n", |
| 390 | + "31.6\n", |
| 391 | + "30.5\n", |
| 392 | + "30.8\n", |
| 393 | + "30.4\n", |
| 394 | + "31.4\n", |
| 395 | + "31.6\n", |
| 396 | + "30.5\n", |
| 397 | + "30.3\n", |
| 398 | + "30.2\n", |
| 399 | + "31.4\n", |
| 400 | + "31.6\n", |
| 401 | + "32.5\n", |
| 402 | + "30.8\n", |
| 403 | + "31.4\n", |
| 404 | + "31.7\n", |
| 405 | + "31.6\n", |
| 406 | + "31.5\n", |
| 407 | + "30.2\n", |
| 408 | + "30.4\n", |
| 409 | + "\n" |
| 410 | + ] |
| 411 | + } |
| 412 | + ], |
323 | 413 | "source": [
|
324 | 414 | "sal_file = open(sal_path)\n",
|
325 | 415 | "\n",
|
|
333 | 423 | "cell_type": "markdown",
|
334 | 424 | "metadata": {},
|
335 | 425 | "source": [
|
336 |
| - "The execution of the above code will print the 20 salinity values in the text file. Although they look like numbers, they are actually just a single `str` of 100 characters!" |
| 426 | + "The execution of the above code will print the 20 salinity values in the text file. Although they look like numbers, they are **actually** a single `str` of 100 characters!" |
337 | 427 | ]
|
338 | 428 | },
|
339 | 429 | {
|
|
360 | 450 | "source": [
|
361 | 451 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n",
|
362 | 452 | "\n",
|
363 |
| - "Why the characters are 100? Each row has 4 visible characters (e.g., `30.8`) but there is also an invisible character that the text editor interprets as a new line. Thus, `(4+1) * 20 = 100` characters." |
| 453 | + "Why are there 100 characters? Each row has 4 visible characters (e.g., `30.8`), but there is also an invisible character (i.e., `\\n`) that a text editor interprets as a new line. Thus, `(4+1) * 20 = 100` characters." |
364 | 454 | ]
|
365 | 455 | },
|
366 | 456 | {
|
367 | 457 | "cell_type": "markdown",
|
368 | 458 | "metadata": {},
|
369 | 459 | "source": [
|
370 |
| - "We will now write a function that reads the sequence of characters, but also split them by line (using the `str` method named `splitlines()`) and convert the result in the corresponding `float` value." |
| 460 | + "We will now write a function that not only reads the sequence of characters, but also splits them by line (using the `str` method named `splitlines()`) and converts each line into the corresponding `float` value." |
371 | 461 | ]
|
372 | 462 | },
|
373 | 463 | {
|
|
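The read, split, and convert steps can be sketched as below. The function name `read_values()` and the three sample values are assumptions for illustration; the sample file is written to a temporary folder so that the example is self-contained. With three rows, the same arithmetic as above gives `(4+1) * 3 = 15` characters.

```python
import os
import tempfile


def read_values(path):
    text_file = open(path)
    content = text_file.read()  # the whole file as a single str
    text_file.close()

    values = list()
    for line in content.splitlines():  # one item per row, without "\n"
        values.append(float(line))
    return content, values


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sal.txt")

    # write a small sample file with one value per row
    sample_file = open(path, "w", newline="\n")
    sample_file.write("31.4\n31.6\n30.5\n")
    sample_file.close()

    content, values = read_values(path)
    print(len(content))
    print(values)
```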
400 | 490 | "source": [
|
401 | 491 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n",
|
402 | 492 | "\n",
|
403 |
| - "There are more efficient ways to read a text file. We adopted an approach that is simple to understand for a first learner." |
| 493 | + "There are more efficient ways to read a text file. We adopted an approach that is simple to understand for a first-time learner." |
404 | 494 | ]
|
405 | 495 | },
|
406 | 496 | {
|
|
473 | 563 | "cell_type": "markdown",
|
474 | 564 | "metadata": {},
|
475 | 565 | "source": [
|
476 |
| - "The first required decision is the location on where to store the text file. For this collection of notebook, we will use the `output` sub-folder that can be retrieved running the following code:" |
| 566 | + "If you want to write a text file, the first decision to make is where to store it. For this collection of notebooks, we will use the `output` sub-folder, which can be retrieved by running the following code:" |
477 | 567 | ]
|
478 | 568 | },
|
479 | 569 | {
|
480 | 570 | "cell_type": "code",
|
481 |
| - "execution_count": null, |
482 |
| - "metadata": {}, |
483 |
| - "outputs": [], |
| 571 | + "execution_count": 6, |
| 572 | + "metadata": {}, |
| 573 | + "outputs": [ |
| 574 | + { |
| 575 | + "name": "stdout", |
| 576 | + "output_type": "stream", |
| 577 | + "text": [ |
| 578 | + "The output folder is: C:\\code\\hyo2\\epom\\python_basics\\output\n" |
| 579 | + ] |
| 580 | + } |
| 581 | + ], |
484 | 582 | "source": [
|
485 | 583 | "def get_output_folder():\n",
|
486 | 584 | " cur_folder = os.path.abspath(os.path.curdir)\n",
|
|
498 | 596 | "cell_type": "markdown",
|
499 | 597 | "metadata": {},
|
500 | 598 | "source": [
|
501 |
| - "We then use `join()` function to store the output file: e.g., `depths.txt`." |
| 599 | + "We then use the `join()` function to build the path of the output file: e.g., `depths.txt`." |
502 | 600 | ]
|
503 | 601 | },
|
504 | 602 | {
|
|
515 | 613 | "cell_type": "markdown",
|
516 | 614 | "metadata": {},
|
517 | 615 | "source": [
|
518 |
| - "To write a file, you have to use the `open()` passing the mode `w` as second parameter. We put this function within a function that take a list as a second parameter and write the content into the text file." |
| 616 | + "To write a file, you have to use the `open()` function and pass the `w` mode (`w` is for *write*) as the second parameter. We wrap this call in a function that takes a list as its second parameter and writes its content into the text file." |
519 | 617 | ]
|
520 | 618 | },
|
521 | 619 | {
|
522 | 620 | "cell_type": "code",
|
523 |
| - "execution_count": null, |
| 621 | + "execution_count": 7, |
524 | 622 | "metadata": {},
|
525 | 623 | "outputs": [],
|
526 | 624 | "source": [
|
|
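A minimal sketch of writing a list to a text file with the `w` mode is shown below. The function name `write_values()`, the one-value-per-row layout, and the sample depths are assumptions, since the notebook's own cell is not visible in this diff.

```python
import os
import tempfile


def write_values(path, values):
    # the "w" mode creates the file, or overwrites it if it already exists
    out_file = open(path, "w")
    for value in values:
        out_file.write(str(value) + "\n")  # one value per row
    out_file.close()


with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, "depths.txt")
    write_values(out_path, [1.0, 2.5, 4.0])

    # read the file back to check what was written
    check_file = open(out_path)
    written = check_file.read()
    check_file.close()
    print(written)
```

Closing the file after writing matters: until `close()` is called, some of the content may still be held in memory rather than stored on disk.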
636 | 734 | " * [The os module](https://docs.python.org/3.6/library/os.html)\n",
|
637 | 735 | " * [Input and Output](https://docs.python.org/3.6/tutorial/inputoutput.html)\n",
|
638 | 736 | "* [Cross-platform software](https://en.wikipedia.org/wiki/Cross-platform_software)\n",
|
639 |
| - "* [Text file](https://en.wikipedia.org/wiki/Text_file)\n", |
640 |
| - "* [Binary file](https://en.wikipedia.org/wiki/Binary_file)\n", |
641 |
| - "* [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)" |
| 737 | + "* [Computer file](https://en.wikipedia.org/wiki/Computer_file)\n", |
| 738 | + " * [Text file](https://en.wikipedia.org/wiki/Text_file)\n", |
| 739 | + " * [Binary file](https://en.wikipedia.org/wiki/Binary_file)\n", |
| 740 | + " * [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)" |
642 | 741 | ]
|
643 | 742 | },
|
644 | 743 | {
|
|
655 | 754 | "metadata": {},
|
656 | 755 | "source": [
|
657 | 756 | "<!--NAVIGATION-->\n",
|
658 |
| - "[< Dictionaries and Metadata](006_Dictionaries_and_Metadata.ipynb) | [Contents](index.ipynb) | [A Class as a Data Container>](008_A_Class_as_a_Data_Container.ipynb)" |
| 757 | + "[< Write Your Own Functions](005_Write_Your_Own_Functions.ipynb) | [Contents](index.ipynb) | [Dictionaries and Metadata >](007_Dictionaries_and_Metadata.ipynb)" |
659 | 758 | ]
|
660 | 759 | }
|
661 | 760 | ],
|
|