reading · Issue #4 · UBC-DSCI/introduction-to-datascience-python · GitHub

reading #4

Closed
13 tasks done
nd265 opened this issue May 12, 2022 · 2 comments · Fixed by #49

@nd265
Contributor
nd265 commented May 12, 2022

Progress:

The basic code has been converted from R to Python. The R-specific text still needs to be adapted for Python. All data files needed for the chapter have been pushed to the data folder. Some R code is still in the notebook; I will deal with it after confirmation. The changes are on the reading branch of this repository, in the file reading.md.

Issues and doubts regarding the Reading chapter:

  • reading.py is not updated after building reading.md with jb build reading.md.
  • What is the meaning of \index{loading|see{reading}}\index{reading!definition}? I think this is something specific to R Markdown.
  • Should the documentation that is specific to R be deleted?
  • What should be done about the absolute path when reading happiness_report.csv?
  • Should the column names be included when reading from data/can_lang-meta-data.csv?
  • The read_delim section seems unnecessary, as it is already covered in the previous section using sep=\t. Or should I just replace the documentation with 'Any delimiter can be used with sep='''?
  • There are 3 ways of assigning column names: while reading (using names=), using rename, and using df.columns. Which one should be used here? The Python tutorial uses names, but the R version uses rename, so I am unsure which to use.
  • Installation of openpyxl, pgdb
  • What should be done about lazy evaluation and collect, given that Python can read from Postgres directly? Should these be deleted from the documentation?
  • As the server fakeserver.stat.ubc.ca is an alias, the code in the Python code cell would error out, so should the code be commented out?
  • Can the min of average_rating be found in a single code cell, unlike in R?
  • What should be done about the additional resources section and the exercises?
  • Confirm: I am including images using the MyST syntax:
{figure} ./path/to/figure.jpg
:name: label
caption
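
For the question about the three ways of assigning column names, here is a minimal sketch that compares them side by side. It assumes pandas is the library in use; the tab-separated sample data and the column names are made up for illustration and are not taken from the chapter's data files.

```python
import io

import pandas as pd

# Hypothetical tab-separated data with no header row (not the book's actual file).
raw = "Aboriginal languages\tCree\t64050\nOfficial languages\tEnglish\t19460850\n"
columns = ["category", "language", "mother_tongue"]

# Option 1: name the columns while reading, using names=.
df_names = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=columns)

# Option 2: read first, then rename the default integer labels 0, 1, 2.
df_rename = pd.read_csv(io.StringIO(raw), sep="\t", header=None).rename(
    columns={0: "category", 1: "language", 2: "mother_tongue"}
)

# Option 3: read first, then overwrite df.columns in place.
df_assign = pd.read_csv(io.StringIO(raw), sep="\t", header=None)
df_assign.columns = columns

# Check that all three approaches end up with the same column labels.
same_columns = (
    list(df_names.columns) == list(df_rename.columns) == list(df_assign.columns)
)
print(same_columns)
print(df_names)
```

All three routes produce the same labels here, so the choice is mostly about which one reads best next to the surrounding text. The same sep= argument accepts other single-character delimiters as well.
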
nd265 self-assigned this May 12, 2022
@ttimbers

What is the meaning of \index{loading|see{reading}}\index{reading!definition}, I think this is something specific to R markdown?
See https://jupyterbook.org/en/stable/content/content-blocks.html#indexes

These flag terms to be included in the index. They are LaTeX commands and can also be used in Jupyter Book (see this example: https://github.com/py-pkgs/py-pkgs/blob/main/py-pkgs/03-how-to-package-a-python.ipynb)

@nd265
Contributor Author
nd265 commented May 24, 2022

Progress

  1. All code has been converted to Python, along with the corresponding text.
  2. Tags have been added to the code cells to hide the output of some cells or to suppress it entirely.
  3. As discussed in the previous call, the \index{}... entries have been left untouched, as the course of action for them is yet to be decided.
  4. As discussed in the previous call, the optional sections from Obtaining data from the web through the end of the chapter (Additional Resources) have been left untouched, pending a decision by the teaching team.

Queries

  • What should be done about the large output returned by the SQL query? Should it be truncated or shown in full?
  • What should be done about the figure that shows data referencing in R? (It uses terms like tibble, and the captions, which are specific to R, are part of the figure.)
    Example:

image

  • In the Postgres section, the output in the R notebook is R-specific. To get Python-specific output, can actual server credentials be provided so the queries can be executed, or does the R-specific output suffice?
    Example:

image
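
For the query about large SQL output, and the earlier questions about collect and the minimum of average_rating, the following is a rough sketch of one possible approach rather than a decision for the chapter. It assumes pandas, and it uses an in-memory SQLite database as a stand-in for the Postgres server; the ratings table and its values are invented for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the Postgres server (assumption for illustration).
# The table name, columns, and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (title TEXT, average_rating REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [(f"movie_{i}", 1.0 + (i % 90) / 10) for i in range(1000)],
)
conn.commit()

# read_sql_query runs the query immediately and returns a full DataFrame,
# so there is no separate collect step as in the R version.
ratings = pd.read_sql_query("SELECT title, average_rating FROM ratings", conn)
conn.close()

# Show only the first rows instead of the whole result.
preview = ratings.head(10)
print(preview)

# The minimum can be computed in a single expression.
min_rating = ratings["average_rating"].min()
print(min_rating)
```

Showing head() keeps the rendered page short while the full result stays available in the ratings variable for later cells.
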
