:books: adjust the position of readme content · blog2i2j/pyexcel.._..pyexcel@f7f8d30 · GitHub

Commit f7f8d30

📚 adjust the position of readme content

1 parent 4fc6901 · commit f7f8d30

4 files changed: +169 -169 lines

.moban.d/partial-data.rst.jj2

Lines changed: 2 additions & 2 deletions

@@ -14,7 +14,7 @@ Why did not I see above benefit?
 
 This feature depends heavily on the implementation details.
 
-`pyexcel-xls(xlrd)`, `pyexcel-xlsx(openpyxl)`, `pyexcel-ods(odfpy)` and `pyexcel-ods3(pyexcel-ezodf)`
+`pyexcel-xls`_(xlrd), `pyexcel-xlsx`_(openpyxl), `pyexcel-ods`_(odfpy) and `pyexcel-ods3`_(pyexcel-ezodf)
 will read all data into memory. Because xls, xlsx and ods file are effective a zipped folder,
 all four will unzip the folder and read the content in xml format in **full**, so as to make sense
 of all details.

@@ -24,7 +24,7 @@ consumption won't differ from reading the whole data back. Only after the partia
 data is returned, the memory comsumption curve shall jump the cliff. So pagination
 code here only limits the data returned to your program.
 
-With that said, `pyexcel-xlsxr`, `pyexcel-odsr` and `pyexcel-htmlr` DOES read partial data into memory.
+With that said, `pyexcel-xlsxr`_, `pyexcel-odsr`_ and `pyexcel-htmlr`_ DOES read partial data into memory.
 Those three are implemented in such a way that they consume the xml(html) when needed. When they
 have read designated portion of the data, they stop, even if they are half way through.
.moban.d/pyexcel-README.rst.jj2

Lines changed: 1 addition & 1 deletion

@@ -24,7 +24,6 @@ Feature Highlights
 
 {%include "one-liners.rst.jj2" %}
 
-{%include "two-liners.rst.jj2" %}
 
 Hidden feature: partial read
 ===============================================
@@ -33,6 +32,7 @@ Most pyexcel users do not know, but other library users were requesting `the sim
 
 {%include "partial-data.rst.jj2" %}
 
+{%include "two-liners.rst.jj2" %}
 
 Available Plugins
 =================

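The README shown below is generated from these ``.moban.d`` templates, so reordering the ``{%include%}`` lines is what reorders the rendered sections. A minimal sketch of that mechanism, using plain jinja2 with an in-memory loader (the project's actual build runs through moban; the template names mirror the diff, while the snippet bodies are invented placeholders):

.. code-block:: python

    # Stand-in for the moban build: jinja2 renders the README template and
    # pulls each snippet in via {% include %}, so include order == section order.
    from jinja2 import DictLoader, Environment

    templates = {
        "one-liners.rst.jj2": "One liners\n",
        "partial-data.rst.jj2": "Hidden feature: partial read\n",
        "two-liners.rst.jj2": "Stream APIs for big file\n",
        "pyexcel-README.rst.jj2": (
            "{% include 'one-liners.rst.jj2' %}"
            "{% include 'partial-data.rst.jj2' %}"  # moved up by this commit
            "{% include 'two-liners.rst.jj2' %}"    # now rendered after it
        ),
    }

    env = Environment(loader=DictLoader(templates))
    print(env.get_template("pyexcel-README.rst.jj2").render())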
README.rst

Lines changed: 164 additions & 164 deletions

@@ -525,6 +525,170 @@ Suppose you just want to extract one sheet from many sheets that exists in a wor
 for the output file, you can specify any of the supported formats
 
 
+
+Hidden feature: partial read
+===============================================
+
+Most pyexcel users do not know, but other library users were requesting `the similar features <https://github.com/jazzband/tablib/issues/467>`_
+
+
+When you are dealing with huge amount of data, e.g. 64GB, obviously you would not
+like to fill up your memory with those data. What you may want to do is, record
+data from Nth line, take M records and stop. And you only want to use your memory
+for the M records, not for beginning part nor for the tail part.
+
+Hence partial read feature is developed to read partial data into memory for processing.
+You can paginate by row, by column and by both, hence you dictate what portion of the
+data to read back. But remember only row limit features help you save memory. Let's
+you use this feature to record data from Nth column, take M number of columns and skip
+the rest. You are not going to reduce your memory footprint.
+
+Why did not I see above benefit?
+
+This feature depends heavily on the implementation details.
+
+`pyexcel-xls`_(xlrd), `pyexcel-xlsx`_(openpyxl), `pyexcel-ods`_(odfpy) and `pyexcel-ods3`_(pyexcel-ezodf)
+will read all data into memory. Because xls, xlsx and ods file are effective a zipped folder,
+all four will unzip the folder and read the content in xml format in **full**, so as to make sense
+of all details.
+
+Hence, during the partial data is been returned, the memory
+consumption won't differ from reading the whole data back. Only after the partial
+data is returned, the memory comsumption curve shall jump the cliff. So pagination
+code here only limits the data returned to your program.
+
+With that said, `pyexcel-xlsxr`_, `pyexcel-odsr`_ and `pyexcel-htmlr`_ DOES read partial data into memory.
+Those three are implemented in such a way that they consume the xml(html) when needed. When they
+have read designated portion of the data, they stop, even if they are half way through.
+
+In addition, pyexcel's csv readers can read partial data into memory too.
+
+
+Let's assume the following file is a huge csv file:
+
+.. code-block:: python
+
+    >>> import datetime
+    >>> import pyexcel as pe
+    >>> data = [
+    ...     [1, 21, 31],
+    ...     [2, 22, 32],
+    ...     [3, 23, 33],
+    ...     [4, 24, 34],
+    ...     [5, 25, 35],
+    ...     [6, 26, 36]
+    ... ]
+    >>> pe.save_as(array=data, dest_file_name="your_file.csv")
+
+
+And let's pretend to read partial data:
+
+.. code-block:: python
+
+    >>> pe.get_sheet(file_name="your_file.csv", start_row=2, row_limit=3)
+    your_file.csv:
+    +---+----+----+
+    | 3 | 23 | 33 |
+    +---+----+----+
+    | 4 | 24 | 34 |
+    +---+----+----+
+    | 5 | 25 | 35 |
+    +---+----+----+
+
+And you could as well do the same for columns:
+
+.. code-block:: python
+
+    >>> pe.get_sheet(file_name="your_file.csv", start_column=1, column_limit=2)
+    your_file.csv:
+    +----+----+
+    | 21 | 31 |
+    +----+----+
+    | 22 | 32 |
+    +----+----+
+    | 23 | 33 |
+    +----+----+
+    | 24 | 34 |
+    +----+----+
+    | 25 | 35 |
+    +----+----+
+    | 26 | 36 |
+    +----+----+
+
+Obvious, you could do both at the same time:
+
+.. code-block:: python
+
+    >>> pe.get_sheet(file_name="your_file.csv",
+    ...              start_row=2, row_limit=3,
+    ...              start_column=1, column_limit=2)
+    your_file.csv:
+    +----+----+
+    | 23 | 33 |
+    +----+----+
+    | 24 | 34 |
+    +----+----+
+    | 25 | 35 |
+    +----+----+
+
+
+The pagination support is available across all pyexcel plugins.
+
+.. note::
+
+   No column pagination support for query sets as data source.
+
+
+Formatting while transcoding a big data file
+--------------------------------------------------------------------------------
+
+If you are transcoding a big data set, conventional formatting method would not
+help unless a on-demand free RAM is available. However, there is a way to minimize
+the memory footprint of pyexcel while the formatting is performed.
+
+Let's continue from previous example. Suppose we want to transcode "your_file.csv"
+to "your_file.xls" but increase each element by 1.
+
+What we can do is to define a row renderer function as the following:
+
+    >>> def increment_by_one(row):
+    ...     for element in row:
+    ...         yield element + 1
+
+Then pass it onto save_as function using row_renderer:
+
+    >>> pe.isave_as(file_name="your_file.csv",
+    ...             row_renderer=increment_by_one,
+    ...             dest_file_name="your_file.xlsx")
+
+
+.. note::
+
+   If the data content is from a generator, isave_as has to be used.
+
+We can verify if it was done correctly:
+
+.. code-block:: python
+
+    >>> pe.get_sheet(file_name="your_file.xlsx")
+    your_file.csv:
+    +---+----+----+
+    | 2 | 22 | 32 |
+    +---+----+----+
+    | 3 | 23 | 33 |
+    +---+----+----+
+    | 4 | 24 | 34 |
+    +---+----+----+
+    | 5 | 25 | 35 |
+    +---+----+----+
+    | 6 | 26 | 36 |
+    +---+----+----+
+    | 7 | 27 | 37 |
+    +---+----+----+
+
+
 Stream APIs for big file : A set of two liners
 ================================================================================
 
@@ -829,170 +993,6 @@ Again let's verify what we have gotten:
 +-------+--------+----------+
 
 
-Hidden feature: partial read
-===============================================
-
-Most pyexcel users do not know, but other library users were requesting `the similar features <https://github.com/jazzband/tablib/issues/467>`_
-
-
-When you are dealing with huge amount of data, e.g. 64GB, obviously you would not
-like to fill up your memory with those data. What you may want to do is, record
-data from Nth line, take M records and stop. And you only want to use your memory
-for the M records, not for beginning part nor for the tail part.
-
-Hence partial read feature is developed to read partial data into memory for processing.
-You can paginate by row, by column and by both, hence you dictate what portion of the
-data to read back. But remember only row limit features help you save memory. Let's
-you use this feature to record data from Nth column, take M number of columns and skip
-the rest. You are not going to reduce your memory footprint.
-
-Why did not I see above benefit?
-
-This feature depends heavily on the implementation details.
-
-`pyexcel-xls(xlrd)`, `pyexcel-xlsx(openpyxl)`, `pyexcel-ods(odfpy)` and `pyexcel-ods3(pyexcel-ezodf)`
-will read all data into memory. Because xls, xlsx and ods file are effective a zipped folder,
-all four will unzip the folder and read the content in xml format in **full**, so as to make sense
-of all details.
-
-Hence, during the partial data is been returned, the memory
-consumption won't differ from reading the whole data back. Only after the partial
-data is returned, the memory comsumption curve shall jump the cliff. So pagination
-code here only limits the data returned to your program.
-
-With that said, `pyexcel-xlsxr`, `pyexcel-odsr` and `pyexcel-htmlr` DOES read partial data into memory.
-Those three are implemented in such a way that they consume the xml(html) when needed. When they
-have read designated portion of the data, they stop, even if they are half way through.
-
-In addition, pyexcel's csv readers can read partial data into memory too.
-
-
-Let's assume the following file is a huge csv file:
-
-.. code-block:: python
-
-    >>> import datetime
-    >>> import pyexcel as pe
-    >>> data = [
-    ...     [1, 21, 31],
-    ...     [2, 22, 32],
-    ...     [3, 23, 33],
-    ...     [4, 24, 34],
-    ...     [5, 25, 35],
-    ...     [6, 26, 36]
-    ... ]
-    >>> pe.save_as(array=data, dest_file_name="your_file.csv")
-
-
-And let's pretend to read partial data:
-
-.. code-block:: python
-
-    >>> pe.get_sheet(file_name="your_file.csv", start_row=2, row_limit=3)
-    your_file.csv:
-    +---+----+----+
-    | 3 | 23 | 33 |
-    +---+----+----+
-    | 4 | 24 | 34 |
-    +---+----+----+
-    | 5 | 25 | 35 |
-    +---+----+----+
-
-And you could as well do the same for columns:
-
-.. code-block:: python
-
-    >>> pe.get_sheet(file_name="your_file.csv", start_column=1, column_limit=2)
-    your_file.csv:
-    +----+----+
-    | 21 | 31 |
-    +----+----+
-    | 22 | 32 |
-    +----+----+
-    | 23 | 33 |
-    +----+----+
-    | 24 | 34 |
-    +----+----+
-    | 25 | 35 |
-    +----+----+
-    | 26 | 36 |
-    +----+----+
-
-Obvious, you could do both at the same time:
-
-.. code-block:: python
-
-    >>> pe.get_sheet(file_name="your_file.csv",
-    ...              start_row=2, row_limit=3,
-    ...              start_column=1, column_limit=2)
-    your_file.csv:
-    +----+----+
-    | 23 | 33 |
-    +----+----+
-    | 24 | 34 |
-    +----+----+
-    | 25 | 35 |
-    +----+----+
-
-
-The pagination support is available across all pyexcel plugins.
-
-.. note::
-
-   No column pagination support for query sets as data source.
-
-
-Formatting while transcoding a big data file
---------------------------------------------------------------------------------
-
-If you are transcoding a big data set, conventional formatting method would not
-help unless a on-demand free RAM is available. However, there is a way to minimize
-the memory footprint of pyexcel while the formatting is performed.
-
-Let's continue from previous example. Suppose we want to transcode "your_file.csv"
-to "your_file.xls" but increase each element by 1.
-
-What we can do is to define a row renderer function as the following:
-
-    >>> def increment_by_one(row):
-    ...     for element in row:
-    ...         yield element + 1
-
-Then pass it onto save_as function using row_renderer:
-
-    >>> pe.isave_as(file_name="your_file.csv",
-    ...             row_renderer=increment_by_one,
-    ...             dest_file_name="your_file.xlsx")
-
-
-.. note::
-
-   If the data content is from a generator, isave_as has to be used.
-
-We can verify if it was done correctly:
-
-.. code-block:: python
-
-    >>> pe.get_sheet(file_name="your_file.xlsx")
-    your_file.csv:
-    +---+----+----+
-    | 2 | 22 | 32 |
-    +---+----+----+
-    | 3 | 23 | 33 |
-    +---+----+----+
-    | 4 | 24 | 34 |
-    +---+----+----+
-    | 5 | 25 | 35 |
-    +---+----+----+
-    | 6 | 26 | 36 |
-    +---+----+----+
-    | 7 | 27 | 37 |
-    +---+----+----+
-
-
 Available Plugins
 =================

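For readers who want to try the pagination keywords from the moved section together with pyexcel's streaming reader, here is a small sketch, not part of the commit. It reuses the same values as the README example and assumes that ``iget_array`` forwards ``start_row``/``row_limit`` to the csv reader the same way ``get_sheet`` does; verify against your installed pyexcel version.

.. code-block:: python

    # Sketch only: a streaming read of the middle rows of the example file.
    import pyexcel as pe

    data = [[i, i + 20, i + 30] for i in range(1, 7)]  # same values as the README example
    pe.save_as(array=data, dest_file_name="your_file.csv")

    # Assumption: iget_array accepts the same pagination keywords as get_sheet.
    rows = pe.iget_array(file_name="your_file.csv", start_row=2, row_limit=3)
    for row in rows:
        print(row)       # expected: [3, 23, 33], [4, 24, 34], [5, 25, 35]
    pe.free_resources()  # release the file handle kept open by the lazy reader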
docs/source/bigdata.rst

Lines changed: 2 additions & 2 deletions

@@ -19,7 +19,7 @@ Why did not I see above benefit?
 
 This feature depends heavily on the implementation details.
 
-`pyexcel-xls(xlrd)`, `pyexcel-xlsx(openpyxl)`, `pyexcel-ods(odfpy)` and `pyexcel-ods3(pyexcel-ezodf)`
+`pyexcel-xls`_(xlrd), `pyexcel-xlsx`_(openpyxl), `pyexcel-ods`_(odfpy) and `pyexcel-ods3`_(pyexcel-ezodf)
 will read all data into memory. Because xls, xlsx and ods file are effective a zipped folder,
 all four will unzip the folder and read the content in xml format in **full**, so as to make sense
 of all details.

@@ -29,7 +29,7 @@ consumption won't differ from reading the whole data back. Only after the partia
 data is returned, the memory comsumption curve shall jump the cliff. So pagination
 code here only limits the data returned to your program.
 
-With that said, `pyexcel-xlsxr`, `pyexcel-odsr` and `pyexcel-htmlr` DOES read partial data into memory.
+With that said, `pyexcel-xlsxr`_, `pyexcel-odsr`_ and `pyexcel-htmlr`_ DOES read partial data into memory.
 Those three are implemented in such a way that they consume the xml(html) when needed. When they
 have read designated portion of the data, they stop, even if they are half way through.

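Related to the partial-read notes above: the moved README section states that ``isave_as`` has to be used when the data comes from a generator. A hedged sketch of what that could look like; the generator and the output file name are made up for illustration, and the ``array=`` keyword accepting a generator here is an assumption to check against your pyexcel version, mirroring the ``save_as(array=...)`` call shown in the diff.

.. code-block:: python

    # Sketch only: rows produced lazily by a generator, written out by the
    # streaming isave_as without materialising the whole table in memory.
    import pyexcel as pe

    def rows():  # hypothetical data source for illustration
        for i in range(1, 7):
            yield [i, i + 20, i + 30]

    # Assumption: array= accepts an iterable of rows, as it does for save_as.
    pe.isave_as(array=rows(), dest_file_name="from_generator.csv")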