Resolve numerous typos (#280) · ajaycode/unstructured@9062d25 · GitHub


Commit 9062d25

Resolve numerous typos (Unstructured-IO#280)
* Resolve numerous typos
* Resolve typo in mime type
1 parent 956f04d commit 9062d25


19 files changed (+27 -27 lines changed)


README.md

Lines changed: 1 addition & 1 deletion
@@ -228,7 +228,7 @@ The output will look the same as the example from the document parsing section a
 ### E-mail Parsing

 The `partition_email` function within `unstructured` is helpful for parsing `.eml` files. Common
-e-mail clients such as Microsoft Outlook and Gmail support exproting e-mails as `.eml` files.
+e-mail clients such as Microsoft Outlook and Gmail support exporting e-mails as `.eml` files.
 `partition_email` accepts filenames, file-like object, and raw text as input. The following
 three snippets for parsing `.eml` files are equivalent:

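The README hunk notes that `partition_email` accepts a filename, a file-like object, or raw text. As a rough illustration of that three-input pattern, here is a minimal sketch built only on Python's standard `email` package. `load_eml` is a hypothetical helper, not part of `unstructured`, and it returns a parsed `email.message.Message` rather than document elements.

```python
import email
import io
from email.message import Message
from typing import IO, Optional


def load_eml(
    filename: Optional[str] = None,
    file: Optional[IO[bytes]] = None,
    text: Optional[str] = None,
) -> Message:
    """Parse an .eml message from exactly one of: a path, a binary file object, or raw text."""
    if sum(arg is not None for arg in (filename, file, text)) != 1:
        raise ValueError("Pass exactly one of filename, file, or text.")
    if filename is not None:
        with open(filename, "rb") as f:
            return email.message_from_binary_file(f)
    if file is not None:
        return email.message_from_binary_file(file)
    return email.message_from_string(text)


# The same message loaded from raw text and from an in-memory binary file.
raw = "From: sender@example.com\nSubject: Quarterly update\n\nNumbers attached.\n"
from_text = load_eml(text=raw)
from_file = load_eml(file=io.BytesIO(raw.encode("utf-8")))
print(from_text["Subject"], "|", from_file["Subject"])
```

Requiring exactly one input keeps the three call styles unambiguous, which is the property the README's "equivalent snippets" rely on.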
docs/source/bricks.rst

Lines changed: 5 additions & 5 deletions
@@ -20,7 +20,7 @@ titles, narrative text, and tables.
 The ``partition`` brick is the simplest way to partition a document in ``unstructured``.
 If you call the ``partition`` function, ``unstructured`` will attempt to detect the
 file type and route it to the appropriate partitioning brick. All partitioning bricks
-called within ``partition`` are called using the defualt kwargs. Use the document-type
+called within ``partition`` are called using the default kwargs. Use the document-type
 specific bricks if you need to apply non-default settings.
 ``partition`` currently supports ``.docx``, ``.doc``, ``.pptx``, ``.ppt``, ``.eml``, ``.html``, ``.pdf``,
 ``.png``, ``.jpg``, and ``.txt`` files.
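The routing described in `bricks.rst` (detect the file type, then call a document-specific brick with default kwargs) can be sketched as a dispatch table. This is a simplification under stated assumptions: it keys on the file extension only, whereas the docs say `partition` detects the file type, and the placeholder bricks and `route_partition` are hypothetical, not the library's functions.

```python
from pathlib import Path
from typing import Callable, Dict, List


def _stub(kind: str) -> Callable[[str], List[str]]:
    """Return a placeholder brick that just tags the file with its document type."""
    def brick(filename: str) -> List[str]:
        return [f"{kind}:{Path(filename).name}"]
    return brick


# Hypothetical routing table covering the extensions the docs list for ``partition``.
BRICKS: Dict[str, Callable[[str], List[str]]] = {
    ext: _stub(kind)
    for ext, kind in {
        ".docx": "docx", ".doc": "doc", ".pptx": "pptx", ".ppt": "ppt",
        ".eml": "email", ".html": "html", ".pdf": "pdf",
        ".png": "image", ".jpg": "image", ".txt": "text",
    }.items()
}


def route_partition(filename: str) -> List[str]:
    """Pick a document-specific brick by extension and call it with default settings."""
    suffix = Path(filename).suffix.lower()
    try:
        brick = BRICKS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
    return brick(filename)


print(route_partition("report.PDF"))
```

Because every routed brick runs with its defaults, anything that needs non-default settings has to call the document-specific brick directly, as the docs advise.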
@@ -539,7 +539,7 @@ Examples:
 ``clean_ordered_bullets``
 -------------------------

-Remove alpha-numeric bullets from the beginning of text up to three “sub-section” levels.
+Remove alphanumeric bullets from the beginning of text up to three “sub-section” levels.

 Examples:

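To make "up to three sub-section levels" concrete, here is a minimal regex sketch of stripping a leading ordered bullet such as `1.1`, `a.1`, or `1.2.3`. `strip_ordered_bullet` is a hypothetical function, not `clean_ordered_bullets` itself, and the bullet shape (each level is up to three digits or a single letter, separated by dots) is an assumption.

```python
import re

# One bullet level: up to three digits or a single letter, e.g. "1", "12", "a".
_PART = r"(?:\d{1,3}|[A-Za-z])"
# A leading bullet with one to three dot-separated levels, followed by whitespace.
_ORDERED_BULLET = re.compile(rf"^\s*{_PART}\.(?:{_PART}(?:\.{_PART})?\.?)?\s+")


def strip_ordered_bullet(text: str) -> str:
    """Remove a leading bullet of up to three levels; otherwise return text unchanged."""
    return _ORDERED_BULLET.sub("", text, count=1)


print(strip_ordered_bullet("1.1 This is a very important point"))
```

A four-level prefix such as `1.2.3.4` is left alone. The pattern would also strip an abbreviation like `e.g.` at the very start of a line, which a production cleaner would need to guard against.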
@@ -687,7 +687,7 @@ Extracts text that occurs before the specified pattern.

 Options:

-* If ``index`` is set, extract before the ``(index + 1)``th occurence of the pattern. The default is ``0``.
+* If ``index`` is set, extract before the ``(index + 1)``th occurrence of the pattern. The default is ``0``.
 * Strips leading whitespace if ``strip`` is set to ``True``. The default is ``True``.


@@ -710,7 +710,7 @@ Extracts text that occurs after the specified pattern.

 Options:

-* If ``index`` is set, extract after the ``(index + 1)``th occurence of the pattern. The default is ``0``.
+* If ``index`` is set, extract after the ``(index + 1)``th occurrence of the pattern. The default is ``0``.
 * Strips trailing whitespace if ``strip`` is set to ``True``. The default is ``True``.


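The `index` option in the two hunks above selects the `(index + 1)`th occurrence of the pattern, so `index=0` means the first match. A minimal sketch of that indexing, assuming regex patterns: `text_before` and `text_after` are hypothetical stand-ins for the documented bricks, and for simplicity they call `str.strip()`, which trims both ends rather than only the side each brick's `strip` option documents.

```python
import re


def _nth_match(text: str, pattern: str, index: int) -> re.Match:
    """Return the (index + 1)th match of pattern in text, or raise ValueError."""
    matches = list(re.finditer(pattern, text))
    if not 0 <= index < len(matches):
        raise ValueError(f"Pattern occurs {len(matches)} time(s); index {index} is out of range.")
    return matches[index]


def text_before(text: str, pattern: str, index: int = 0, strip: bool = True) -> str:
    """Text that precedes the (index + 1)th occurrence of pattern."""
    before = text[: _nth_match(text, pattern, index).start()]
    return before.strip() if strip else before


def text_after(text: str, pattern: str, index: int = 0, strip: bool = True) -> str:
    """Text that follows the (index + 1)th occurrence of pattern."""
    after = text[_nth_match(text, pattern, index).end():]
    return after.strip() if strip else after


sample = "one STOP two STOP three"
print(text_before(sample, "STOP", index=1), "|", text_after(sample, "STOP"))
```

Raising on an out-of-range index, instead of returning an empty string, makes a miscounted occurrence visible.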
@@ -834,7 +834,7 @@ Examples:
 ``extract_ordered_bullets``
 ---------------------------

-Extracts alpha-numeric bullets from the beginning of text up to three “sub-section” levels.
+Extracts alphanumeric bullets from the beginning of text up to three “sub-section” levels.

 Examples:

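Extraction returns the bullet's parts instead of removing them. Here is a minimal sketch that returns a fixed-length 3-tuple with `None` for unused levels. `bullet_parts` is hypothetical, and both the tuple shape and the bullet grammar (each level is up to three digits or one letter) are assumptions, not the documented return value of `extract_ordered_bullets`.

```python
import re
from typing import Optional, Tuple

# Each captured level: up to three digits or a single letter.
_LEVEL = r"(\d{1,3}|[A-Za-z])"
# Level 1 is required; levels 2 and 3 are optional and must be dot-separated.
_BULLET_PARTS = re.compile(rf"^\s*{_LEVEL}\.(?:{_LEVEL}(?:\.{_LEVEL})?\.?)?\s+")


def bullet_parts(text: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (level1, level2, level3) of a leading ordered bullet, None where absent."""
    match = _BULLET_PARTS.match(text)
    if match is None:
        return (None, None, None)
    return match.groups()


print(bullet_parts("1.1 This is a very important point"))
```

Returning a fixed-length tuple lets callers unpack the result without checking how many levels were present.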
docs/source/elements.rst

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Elements
 --------

 The following are the structured page elements that are available within the ``unstructured``
-package. Partioning bricks convert raw documents to this common set of elements. If you need
+package. Partitioning bricks convert raw documents to this common set of elements. If you need
 a custom element, the recommended approach is to create a sub-class of one of the default
 elements.

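The recommendation to sub-class a default element can be illustrated without the library. In this sketch, `TextElement` and `NarrativeElement` are local stand-ins, not `unstructured`'s element classes, and `FootnoteElement` is a hypothetical custom element that inherits text storage and `str()` while adding one helper.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    """Hypothetical stand-in for a default text element (not the library's class)."""
    text: str

    def __str__(self) -> str:
        return self.text


class NarrativeElement(TextElement):
    """Hypothetical stand-in for a default element that holds body prose."""


class FootnoteElement(NarrativeElement):
    """Custom element: reuses the parent's fields and str(), adds a domain-specific helper."""

    def marker(self) -> str:
        # Treat the first whitespace-separated token, e.g. "[3]", as the footnote marker.
        return self.text.split(maxsplit=1)[0] if self.text.strip() else ""


note = FootnoteElement("[3] Source: company filings.")
print(note.marker(), isinstance(note, NarrativeElement))
```

Subclassing keeps `isinstance` checks against the parent type working, so code that handles the default element still accepts the custom one.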
examples/argilla-summarization/README.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ complete a data science project in hours that previously would have taken weeks.
 To get started, use the following steps:

 - Ensure you have Python 3.8 or higher installed on your system
-- Create a new Python virtual enviornment
+- Create a new Python virtual environment
 - Run `pip install -r requirements.txt` to install the dependencies
 - Run `PYTHONPATH=. jupyter notebook` from this directory to launch the notebook

examples/sec-sentiment-analysis/README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ and several bricks from the `unstructured` library to train a sentiment analysis
 risk factors section of S-1 filings. To get started, use the following steps:

 - Ensure you have Python 3.8 or higher installed on your system
-- Create a new Python virtual enviornment
+- Create a new Python virtual environment
 - Run `pip install -r requirements.txt` to install the dependencies
 - Run `PYTHONPATH=. jupyter notebook` from this directory to launch the notebook

examples/sec-sentiment-analysis/fetch.py

Lines changed: 2 additions & 2 deletions
@@ -125,7 +125,7 @@ def get_form_by_ticker(


 def _form_types(form_type: str, allow_amended_filing: Optional[bool] = True):
-    """Potentialy expand to include amended filing, e.g.:
+    """Potentially expand to include amended filing, e.g.:
     "10-Q" -> "10-Q/A"
     """
     assert form_type in VALID_FILING_TYPES
@@ -144,7 +144,7 @@ def get_form_by_cik(
 ) -> str:
     """For a given CIK, returns the most recent form of a given form_type. By default
     an amended version of the form_type may be retrieved (allow_amended_filing=True).
-    E.g., if form_type is "10-Q", the retrived form could be a 10-Q or 10-Q/A.
+    E.g., if form_type is "10-Q", the retrieved form could be a 10-Q or 10-Q/A.
     """
     session = _get_session(company, email)
     acc_num, _ = _get_recent_acc_num_by_cik(
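The two `fetch.py` docstrings describe expanding a form type to its amended variant ("10-Q" -> "10-Q/A") and returning the most recent matching form. Here is a minimal offline sketch of that selection logic. The filing list, the `VALID_FILING_TYPES` subset, and both function names are hypothetical; the real module queries EDGAR by CIK, and this sketch raises `ValueError` where `_form_types` uses an `assert`.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical subset of filing types; fetch.py defines its own VALID_FILING_TYPES.
VALID_FILING_TYPES = {"10-K", "10-Q", "S-1"}

Filing = Tuple[str, date]  # (form type, filing date)


def form_types(form_type: str, allow_amended_filing: bool = True) -> List[str]:
    """Expand a form type to include its amended variant, e.g. "10-Q" -> ["10-Q", "10-Q/A"]."""
    if form_type not in VALID_FILING_TYPES:
        raise ValueError(f"Unsupported form type: {form_type}")
    if allow_amended_filing:
        return [form_type, f"{form_type}/A"]
    return [form_type]


def most_recent_filing(
    filings: List[Filing], form_type: str, allow_amended_filing: bool = True
) -> Filing:
    """Return the latest filing whose type matches form_type (or its amendment, if allowed)."""
    allowed = set(form_types(form_type, allow_amended_filing))
    candidates = [filing for filing in filings if filing[0] in allowed]
    if not candidates:
        raise LookupError(f"No {form_type} filing found")
    return max(candidates, key=lambda filing: filing[1])


history = [("10-Q", date(2022, 5, 2)), ("10-Q/A", date(2022, 6, 1)), ("10-K", date(2022, 7, 1))]
print(most_recent_filing(history, "10-Q"))
```

With `allow_amended_filing=False`, the later 10-Q/A is skipped and the original 10-Q is returned.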

examples/training/0-Core Concepts.ipynb

Lines changed: 1 addition & 1 deletion
@@ -187,7 +187,7 @@
 " - `Image`\n",
 " - `PageBreak`\n",
 " \n",
-"Other element types that we will add in the future include tables and figures. Different partioning functions use different methods for determining the element type and extracting the associated content. Document elements have a `str` representation. You can print them using the snippet below."
+"Other element types that we will add in the future include tables and figures. Different partitioning functions use different methods for determining the element type and extracting the associated content. Document elements have a `str` representation. You can print them using the snippet below."
 ]
 },
 {
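The notebook cell says document elements have a `str` representation that can be printed. As a rough illustration of what that means, here is a sketch with hypothetical `Element`, `Heading`, and `Paragraph` classes, not the library's. `__str__` returns only the extracted text, while the dataclass-generated `repr` still shows the element type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical minimal document element holding extracted text."""
    text: str

    def __str__(self) -> str:
        # str() and print() show only the extracted text; repr() still shows the type.
        return self.text


class Heading(Element):
    """Hypothetical element type for titles."""


class Paragraph(Element):
    """Hypothetical element type for body text."""


def render(elements: List[Element]) -> str:
    """Join each element's str representation, one per line."""
    return "\n".join(str(element) for element in elements)


elements = [Heading("Risk Factors"), Paragraph("Our results may fluctuate.")]
for element in elements:
    print(element)
```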

examples/training/1-Intro to Bricks.ipynb

Lines changed: 2 additions & 2 deletions
@@ -143,7 +143,7 @@
 "id": "e3a8e7f4",
 "metadata": {},
 "source": [
-"The `unstructured` library also includes partitioning bricks targeted at specific document types. The `partition` brick uses these document-specific partitioning bricks under the hood. There are a few reasons you may want to use a document-specific partioning brick instead of `partition`:\n",
+"The `unstructured` library also includes partitioning bricks targeted at specific document types. The `partition` brick uses these document-specific partitioning bricks under the hood. There are a few reasons you may want to use a document-specific partitioning brick instead of `partition`:\n",
 "\n",
 "1. If you already know the document type, filetype detection is unnecessary. Using the document-specific brick directly will make your program run faster.\n",
 "2. Fewer dependencies. You don't need to install `libmagic` for filetype detection if you're only using document-specific bricks.\n",
@@ -312,7 +312,7 @@
 "id": "358e149b",
 "metadata": {},
 "source": [
-"Since a cleaning brick is just a `str -> str` function, users can also easily include their own cleaning bricks for custom data preparation tasks. In the example below, we partition a Russian offensive campaign assessment from the institute of the study of war and remove citations, which are not natural language text that we want to inclue for model training purposes."
+"Since a cleaning brick is just a `str -> str` function, users can also easily include their own cleaning bricks for custom data preparation tasks. In the example below, we partition a Russian offensive campaign assessment from the institute of the study of war and remove citations, which are not natural language text that we want to include for model training purposes."
 ]
 },
 {
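Because a cleaning brick is just a `str -> str` function, a custom one can be written and chained with ordinary string functions. Here is a minimal sketch assuming citations appear as bracketed numbers such as `[12]` or `[3, 4]`. That format is an assumption about the source report, and `remove_citations` and `apply_cleaners` are hypothetical helpers, not library functions.

```python
import re
from functools import reduce
from typing import Callable, Iterable

Cleaner = Callable[[str], str]

# Assumption: citations look like bracketed numbers, e.g. "[12]" or "[3, 4]".
_CITATION = re.compile(r"\s*\[\d+(?:\s*,\s*\d+)*\]")


def remove_citations(text: str) -> str:
    """Custom cleaning brick: drop bracketed numeric citations and the space before them."""
    return _CITATION.sub("", text)


def apply_cleaners(text: str, cleaners: Iterable[Cleaner]) -> str:
    """Run each str -> str cleaner in order, feeding one's output into the next."""
    return reduce(lambda acc, clean: clean(acc), cleaners, text)


print(apply_cleaners("  Forces advanced [1] near the river [2, 3].  ", [remove_citations, str.strip]))
```

Since every cleaner shares the same signature, built-in functions like `str.strip` and custom bricks compose in one list.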

examples/training/2-File Exploration.ipynb

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 "source": [
 "# File Exploration\n",
 "\n",
-"In addition to core document processing capabilities, the `unstructured` library includes utilities for summarizing information about raw doucments. We will cover how to use these utilities in this notebook. At the conclusion of this notebook, you should understand:\n",
+"In addition to core document processing capabilities, the `unstructured` library includes utilities for summarizing information about raw documents. We will cover how to use these utilities in this notebook. At the conclusion of this notebook, you should understand:\n",
 "\n",
 "- [Filetype detection in `unstructured`](#filetype)\n",
 "- [How to generate summary statistics about documents](#summary)"
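The File Exploration notebook covers filetype detection and summary statistics. Here is a rough standard-library sketch of the same idea that counts files by MIME type. Unlike the library's detection, which the intro notebook says relies on `libmagic`, this sketch only guesses from the file extension via `mimetypes`. `summarize_file_types` is a hypothetical helper.

```python
import mimetypes
from collections import Counter
from typing import Iterable


def summarize_file_types(filenames: Iterable[str]) -> Counter:
    """Count files by MIME type guessed from the extension; unknown types count as 'unknown'."""
    counts: Counter = Counter()
    for name in filenames:
        mime, _ = mimetypes.guess_type(name)
        counts[mime or "unknown"] += 1
    return counts


print(summarize_file_types(["a.pdf", "b.pdf", "notes.txt", "blob.nosuchext"]).most_common())
```

An extension-based guess is fast and dependency-free, but it trusts the filename. Content-based detection is the safer choice for files of unknown origin.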

requirements/test.in

Lines changed: 1 addition & 1 deletion
@@ -15,5 +15,5 @@ types-requests
 vcrpy

 # NOTE(robinson) - The following pins are to address
-# vulernabilities in dependency scans
+# vulnerabilities in dependency scans
 certifi>=2022.12.07
