Merged
Pull Request Overview
This PR introduces Markdown reading capabilities to the data ingestion library through two new readers: MarkdownReader for native Markdown files and MarkItDownReader that leverages the external MarkItDown tool to convert various document formats to Markdown before parsing.
Key changes:
- Adds `MarkdownReader` for parsing `.md` files using the Markdig library
- Adds `MarkItDownReader`, which wraps the MarkItDown CLI tool to convert documents (PDF, DOCX, etc.) to Markdown
- Introduces a shared `MarkdownParser` to parse the Markdig AST into the `IngestionDocument` model
- Implements a comprehensive test suite with conformance tests and format-specific test cases
Reviewed Changes
Copilot reviewed 12 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| MarkdownReader.cs | Implements reader for native Markdown files using Markdig parser |
| MarkdownParser.cs | Core parsing logic converting Markdig AST to IngestionDocument model |
| MarkItDownReader.cs | Wraps MarkItDown CLI tool to convert various document formats to Markdown |
| Microsoft.Extensions.DataIngestion.Markdown.csproj | Project file for MarkdownReader with Markdig dependency |
| Microsoft.Extensions.DataIngestion.MarkItDown.csproj | Project file for MarkItDownReader, shares MarkdownParser code |
| DocumentReaderConformanceTests.cs | Base test class defining conformance tests for document readers |
| MarkdownReaderTests.cs | Tests specific to MarkdownReader functionality |
| MarkItDownReaderTests.cs | Tests specific to MarkItDownReader with CLI availability checks |
| ArrayUtils.cs | Test utility for mapping 2D arrays used in table assertions |
| Microsoft.Extensions.DataIngestion.Tests.csproj | Updated project file adding references and test file configuration |
| General.props | Adds Markdig package reference |
| Versions.props | Specifies Markdig version 0.42.0 |
src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownParser.cs
src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/MarkItDownReader.cs (outdated)
...ies/Microsoft.Extensions.DataIngestion.Tests/Microsoft.Extensions.DataIngestion.Tests.csproj (outdated)
roji reviewed Oct 28, 2025
src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownParser.cs
…sions.DataIngestion.Markdig
ericstj reviewed Oct 30, 2025
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Readers/MarkItDownReaderTests.cs (outdated)
ericstj reviewed Oct 30, 2025
src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/MarkItDownReader.cs (outdated)
src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownReader.cs
src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/MarkItDownReader.cs
src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownParser.cs
ericstj reviewed Oct 30, 2025
src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/MarkItDownReader.cs (outdated)
cincuranet reviewed Oct 30, 2025
src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownParser.cs (outdated)
src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownParser.cs
src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownParser.cs
# Conflicts: # test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Microsoft.Extensions.DataIngestion.Tests.csproj
- delete temporary file when .CopyToAsync fails
- handle all image types
…rectory to a "safe location"
This was referenced Nov 26, 2025