[RFC] Add RFC 3986 and WHATWG compliant URL parsing support by kocsismate · Pull Request #14461 · php/php-src

[RFC] Add RFC 3986 and WHATWG compliant URL parsing support #14461

Open: wants to merge 31 commits into base: master

31 commits
c43c3a4
Create separate lexbor extension
kocsismate May 11, 2025
027ad79
Update EXTENSIONS
kocsismate May 13, 2025
8dd5894
Trying to fix Windows builds
kocsismate May 13, 2025
9fb1d4c
Upgrade Lexbor
nielsdos May 16, 2025
616b23a
Fix Windows linkage
nielsdos May 16, 2025
dddc830
Small improvements for the patch readme
kocsismate May 17, 2025
5ca89c5
Expose version information for Lexbor
kocsismate May 17, 2025
40bc6a8
Import warning fixes
nielsdos May 17, 2025
6a714bd
Add RFC 3986 and WHATWG compliant URL parsing support
kocsismate Jun 3, 2024
893668d
Serialization
kocsismate Oct 22, 2024
c84f428
Improve error handling
kocsismate Oct 26, 2024
9e413ab
Lot of fixes and added support for equalsTo()
kocsismate Nov 11, 2024
2c27614
Add normalization support
kocsismate Nov 13, 2024
f626eda
SOAP test fixes
kocsismate Nov 13, 2024
734139b
Fix some memory leaks
kocsismate Nov 13, 2024
2ab4813
Some cleanups
kocsismate Nov 18, 2024
2dca643
Changes based on discussion
kocsismate Nov 30, 2024
6e9f731
Removal of Uri\Uri
kocsismate Dec 30, 2024
451bfa4
A lot of fixes and API changes
kocsismate Jan 6, 2025
91af1d8
Updates
kocsismate Feb 5, 2025
03bd9f5
Add new tests, path fixes
kocsismate Feb 9, 2025
6ed4b2a
Add more tests for verifying the behavior of withers
kocsismate Feb 15, 2025
815098e
Fix code review comments
kocsismate Feb 19, 2025
da3092a
A few fixes and improvements after feedback
kocsismate Apr 14, 2025
14d9fe1
Test fixes
kocsismate Apr 14, 2025
591f21a
Remove WHATWG non-raw getters
kocsismate Apr 18, 2025
265c130
Rename WHATWG getters again
kocsismate Apr 26, 2025
ebf154c
Add UriComparisonMode
kocsismate Apr 28, 2025
31df6c2
Expose $softErrors for Uri\WhatWg\Url::resolve()
kocsismate Apr 30, 2025
172e29d
Add SensitiveParameter support
kocsismate May 3, 2025
b686f4e
Proper build support
kocsismate May 19, 2025

5 changes: 5 additions & 0 deletions .github/labeler.yml
@@ -320,6 +320,11 @@
       - any-glob-to-any-file:
           - ext/tokenizer/**/*

+"Extension: uri":
+  - changed-files:
+      - any-glob-to-any-file:
+          - ext/uri/**/*
+
 "Extension: xml":
   - changed-files:
       - any-glob-to-any-file:
2 changes: 1 addition & 1 deletion .gitignore
@@ -306,6 +306,6 @@ tmp-php.ini
 !/ext/fileinfo/libmagic/config.h
 !/ext/fileinfo/libmagic.patch
 !/ext/fileinfo/magicdata.patch
-!/ext/dom/lexbor/patches/*.patch
+!/ext/lexbor/patches/*.patch
 !/ext/pcre/pcre2lib/config.h
 !/win32/build/Makefile
13 changes: 13 additions & 0 deletions EXTENSIONS
@@ -195,6 +195,13 @@ PRIMARY MAINTAINER: Thies C. Arntzen <thies@thieso.net> (1999 - 2002)
 MAINTENANCE: Maintained
 STATUS: Working
 -------------------------------------------------------------------------------
+EXTENSION: lexbor
+PRIMARY MAINTAINER: Niels Dossche <nielsdos@php.net> (2023 - 2025)
+Mate Kocsis <kocsismate@php.net> (2025 - 2025)
+MAINTENANCE: Maintained
+STATUS: Working
+SINCE: 8.5
+-------------------------------------------------------------------------------
 EXTENSION: libxml
 PRIMARY MAINTAINER: Rob Richards <rrichards@php.net> (2003 - 2009)
 Christian Stocker <chregu@php.net> (2004 - 2011)
@@ -496,6 +503,12 @@ PRIMARY MAINTAINER: Andrei Zmievski <andrei@php.net> (2002 - 2002)
 MAINTENANCE: Maintained
 STATUS: Working
 -------------------------------------------------------------------------------
+EXTENSION: uri
+PRIMARY MAINTAINER: Máté Kocsis <kocsismate@php.net> (2025 - 2025)
+MAINTENANCE: Maintained
+STATUS: Working
+SINCE: 8.5.0
+-------------------------------------------------------------------------------
 EXTENSION: zip
 PRIMARY MAINTAINER: Pierre-Alain Joye <pajoye@php.net> (2006 - 2011)
 Remi Collet <remi@php.net> (2013-2020)
3 changes: 3 additions & 0 deletions Zend/zend_string.h
@@ -596,8 +596,11 @@ EMPTY_SWITCH_DEFAULT_CASE()
 _(ZEND_STR_SCHEME, "scheme") \
 _(ZEND_STR_HOST, "host") \
 _(ZEND_STR_PORT, "port") \
+_(ZEND_STR_USERINFO, "userinfo") \
 _(ZEND_STR_USER, "user") \
+_(ZEND_STR_USERNAME, "username") \
 _(ZEND_STR_PASS, "pass") \
+_(ZEND_STR_PASSWORD, "password") \
 _(ZEND_STR_PATH, "path") \
 _(ZEND_STR_QUERY, "query") \
 _(ZEND_STR_FRAGMENT, "fragment") \
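
Note on the hunk above: the `_(ID, "literal")` rows have the shape of an X-macro table, where one list is expanded several times with different definitions of `_`. The sketch below illustrates that general pattern only. Every `DEMO_`/`demo_` name is hypothetical and is not taken from `Zend/zend_string.h`, and the real header may expand its list differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table in the same shape as the hunk: one _(ID, "literal") row per string. */
#define DEMO_KNOWN_STRINGS(_) \
    _(DEMO_STR_USERINFO, "userinfo") \
    _(DEMO_STR_USER,     "user") \
    _(DEMO_STR_USERNAME, "username") \
    _(DEMO_STR_PASS,     "pass") \
    _(DEMO_STR_PASSWORD, "password")

/* First expansion: keep only the identifier, producing one enum constant per row. */
typedef enum {
#define DEMO_ENUM_ROW(id, str) id,
    DEMO_KNOWN_STRINGS(DEMO_ENUM_ROW)
#undef DEMO_ENUM_ROW
    DEMO_STR_LAST /* number of rows in the table */
} demo_known_string_id;

/* Second expansion: keep only the literal, producing an array indexed by the enum. */
static const char *const demo_known_strings[] = {
#define DEMO_STR_ROW(id, str) str,
    DEMO_KNOWN_STRINGS(DEMO_STR_ROW)
#undef DEMO_STR_ROW
};

/* Look up the literal that belongs to an identifier. */
const char *demo_known_string(demo_known_string_id id)
{
    assert(id < DEMO_STR_LAST);
    return demo_known_strings[id];
}
```

With this layout, adding a row (as the hunk does for `userinfo`, `username`, and `password`) extends the enum and the string array together, so the two cannot drift apart.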
198 changes: 4 additions & 194 deletions ext/dom/config.m4
@@ -8,185 +8,6 @@ if test "$PHP_DOM" != "no"; then
 PHP_SETUP_LIBXML([DOM_SHARED_LIBADD], [
 AC_DEFINE([HAVE_DOM], [1],
 [Define to 1 if the PHP extension 'dom' is available.])
-PHP_LEXBOR_CFLAGS="-I@ext_srcdir@/lexbor -DLEXBOR_STATIC"
-LEXBOR_DIR="lexbor/lexbor"
-LEXBOR_SOURCES=m4_normalize(["
-$LEXBOR_DIR/core/array_obj.c
-$LEXBOR_DIR/core/array.c
-$LEXBOR_DIR/core/avl.c
-$LEXBOR_DIR/core/bst.c
-$LEXBOR_DIR/core/conv.c
-$LEXBOR_DIR/core/diyfp.c
-$LEXBOR_DIR/core/dobject.c
-$LEXBOR_DIR/core/dtoa.c
-$LEXBOR_DIR/core/hash.c
-$LEXBOR_DIR/core/mem.c
-$LEXBOR_DIR/core/mraw.c
-$LEXBOR_DIR/core/print.c
-$LEXBOR_DIR/core/serialize.c
-$LEXBOR_DIR/core/shs.c
-$LEXBOR_DIR/core/str.c
-$LEXBOR_DIR/core/strtod.c
-$LEXBOR_DIR/css/css.c
-$LEXBOR_DIR/css/log.c
-$LEXBOR_DIR/css/parser.c
-$LEXBOR_DIR/css/selectors/pseudo_state.c
-$LEXBOR_DIR/css/selectors/pseudo.c
-$LEXBOR_DIR/css/selectors/selector.c
-$LEXBOR_DIR/css/selectors/selectors.c
-$LEXBOR_DIR/css/selectors/state.c
-$LEXBOR_DIR/css/state.c
-$LEXBOR_DIR/css/syntax/anb.c
-$LEXBOR_DIR/css/syntax/parser.c
-$LEXBOR_DIR/css/syntax/state.c
-$LEXBOR_DIR/css/syntax/syntax.c
-$LEXBOR_DIR/css/syntax/token.c
-$LEXBOR_DIR/css/syntax/tokenizer.c
-$LEXBOR_DIR/css/syntax/tokenizer/error.c
-$LEXBOR_DIR/dom/interface.c
-$LEXBOR_DIR/dom/interfaces/attr.c
-$LEXBOR_DIR/dom/interfaces/cdata_section.c
-$LEXBOR_DIR/dom/interfaces/character_data.c
-$LEXBOR_DIR/dom/interfaces/comment.c
-$LEXBOR_DIR/dom/interfaces/document_fragment.c
-$LEXBOR_DIR/dom/interfaces/document_type.c
-$LEXBOR_DIR/dom/interfaces/document.c
-$LEXBOR_DIR/dom/interfaces/element.c
-$LEXBOR_DIR/dom/interfaces/node.c
-$LEXBOR_DIR/dom/interfaces/processing_instruction.c
-$LEXBOR_DIR/dom/interfaces/shadow_root.c
-$LEXBOR_DIR/dom/interfaces/text.c
-$LEXBOR_DIR/encoding/big5.c
-$LEXBOR_DIR/encoding/decode.c
-$LEXBOR_DIR/encoding/encode.c
-$LEXBOR_DIR/encoding/encoding.c
-$LEXBOR_DIR/encoding/euc_kr.c
-$LEXBOR_DIR/encoding/gb18030.c
-$LEXBOR_DIR/encoding/iso_2022_jp_katakana.c
-$LEXBOR_DIR/encoding/jis0208.c
-$LEXBOR_DIR/encoding/jis0212.c
-$LEXBOR_DIR/encoding/range.c
-$LEXBOR_DIR/encoding/res.c
-$LEXBOR_DIR/encoding/single.c
-$LEXBOR_DIR/html/encoding.c
-$LEXBOR_DIR/html/interface.c
-$LEXBOR_DIR/html/interfaces/anchor_element.c
-$LEXBOR_DIR/html/interfaces/area_element.c
-$LEXBOR_DIR/html/interfaces/audio_element.c
-$LEXBOR_DIR/html/interfaces/base_element.c
-$LEXBOR_DIR/html/interfaces/body_element.c
-$LEXBOR_DIR/html/interfaces/br_element.c
-$LEXBOR_DIR/html/interfaces/button_element.c
-$LEXBOR_DIR/html/interfaces/canvas_element.c
-$LEXBOR_DIR/html/interfaces/d_list_element.c
-$LEXBOR_DIR/html/interfaces/data_element.c
-$LEXBOR_DIR/html/interfaces/data_list_element.c
-$LEXBOR_DIR/html/interfaces/details_element.c
-$LEXBOR_DIR/html/interfaces/dialog_element.c
-$LEXBOR_DIR/html/interfaces/directory_element.c
-$LEXBOR_DIR/html/interfaces/div_element.c
-$LEXBOR_DIR/html/interfaces/document.c
-$LEXBOR_DIR/html/interfaces/element.c
-$LEXBOR_DIR/html/interfaces/embed_element.c
-$LEXBOR_DIR/html/interfaces/field_set_element.c
-$LEXBOR_DIR/html/interfaces/font_element.c
-$LEXBOR_DIR/html/interfaces/form_element.c
-$LEXBOR_DIR/html/interfaces/frame_element.c
-$LEXBOR_DIR/html/interfaces/frame_set_element.c
-$LEXBOR_DIR/html/interfaces/head_element.c
-$LEXBOR_DIR/html/interfaces/heading_element.c
-$LEXBOR_DIR/html/interfaces/hr_element.c
-$LEXBOR_DIR/html/interfaces/html_element.c
-$LEXBOR_DIR/html/interfaces/iframe_element.c
-$LEXBOR_DIR/html/interfaces/image_element.c
-$LEXBOR_DIR/html/interfaces/input_element.c
-$LEXBOR_DIR/html/interfaces/label_element.c
-$LEXBOR_DIR/html/interfaces/legend_element.c
-$LEXBOR_DIR/html/interfaces/li_element.c
-$LEXBOR_DIR/html/interfaces/link_element.c
-$LEXBOR_DIR/html/interfaces/map_element.c
-$LEXBOR_DIR/html/interfaces/marquee_element.c
-$LEXBOR_DIR/html/interfaces/media_element.c
-$LEXBOR_DIR/html/interfaces/menu_element.c
-$LEXBOR_DIR/html/interfaces/meta_element.c
-$LEXBOR_DIR/html/interfaces/meter_element.c
-$LEXBOR_DIR/html/interfaces/mod_element.c
-$LEXBOR_DIR/html/interfaces/o_list_element.c
-$LEXBOR_DIR/html/interfaces/object_element.c
-$LEXBOR_DIR/html/interfaces/opt_group_element.c
-$LEXBOR_DIR/html/interfaces/option_element.c
-$LEXBOR_DIR/html/interfaces/output_element.c
-$LEXBOR_DIR/html/interfaces/paragraph_element.c
-$LEXBOR_DIR/html/interfaces/param_element.c
-$LEXBOR_DIR/html/interfaces/picture_element.c
-$LEXBOR_DIR/html/interfaces/pre_element.c
-$LEXBOR_DIR/html/interfaces/progress_element.c
-$LEXBOR_DIR/html/interfaces/quote_element.c
-$LEXBOR_DIR/html/interfaces/script_element.c
-$LEXBOR_DIR/html/interfaces/select_element.c
-$LEXBOR_DIR/html/interfaces/slot_element.c
-$LEXBOR_DIR/html/interfaces/source_element.c
-$LEXBOR_DIR/html/interfaces/span_element.c
-$LEXBOR_DIR/html/interfaces/style_element.c
-$LEXBOR_DIR/html/interfaces/table_caption_element.c
-$LEXBOR_DIR/html/interfaces/table_cell_element.c
-$LEXBOR_DIR/html/interfaces/table_col_element.c
-$LEXBOR_DIR/html/interfaces/table_element.c
-$LEXBOR_DIR/html/interfaces/table_row_element.c
-$LEXBOR_DIR/html/interfaces/table_section_element.c
-$LEXBOR_DIR/html/interfaces/template_element.c
-$LEXBOR_DIR/html/interfaces/text_area_element.c
-$LEXBOR_DIR/html/interfaces/time_element.c
-$LEXBOR_DIR/html/interfaces/title_element.c
-$LEXBOR_DIR/html/interfaces/track_element.c
-$LEXBOR_DIR/html/interfaces/u_list_element.c
-$LEXBOR_DIR/html/interfaces/unknown_element.c
-$LEXBOR_DIR/html/interfaces/video_element.c
-$LEXBOR_DIR/html/interfaces/window.c
-$LEXBOR_DIR/html/parser.c
-$LEXBOR_DIR/html/token_attr.c
-$LEXBOR_DIR/html/token.c
-$LEXBOR_DIR/html/tokenizer.c
-$LEXBOR_DIR/html/tokenizer/error.c
-$LEXBOR_DIR/html/tokenizer/state_comment.c
-$LEXBOR_DIR/html/tokenizer/state_doctype.c
-$LEXBOR_DIR/html/tokenizer/state_rawtext.c
-$LEXBOR_DIR/html/tokenizer/state_rcdata.c
-$LEXBOR_DIR/html/tokenizer/state_script.c
-$LEXBOR_DIR/html/tokenizer/state.c
-$LEXBOR_DIR/html/tree.c
-$LEXBOR_DIR/html/tree/active_formatting.c
-$LEXBOR_DIR/html/tree/error.c
-$LEXBOR_DIR/html/tree/insertion_mode/after_after_body.c
-$LEXBOR_DIR/html/tree/insertion_mode/after_after_frameset.c
-$LEXBOR_DIR/html/tree/insertion_mode/after_body.c
-$LEXBOR_DIR/html/tree/insertion_mode/after_frameset.c
-$LEXBOR_DIR/html/tree/insertion_mode/after_head.c
-$LEXBOR_DIR/html/tree/insertion_mode/before_head.c
-$LEXBOR_DIR/html/tree/insertion_mode/before_html.c
-$LEXBOR_DIR/html/tree/insertion_mode/foreign_content.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_body.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_caption.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_cell.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_column_group.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_frameset.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_head_noscript.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_head.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_row.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_select_in_table.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_select.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_table_body.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_table_text.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_table.c
-$LEXBOR_DIR/html/tree/insertion_mode/in_template.c
-$LEXBOR_DIR/html/tree/insertion_mode/initial.c
-$LEXBOR_DIR/html/tree/insertion_mode/text.c
-$LEXBOR_DIR/html/tree/open_elements.c
-$LEXBOR_DIR/ns/ns.c
-$LEXBOR_DIR/ports/posix/lexbor/core/memory.c
-$LEXBOR_DIR/selectors-adapted/selectors.c
-$LEXBOR_DIR/tag/tag.c
-"])
 PHP_NEW_EXTENSION([dom], m4_normalize([
 attr.c
 cdatasection.c
@@ -223,25 +44,13 @@ if test "$PHP_DOM" != "no"; then
 xml_serializer.c
 xpath_callbacks.c
 xpath.c
-$LEXBOR_SOURCES
+lexbor/selectors-adapted/selectors.c
 ]),
 [$ext_shared],,
-[$PHP_LEXBOR_CFLAGS])
+[])
 PHP_ADD_BUILD_DIR([
 $ext_builddir/parentnode
-$ext_builddir/$LEXBOR_DIR/core
-$ext_builddir/$LEXBOR_DIR/css/selectors
-$ext_builddir/$LEXBOR_DIR/css/syntax/tokenizer
-$ext_builddir/$LEXBOR_DIR/css/tokenizer
-$ext_builddir/$LEXBOR_DIR/dom/interfaces
-$ext_builddir/$LEXBOR_DIR/encoding
-$ext_builddir/$LEXBOR_DIR/html/interfaces
-$ext_builddir/$LEXBOR_DIR/html/tokenizer
-$ext_builddir/$LEXBOR_DIR/html/tree/insertion_mode
-$ext_builddir/$LEXBOR_DIR/ns
-$ext_builddir/$LEXBOR_DIR/ports/posix/lexbor/core
-$ext_builddir/$LEXBOR_DIR/selectors-adapted
-$ext_builddir/$LEXBOR_DIR/tag
+$ext_builddir/lexbor/selectors-adapted
 ])
 PHP_SUBST([DOM_SHARED_LIBADD])
 PHP_INSTALL_HEADERS([ext/dom], m4_normalize([
@@ -251,5 +60,6 @@ if test "$PHP_DOM" != "no"; then
 xpath_callbacks.h
 ]))
 PHP_ADD_EXTENSION_DEP(dom, libxml)
+PHP_ADD_EXTENSION_DEP(dom, lexbor)
 ])
 fi
26 changes: 6 additions & 20 deletions ext/dom/config.w32
@@ -16,27 +16,12 @@ if (PHP_DOM == "yes") {
 entityreference.c \
 token_list.c \
 notation.c xpath.c dom_iterators.c \
-namednodemap.c xpath_callbacks.c", null, "-Iext/dom/lexbor");
+namednodemap.c xpath_callbacks.c", null, "/I ext/lexbor");

+ADD_EXTENSION_DEP('dom', 'lexbor');
+
 ADD_SOURCES("ext/dom/parentnode", "tree.c css_selectors.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/ports/windows_nt/lexbor/core", "memory.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/core", "array_obj.c array.c avl.c bst.c diyfp.c conv.c dobject.c dtoa.c hash.c mem.c mraw.c print.c serialize.c shs.c str.c strtod.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/dom", "interface.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/dom/interfaces", "attr.c cdata_section.c character_data.c comment.c document.c document_fragment.c document_type.c element.c node.c processing_instruction.c shadow_root.c text.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/html/tokenizer", "error.c state_comment.c state_doctype.c state_rawtext.c state_rcdata.c state_script.c state.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/html/tree", "active_formatting.c open_elements.c error.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/html/tree/insertion_mode", "after_after_body.c after_after_frameset.c after_body.c after_frameset.c after_head.c before_head.c before_html.c foreign_content.c in_body.c in_caption.c in_cell.c in_column_group.c in_frameset.c in_head.c in_head_noscript.c initial.c in_row.c in_select.c in_select_in_table.c in_table_body.c in_table.c in_table_text.c in_template.c text.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/html", "encoding.c interface.c parser.c token.c token_attr.c tokenizer.c tree.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/encoding", "big5.c decode.c encode.c encoding.c euc_kr.c gb18030.c iso_2022_jp_katakana.c jis0208.c jis0212.c range.c res.c single.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/html/interfaces", "anchor_element.c area_element.c audio_element.c base_element.c body_element.c br_element.c button_element.c canvas_element.c data_element.c data_list_element.c details_element.c dialog_element.c directory_element.c div_element.c d_list_element.c document.c element.c embed_element.c field_set_element.c font_element.c form_element.c frame_element.c frame_set_element.c head_element.c heading_element.c hr_element.c html_element.c iframe_element.c image_element.c input_element.c label_element.c legend_element.c li_element.c link_element.c map_element.c marquee_element.c media_element.c menu_element.c meta_element.c meter_element.c mod_element.c object_element.c o_list_element.c opt_group_element.c option_element.c output_element.c paragraph_element.c param_element.c picture_element.c pre_element.c progress_element.c quote_element.c script_element.c select_element.c slot_element.c source_element.c span_element.c style_element.c table_caption_element.c table_cell_element.c table_col_element.c table_element.c table_row_element.c table_section_element.c template_element.c text_area_element.c time_element.c title_element.c track_element.c u_list_element.c unknown_element.c video_element.c window.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/selectors-adapted", "selectors.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/css", "state.c log.c parser.c css.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/css/selectors", "state.c selectors.c selector.c pseudo_state.c pseudo.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/css/syntax", "state.c parser.c syntax.c anb.c tokenizer.c token.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/css/syntax/tokenizer", "error.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/ns", "ns.c", "dom");
-ADD_SOURCES("ext/dom/lexbor/lexbor/tag", "tag.c", "dom");
-ADD_FLAG("CFLAGS_DOM", "/D LEXBOR_STATIC ");
+ADD_SOURCES("ext/dom/lexbor/selectors-adapted", "selectors.c", "dom");

 AC_DEFINE("HAVE_DOM", 1, "Define to 1 if the PHP extension 'dom' is available.");

@@ -51,7 +36,8 @@ if (PHP_DOM == "yes") {
 "dom_ce.h " +
 "namespace_compat.h " +
 "xml_common.h " +
-"xpath_callbacks.h "
+"xpath_callbacks.h " +
+"lexbor/selectors-adapted/selectors.h "
 );
 } else {
 WARNING("dom support can't be enabled, libxml is not enabled")