Difool
Welcome to Wikidata, Difool!
Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!
Need some help getting started? Here are some pages you can familiarize yourself with:
- Introduction – An introduction to the project.
- Wikidata tours – Interactive tutorials to show you how Wikidata works.
- Community portal – The portal for community members.
- User options – including the 'Babel' extension, to set your language preferences.
- Contents – The main help page for editing and using the site.
- Project chat – Discussions about the project.
- Tools – A collection of user-developed tools to allow for easier completion of some tasks.
Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.
If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox to try. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.
Best regards! --Epìdosis 07:19, 11 July 2023 (UTC)
Hi Difool, with your edit to the Bot permission request page [1], you removed some other requests from the list (including mine; see comparison [2]). It would be nice if you could revert/repair your edit :) Thanks! Parnswir (talk) 06:40, 24 October 2023 (UTC)
- Hi Parnswir, sorry about that. I've corrected it now. Difool (talk) 07:31, 24 October 2023 (UTC)
- Awesome, thanks! Parnswir (talk) 09:10, 24 October 2023 (UTC)
Date of death removal
Why did you remove this date of death? According to Wikidata:Requests for permissions/Bot/DifoolBot your bot should only work on references, not remove statements. Looking at what you put in Q13564452#P570: Don't ever use unknown value for date of birth unless you're working around the year 0. Instead put in a date with lower precision. See Help:Dates for more info. Multichill (talk) 21:10, 13 November 2023 (UTC)
- The removal of the statement falls within "Remove ECARTICO references from outdated statements, or, if it's the only reference, remove the complete statement". I'll read up about your unknown value comment. Difool (talk) 02:25, 14 November 2023 (UTC)
- Could you look at Q19911735, what date/precision would you put in the death date? I did it this way: Ecartico lists a birth year of "circa 1627" and a death year "after 1678". I added 100 years to the earliest listed year (1627) so the death year should be between 1678 and 1727. Because those years are not in the same century I used precision 6, millennium. Difool (talk) 02:43, 15 November 2023 (UTC)
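The reasoning above (earliest birth year plus 100 years as the latest possible death year, then the finest precision whose unit contains both endpoints) can be sketched as a small helper. This is a hypothetical illustration, not the bot's code. It assumes Wikidata's precision codes (9 = year, 8 = decade, 7 = century, 6 = millennium) and aligns units by plain floor division, which differs slightly from how Wikidata labels ordinal centuries.

```python
# Wikidata time precision codes: 9 = year, 8 = decade, 7 = century, 6 = millennium.
# Each code is paired with its unit size in years.
PRECISIONS = [(9, 1), (8, 10), (7, 100), (6, 1000)]


def finest_common_precision(earliest: int, latest: int) -> int:
    """Return the finest precision whose unit contains both years.

    Assumption: units are aligned by floor division (year // size), so the
    "century" bucket of 1678 is 16 and that of 1727 is 17.
    """
    if earliest > latest:
        raise ValueError("earliest must not be after latest")
    for precision, size in PRECISIONS:
        if earliest // size == latest // size:
            return precision
    raise ValueError("range spans more than one millennium")


# Values from the Q19911735 example: born "circa 1627", died "after 1678".
birth_earliest = 1627
death_earliest = 1678
# Heuristic from the discussion: assume nobody lived longer than 100 years.
death_latest = birth_earliest + 100

print(death_latest, finest_common_precision(death_earliest, death_latest))  # prints: 1727 6
```

The result reproduces the precision 6 (millennium) chosen above, because 1678 and 1727 already fall into different decades and centuries.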
Please have a look on reverted edits
I guess 0.05% of the automated edits are wrong, so please have a look at reverted edits; it makes no sense to re-add a statement four times after it was reverted. See Q65558479. Florentyna (talk) 14:10, 27 December 2023 (UTC)
- Hi @Florentyna, sorry about that. I've stopped the bot and will alter the code so this won't happen again. In the case of Andreas Springer (Q65558479), I think the value of J9U ID (P8189) is wrong; I would set the claim to 'deprecated rank' with qualifier reason for deprecated rank (P2241) to refers to different subject (Q28091153). Difool (talk) 15:38, 27 December 2023 (UTC)
- For Andreas Springer (Q65558479) it is only a guessed birth year by me. Andreas Springer is a very common German name. For such "common" cases, one has to be very careful. Otherwise your additions are quite good; no need to stop the bot, the error rate is in the range of a normal human being, because you are adding to quite unique names. --Florentyna (talk) 15:44, 27 December 2023 (UTC)
- @Florentyna - the bot doesn't search in VIAF for a name or birth year; if you have a J9U ID (P8189), you can ask VIAF what the corresponding VIAF ID (P214) is.
- For Andreas Springer, you can do this with this query: https://viaf.org/viaf/sourceID/J9U|987007374268105171/justlinks.json (987007374268105171 is the J9U ID (P8189) on the Andreas Springer page) Difool (talk) 16:10, 27 December 2023 (UTC)
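A minimal sketch of the ID-based lookup described above: build the sourceID URL for a source code and identifier, then read the cluster ID from the response. The `"viafID"` key is an assumption about the justlinks.json format, the User-Agent string is a placeholder, and VIAF changed its interface during 2024, so these URLs may no longer behave exactly as in this thread.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def viaf_sourceid_url(source: str, identifier: str) -> str:
    """Build the VIAF 'sourceID' lookup URL used in this thread (e.g. J9U, ISNI)."""
    return f"https://viaf.org/viaf/sourceID/{source}|{identifier}/justlinks.json"


def viaf_id_from_justlinks(payload: str) -> Optional[str]:
    """Extract the VIAF cluster ID from a justlinks.json response.

    Assumption: the response is a JSON object carrying the ID under "viafID".
    """
    data = json.loads(payload)
    if not isinstance(data, dict):
        return None
    value = data.get("viafID")
    return str(value) if value is not None else None


def lookup_viaf_id(source: str, identifier: str) -> Optional[str]:
    """Ask VIAF which cluster contains the source record; None means 'not found'."""
    request = urllib.request.Request(
        viaf_sourceid_url(source, identifier),
        headers={"User-Agent": "viaf-lookup-sketch/0.1 (example)"},  # placeholder UA
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return viaf_id_from_justlinks(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # VIAF has no cluster for this source record
        raise
```

The same helper covers the ISNI case mentioned further down: `viaf_sourceid_url("ISNI", "0000000032172983")` yields the ISNI lookup URL quoted there. The lookup never searches by name or birth year; it only follows the identifier.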
Like I already said, for common names I have already seen hundreds of wrong additions. Probably someone before you made an AI-based addition claiming that this is the real Andreas Springer. Usually I try to identify the primary source, but often the secondary ones (like you) seem to be the bad ones. --Florentyna (talk) 16:16, 27 December 2023 (UTC)
- @Florentyna - Okay, I've added an ignore list and added Andreas Springer (Q65558479) to it and started the bot again. Luckily, your edit was the only reversal, but I'll keep an eye on the reversal notifications in the future.
- I've also added the other Andreas Springer, indeed with the same birth year of 1966, at Andreas Springer (Q124030875) - Difool (talk) 02:56, 28 December 2023 (UTC)
Thanks and happy new year!
Hi! I would just like to thank you for the last section you added to User:Difool/viaf already somewhere, it will be very useful for solving hundreds of duplications and conflations. I hope to have more time to deal with them from the second half of this month. And, of course, happy 2024! See you soon, --Epìdosis 14:00, 4 January 2024 (UTC)
- Hi, @Epìdosis. Glad you like it, and a happy 2024 to you too! I'm going to add a 'score' column too, with the score based on year of birth and death, and country of birth and death. It will still be a lot of work, I'm sure; I'll just do some when I have the time. I'll let the bot do Union List of Artist Names ID (P245) again, and then start with GND ID (P227) and IdRef ID (P269) - Difool (talk) 02:17, 5 January 2024 (UTC)
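One possible shape for the planned 'score' column, comparing two candidate items on the four fields named above. The field names and the +1/-1 weighting are hypothetical choices for illustration, not the scoring actually used on the page; country values are assumed to be QIDs.

```python
from typing import Mapping, Optional

# Hypothetical field names for the four compared facts.
FIELDS = ("birth_year", "death_year", "birth_country", "death_country")


def match_score(a: Mapping[str, Optional[object]], b: Mapping[str, Optional[object]]) -> int:
    """+1 for each field where both items agree, -1 where both have a value and differ.

    A missing value (absent or None) on either side contributes nothing.
    """
    score = 0
    for field in FIELDS:
        x, y = a.get(field), b.get(field)
        if x is None or y is None:
            continue
        score += 1 if x == y else -1
    return score


# Two items with the same birth year and birth country (Q183 used as an example QID).
item_a = {"birth_year": 1966, "death_year": None, "birth_country": "Q183"}
item_b = {"birth_year": 1966, "birth_country": "Q183", "death_country": "Q183"}
print(match_score(item_a, item_b))  # prints: 2
```

A negative score would flag likely conflations (different people sharing an identifier), while a high positive score would suggest a plain duplicate worth merging.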
- Thanks Difool, it is very helpful. I have added it to my maintenance page Geagea (talk) 12:57, 7 January 2024 (UTC)
I have just finished User:Difool/viaf already somewhere#Union List of Artist Names ID and will move on to User:Difool/viaf already somewhere#National Library of Greece ID in the next few days. Could you do SBN author ID (P396) next? I'm particularly interested in it. Thanks! --Epìdosis 23:39, 12 January 2024 (UTC)
- Okay, I'll let the bot do the SBN author ID (P396) after it's finished with GND ID (P227), which will take about a week or so - Difool (talk) 03:40, 13 January 2024 (UTC)
- Very good! I have finished this morning with NLG and I'm now on User:Difool/viaf already somewhere#Library of Congress authority ID from Union List of Artist Names ID. --Epìdosis 17:50, 13 January 2024 (UTC)
- I have now finished User:Difool/viaf already somewhere#Library of Congress authority ID from Union List of Artist Names ID and I will slowly proceed on User:Difool/viaf already somewhere#Library of Congress authority ID. --Epìdosis 20:02, 22 January 2024 (UTC)
- The bot finally finished with GND ID (P227) and I let it work through SBN author ID (P396), but VIAF mostly returned 'not found'; only three duplicates were found. It's now busy with NL CR AUT ID (P691) Difool (talk) 03:19, 27 January 2024 (UTC)
- Thanks! After NKC I would suggest BAV (Vatican Library VcBA ID (P8034)) if possible. See you soon, --Epìdosis 08:13, 27 January 2024 (UTC)
Could you include WD:ISNI (ISNI (P213))? They also offer data download https://isni.org/page/linked-data/ . CV213 (talk) 14:55, 3 February 2024 (UTC)
- Okay, I'll start it on Monday - Difool (talk) 01:41, 4 February 2024 (UTC)
- Great, thank you! CV213 (talk) 10:47, 4 February 2024 (UTC)
VIAF counts - meaning
User:Difool/viaf counts says ISNI (P213): 21913. What does that mean? CV213 (talk) 15:27, 3 February 2024 (UTC)
- It means that Wikidata has 21913 pages about humans with an ISNI (P213) but no VIAF ID (P214). A row in the table includes an example page and three lookup links for that page: two by ID, and one by name. The bot uses the lookup-by-ID link, so that gives an indication of how successful the bot will be. Difool (talk) 01:39, 4 February 2024 (UTC)
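A count like this can be expressed as a SPARQL query against the Wikidata Query Service. The sketch below only builds the query and the endpoint URL; it is an assumption about how such a number could be obtained, not necessarily how User:Difool/viaf counts was generated. Note that `wdt:` matches only best-rank statements, and very large properties may hit the service's timeout.

```python
import urllib.parse


def missing_viaf_count_query(prop: str) -> str:
    """SPARQL counting humans (P31 = Q5) with a `prop` value but no VIAF ID (P214)."""
    return f"""SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P31 wd:Q5 ;
        wdt:{prop} [] .
  FILTER NOT EXISTS {{ ?item wdt:P214 [] . }}
}}"""


def wdqs_url(query: str) -> str:
    """URL for the Wikidata Query Service endpoint, requesting JSON results."""
    return "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )


print(missing_viaf_count_query("P213"))
```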
- Thank you. Re lookup: It could be that the ISNI item contains one or even more VIAF IDs, even when the VIAF page doesn't have the ISNI. Maybe link the example ID to the primary source? But you will maybe analyse the ISNI data set separately. CV213 (talk) 10:45, 4 February 2024 (UTC)
- The bot doesn't check the ISNI data, but only checks the VIAF data through the VIAF API. For example, if you have ISNI 0000000032172983, it will call https://viaf.org/viaf/sourceID/ISNI|0000000032172983/justlinks.json. Not all persons in ISNI are added to VIAF, so it will probably result in a lot of "not found"s. Difool (talk) 11:25, 4 February 2024 (UTC)
viaf already somewhere ISNI
Thanks a lot for User:Difool/viaf already somewhere#ISNI; I have already merged some, mostly duplicates from the CZ data set. Could you add the score column, since some have different from, e.g. this https://dicare.toolforge.org/wikidata-diff/?qids=Q12619817+Q48965567&language=en ? CV213 (talk) 22:01, 7 February 2024 (UTC)
Hi Difool,
I don't want to spam the discussion page there, so may I continue the discussion here?
translitua: I've just noticed that I installed this package in January 2013! However, it lacks a Duden transliteration for Russian, which would be needed more often.
If you open a Github repo I'd contribute.
Please exclude German from this Bot run. I'm tired of the influx of bad (usually English) translits into my language.
Best wishes, Tadarrius Bean (talk) 13:14, 16 April 2024 (UTC)
- Hi @Tadarrius Bean, the GitHub repo is here. What is your username? Then I'll add you as a collaborator. Difool (talk) 14:45, 16 April 2024 (UTC)
Hello Difool, since you added the image, you might wish to replace it by a larger one. Cheers. Lotje (talk) 05:28, 7 July 2024 (UTC)
- @Lotje Thanks, indeed a better image - Difool (talk) 01:34, 9 July 2024 (UTC)
- You are very welcome Difool :-) Lotje (talk) 11:42, 17 July 2024 (UTC)
Bot problems
As explained in User:ASarabadani (WMF)/Growth of databases of Wikidata, the database is growing too quickly. In particular, the number of previous revisions of items.
Your bot is currently editing items repeatedly in a short space of time, e.g. on Special:History/Q455040 it made 8 edits in just over a minute. You should group these changes into fewer edits, ideally a single edit. Please stop the bot until you have done that.
I also recommend using the syntax [[Property:P268]] in the summary when referring to properties, instead of just an ID (e.g. P268) or a name (e.g. "reference url"). That will create a link like Bibliothèque nationale de France ID (P268), with the property name in the user's language.
- Nikki (talk) 09:59, 4 September 2024 (UTC)
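A minimal sketch of the two suggestions above: collect all pending changes per item so that each item is saved once, and build an edit summary that links properties with the `[[Property:Pxxx]]` syntax. The `PendingChange` structure and the summary wording are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class PendingChange(NamedTuple):
    """One reference change the bot wants to make (hypothetical structure)."""
    qid: str   # item to edit, e.g. "Q455040"
    prop: str  # property of the affected statement, e.g. "P268"


def group_by_item(changes: List[PendingChange]) -> Dict[str, List[PendingChange]]:
    """Collect all pending changes per item so each item is saved once."""
    grouped: Dict[str, List[PendingChange]] = defaultdict(list)
    for change in changes:
        grouped[change.qid].append(change)
    return dict(grouped)


def edit_summary(changes: List[PendingChange]) -> str:
    """Summary linking each property as [[Property:Pxxx]], sorted by numeric ID."""
    props = sorted({c.prop for c in changes}, key=lambda p: int(p[1:]))
    links = ", ".join(f"[[Property:{p}]]" for p in props)
    return f"{len(changes)} reference change(s): {links}"
```

Each group would then be sent as one save (for example a single `wbeditentity` API request) instead of one save per statement, so eight changes to Q455040 produce one revision rather than eight.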
- @Nikki: Alright, I’ve stopped the bot. I’ll check out what you mentioned. Difool (talk) 10:27, 4 September 2024 (UTC)
- @Nikki: I've updated the bot script and restarted it. Thanks for informing me, I didn't know the revisions were taking up so much storage space. Difool (talk) 02:06, 5 September 2024 (UTC)
Also, with this edit a reference was removed, and one reference was transferred to a different source without manual checking. --EncycloPetey (talk) 00:12, 7 September 2024 (UTC)
- @EncycloPetey: It might appear that way because the difference viewer doesn't display the unchanged content. For instance, the place of birth originally had three references - one duplicate - and after the edit, it has two references. Difool (talk) 01:41, 7 September 2024 (UTC)
- There was a reference for sex or gender from the GND, which is no longer present. That information was not a duplicate. --EncycloPetey (talk) 02:16, 7 September 2024 (UTC)
- @EncycloPetey: The sex or gender claim included two references with the same value 118615688 for GND ID (P227). The stated in (P248) value of these references is disregarded because the GND ID has only one applicable 'stated in' value (P9073), which is Integrated Authority File (Q36578). Therefore, the script merges the two GND ID references. Difool (talk) 03:25, 7 September 2024 (UTC)
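The merge rule described above can be sketched by modelling a reference as a map from property ID to a set of values, and keying references by an identifier whose 'stated in' value is fixed. The lookup table holds only the GND example from this thread; keeping the latest retrieved (P813) date on merge is an assumption, not something stated in the discussion.

```python
from typing import Dict, List, Optional, Set, Tuple

Reference = Dict[str, Set[str]]  # property ID -> set of values in one reference

# Assumed lookup: identifier property -> its single applicable 'stated in' (P9073).
# From the discussion: GND ID (P227) -> Integrated Authority File (Q36578).
APPLICABLE_STATED_IN: Dict[str, str] = {"P227": "Q36578"}


def identifier_key(ref: Reference) -> Optional[Tuple[str, str]]:
    """Return (property, value) of the first single-valued identifier with a known 'stated in'."""
    for prop in APPLICABLE_STATED_IN:
        values = ref.get(prop, set())
        if len(values) == 1:
            return prop, next(iter(values))
    return None


def merge_duplicate_references(refs: List[Reference]) -> List[Reference]:
    """Merge references carrying the same identifier value, disregarding their P248 values."""
    result: List[Reference] = []
    position: Dict[Tuple[str, str], int] = {}
    for ref in refs:
        key = identifier_key(ref)
        if key is None or key not in position:
            if key is not None:
                position[key] = len(result)
            result.append({p: set(v) for p, v in ref.items()})  # copy, don't mutate input
            continue
        target = result[position[key]]
        for prop, values in ref.items():
            target.setdefault(prop, set()).update(values)
        # 'stated in' is implied by the identifier, so normalise it.
        target["P248"] = {APPLICABLE_STATED_IN[key[0]]}
        # Assumption: keep only the latest retrieval date (ISO dates compare as strings).
        if len(target.get("P813", set())) > 1:
            target["P813"] = {max(target["P813"])}
    return result
```

References without such an identifier pass through unchanged, so only the two GND references with value 118615688 would collapse into one.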
- My mistake. --EncycloPetey (talk) 03:59, 7 September 2024 (UTC)
Withdrawn ID value
https://www.wikidata.org/w/index.php?title=Q127548319&diff=prev&oldid=2210581071 - it wasn't withdrawn, it is in an invalid format. ISNIplus (talk) 18:30, 9 September 2024 (UTC)
- Ah, yes, that is of course also possible. I'll adjust the script accordingly. Difool (talk) 00:35, 10 September 2024 (UTC)
- Thank you. The format constraint could be out of date; I just added "sj" [3], which was added to "format as a regular expression (Property:P1793)" on 20 July 2020 [4]. @Kolja21: the constraint has been out of date for more than 4 years. ISNIplus (talk) 01:18, 10 September 2024 (UTC)
- I'll adjust the script to make no changes and only log the ID when it encounters a 404 - Not Found error from the BnF/IdRef/GND/LoC server. Then it must be manually checked, though fortunately, it doesn't happen too often. Difool (talk) 01:56, 10 September 2024 (UTC)
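A minimal sketch of the log-only behaviour: a 404 from the authority server never triggers an edit, it only puts the identifier on a review list, since (as noted above) a 404 can also mean an invalid format rather than a withdrawn ID. `fetch_status` is a hypothetical callable standing in for the actual HTTP request.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("identifier-check")


def find_ids_needing_review(ids: Iterable[str], fetch_status: Callable[[str], int]) -> List[str]:
    """Log identifiers whose authority server answers 404; never edits anything.

    `fetch_status` is a hypothetical callable returning the HTTP status code
    of the identifier's record page (BnF, IdRef, GND or LoC).
    """
    needs_review: List[str] = []
    for identifier in ids:
        status = fetch_status(identifier)
        if status == 404:
            # Could be withdrawn, but also an invalid format or a catalogue gap.
            logger.warning("404 for %s: check manually before marking as withdrawn", identifier)
            needs_review.append(identifier)
    return needs_review
```

Injecting the status lookup keeps the decision logic testable without network access; in the bot it would wrap the real request to the relevant server.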
P813-only references
Hi! A small proposal to extend your bot task: I sometimes find references containing only retrieved (P813); unless they are used on properties with datatype "URL" or "external identifier", they are obviously wrong;
- if they are the only reference to the statement, they should be removed (e.g.);
- if there is only one other reference which contains a "URL" or "external identifier" and no value of retrieved (P813), they should be merged with it (e.g.);
- in other cases, I would just avoid editing because probably there should be a manual fix.
This QLever query found nearly 30k cases of statements with datatype "time" and a reference containing only P813; surely, considering statements with other datatypes (excluding "URL" or "external identifier") other tens/hundreds of thousands will emerge. Could you add it to your bot task? No haste, of course. Thanks very much as always! --Epìdosis 08:33, 15 September 2024 (UTC)
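The three rules proposed above can be sketched as a decision function returning keep, remove, merge or skip for one statement. References are modelled as property-to-values maps; the datatype strings follow Wikibase's `url`/`external-id` names, and the set of URL/identifier properties is a hypothetical subset.

```python
from typing import Dict, List, Set

Reference = Dict[str, Set[str]]  # property ID -> values in one reference

EXEMPT_DATATYPES = {"url", "external-id"}  # P813-only references are acceptable here
# Hypothetical subset of "URL or external identifier" properties used in references.
ID_OR_URL_PROPS = {"P854", "P214", "P227", "P268"}


def plan_p813_only_fix(datatype: str, refs: List[Reference]) -> str:
    """Return 'keep', 'remove', 'merge' or 'skip' following the proposed rules."""
    if datatype in EXEMPT_DATATYPES:
        return "keep"
    p813_only = [r for r in refs if set(r) == {"P813"}]
    others = [r for r in refs if set(r) != {"P813"}]
    if not p813_only:
        return "keep"
    if len(refs) == 1:
        return "remove"  # the P813-only reference is the statement's only reference
    if len(p813_only) == 1 and len(others) == 1:
        other = others[0]
        if set(other) & ID_OR_URL_PROPS and "P813" not in other:
            return "merge"  # move the retrieved date into the other reference
    return "skip"  # leave for a manual fix
```

Later in this thread the multi-reference case is refined (with two or more other references, the P813-only reference is simply removed); that would be one more branch before the final `skip`.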
- Hi @Epìdosis, I've updated the code based on your suggestions. Some cases might be acceptable, so I'll run the code in debug mode and list any that I encounter:
- Skipped:
- Not sure:
- member of (P463): Example - The meaning of 'retrieved' is here the date the user looked at the mentioned wikidata page and checked the claim
- Done:
- https://www.wikidata.org/w/index.php?title=Q519082&diff=prev&oldid=2248717983
- https://www.wikidata.org/w/index.php?title=Q392189&diff=prev&oldid=2248719170 - but looks like member of (P463)
- https://www.wikidata.org/w/index.php?title=Q29855670&diff=prev&oldid=2248748994 - manner of death (P1196)
- https://www.wikidata.org/w/index.php?title=Q7027443&diff=prev&oldid=2248756283 - copyright representative (P6275). A retrieved for languages spoken, written or signed (P1412) was moved to CONOR.SI ID (P1280): probably not correct, only move to next reference if possible
- Difool (talk) 03:15, 16 September 2024 (UTC)
- Thanks very much! Skipping different from (P1889) (probably also the twin said to be the same as (P460)), copyright status as a creator (P7763), on focus list of Wikimedia project (P5008) makes sense; for member of (P463), I think it would make sense to improve the reference with inferred from (P3452) = value (e.g.); https://www.wikidata.org/w/index.php?title=Q519082&diff=prev&oldid=2248717983 is perfect; for has subsidiary (P355) the same as for member of (P463) could be applied, I agree (i.e. improving the reference with inferred from (P3452)). If you have other dubious cases I'm happy to discuss them of course! Epìdosis 06:06, 16 September 2024 (UTC)
- @Epìdosis Thanks! I'm letting the script run now, but it logs and skips items where it wants to change retrieved references. I'll summarize the results tomorrow. Difool (talk) 06:57, 16 September 2024 (UTC)
- Perfect! On the case of "A retrieved for languages spoken, written or signed (P1412) was moved to CONOR.SI ID (P1280): probably not correct, only move to next reference if possible": I think that, if there is a statement value with one P813-only reference and 2(+) non-P813-only references, we can just remove the P813 reference without moving it into one of the existing references. I would move a P813-only reference into another one only if 1) the value has only one other reference and 2) this reference has no preexisting P813 value. --Epìdosis 07:44, 16 September 2024 (UTC)
- @Epìdosis: All items from the QLever query result are done now. Once QLever processes a new Wikidata dump, I'll run the script again. I ended up with a list of 163 properties (not datatype "time") permitting a single 'retrieved' reference, and a list of 126 properties from which such references are automatically removed. Statements with multiple references, which included a reference with only a 'retrieved' statement, have been handled by either merging or removing those single 'retrieved' statements, by the script or manually. Difool (talk) 03:50, 21 October 2024 (UTC)
- Wonderful job! Epìdosis 03:54, 21 October 2024 (UTC)
@Epìdosis: Small update: Approximately 21k items in your QLever query are scholarly article (Q13442814) with the same edit [5]. I skip these for now. For claims with only a single retrieved reference, I've created a whitelist of properties where the reference can be removed (name properties like pseudonym (P742) and properties that issue a warning without a reference, like convicted of (P1399)). Additionally, I've made a skip list of properties to ignore (the list above plus said to be the same as (P460)). The script logs the rest, and I'll review these once the bot completes the list. For claims with multiple references, I've coded the reference list as follows:
- R: Reference with single retrieved (P813)
- S: Reference with single stated in (P248)
- W: Reference with an external ID, without a retrieved (P813)
- A: Reference with a retrieved (P813) and an external ID
- B: Reference with a retrieved (P813) and a stated in (P248)
- X: Others
If a reference list contains only R, A, and B, I'll remove the R references. For sequences like RS, SR, RW, WR, I'll merge them. For BWR, I'll merge the WR part, and for SRB, I'll merge the SR part. After running the script, I'll check the log and determine actions for other sequences. Difool (talk) 04:28, 19 September 2024 (UTC)
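The letter coding and the sequence rules above can be sketched as two functions: one maps each reference to a letter, the other turns the resulting string into an action. The external-ID property set is a hypothetical subset, and classifying a reference with P813, P248 and an identifier as A (rather than B) is an assumption, since the list above does not say which wins.

```python
from typing import Dict, List, Set, Tuple

Reference = Dict[str, Set[str]]  # property ID -> values in one reference

# Hypothetical subset of external-identifier properties.
EXTERNAL_ID_PROPS = {"P214", "P227", "P268", "P269"}


def code_reference(ref: Reference) -> str:
    """Map one reference to R, S, W, A, B or X as described above."""
    props = set(ref)
    has_id = bool(props & EXTERNAL_ID_PROPS)
    if props == {"P813"}:
        return "R"
    if props == {"P248"}:
        return "S"
    if has_id:
        return "A" if "P813" in props else "W"  # assumption: A takes precedence over B
    if {"P813", "P248"} <= props:
        return "B"
    return "X"


def plan(refs: List[Reference]) -> Tuple[str, List[int]]:
    """Return an action and the indices of the references it applies to."""
    if len(refs) < 2:
        return "not-applicable", []  # single references go through the whitelist instead
    code = "".join(code_reference(r) for r in refs)
    if "R" in code and set(code) <= {"R", "A", "B"}:
        return "remove", [i for i, c in enumerate(code) if c == "R"]
    if code in {"RS", "SR", "RW", "WR"}:
        return "merge", [0, 1]
    if code == "BWR":
        return "merge", [1, 2]
    if code == "SRB":
        return "merge", [0, 1]
    return "log", []  # other sequences are reviewed manually
```

Encoding each reference list as a short string makes the rules easy to extend: a new sequence found in the log becomes one more branch in `plan`.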
- @Difool: it seems perfect! I think we can also fix the 21k items with the same edit in the future, but probably at the end; they are certainly not a priority. Thanks very much also for the coding of the reference list, it is a great solution! Bye, Epìdosis 06:43, 19 September 2024 (UTC)
Mis-merging references
In this edit, the source was changed to be the same for three references, however, those three sources are not the same. They are all housed at the same base URL, but they are separate publications, with different names and authors. Merging the references to be all at the housing website is misleading, and loses information. --EncycloPetey (talk) 20:40, 24 September 2024 (UTC)
- Hi @EncycloPetey, I see what you mean: the original 'stated in' information is lost, now that all the references point to Treccani's Enciclopedia on line (Q65921422). I'll have the bot skip these URLs for now and reach out to @Epìdosis for assistance. I noticed a similar edit from the bot. Difool (talk) 01:16, 25 September 2024 (UTC)
- Looking closer, I see that URLs like https://www.treccani.it/enciclopedia/aristofane_(Enciclopedia-machiavelliana)/ match the URL match pattern (P8966) of multiple external id properties. I'll update the bot code to handle these cases and write a script to correct the already modified data in Wikidata. Difool (talk) 01:47, 25 September 2024 (UTC)
- Hi! To clarify, Treccani ID (P3365) was initially used generically both for the Enciclopedia Treccani online (whose articles have nothing in parentheses) and for other Istituto Treccani works like Enciclopedia dei ragazzi ID (P9983) and Enciclopedia machiavelliana ID (P11820), for which we are gradually creating new dedicated properties, moving IDs there from the generic P3365. Probably the URL match pattern (P8966) of P3365 should be modified to better fit these cases; also @Horcrux: who has proposed most of the Treccani properties. Epìdosis 06:21, 25 September 2024 (UTC)
- @Epìdosis: Done: [6] --Horcrux (talk) 07:28, 25 September 2024 (UTC)
- @EncycloPetey, @Epìdosis I ran the script to correct the already modified data in Wikidata. Some values are not corrected (query), notably because URLs like https://www.treccani.it/enciclopedia/raffaello-sanzio_(Enciclopedia-Italiana) match with the URL match pattern (P8966) of Encyclopedia of Italian ID (P11586) and Treccani's Enciclopedia Italiana ID (P4223). @Horcrux: I think the URL match pattern of P11586 misses the dell' part Difool (talk) 04:16, 26 September 2024 (UTC)
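The underlying problem is that one URL can match the URL match pattern (P8966) of several properties. A conservative approach, sketched here, converts a reference URL only when exactly one pattern matches and otherwise leaves it alone. The two regexes are hypothetical, simplified stand-ins, not the real P8966 values; the alternative taken in this thread was to tighten the generic pattern itself.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified patterns for illustration only; the real
# URL match pattern (P8966) values on these properties are different.
URL_PATTERNS: Dict[str, str] = {
    "P3365": r"^https://www\.treccani\.it/enciclopedia/([^/]+)/?$",
    "P11820": r"^https://www\.treccani\.it/enciclopedia/([^/]+)_\(Enciclopedia-machiavelliana\)/?$",
}


def matching_properties(url: str, patterns: Dict[str, str] = URL_PATTERNS) -> List[Tuple[str, str]]:
    """Return every (property, identifier) pair whose pattern matches the URL."""
    hits = []
    for prop, pattern in patterns.items():
        match = re.match(pattern, url)
        if match:
            hits.append((prop, match.group(1)))
    return hits


def url_to_identifier(url: str, patterns: Dict[str, str] = URL_PATTERNS) -> Optional[Tuple[str, str]]:
    """Convert a reference URL only when exactly one property claims it."""
    hits = matching_properties(url, patterns)
    if len(hits) == 1:
        return hits[0]
    return None  # no match or ambiguous: leave the reference URL untouched
```

With these patterns, the Enciclopedia machiavelliana URL quoted earlier matches both properties and is skipped, while a plain `/enciclopedia/aristofane/` URL converts to P3365.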
Withdrawn identifier value
Hi Difool, in some cases the (old) DNB catalog is not showing a GND. That does not mean the identifier value has been withdrawn. It's a known problem with the current version of this catalog. (A new one is in preparation.) You can find the GND for example by using OGND. --Kolja21 (talk) 05:18, 4 October 2024 (UTC)
- @Kolja21: Ow, that's good to know. The script isn't running right now, and I'm planning to rework this part before I start it up again. I'll likely switch it to log only, so if a server gives back a 'not found' message while searching for an identifier, it will need to be manually checked to see if the identifier has been withdrawn. Difool (talk) 06:31, 4 October 2024 (UTC)
- Perfect. If you post a list with the problematic IDs I'll try to fix them. I don't know why they don't show up in the current version of the DNB catalog (the problem has been known for a year) but clicking on edit and saving them again mostly helps. @Epìdosis: FYI. --Kolja21 (talk) 15:10, 4 October 2024 (UTC)
Some missed reference URL on Q361509
Hi there! I see that the bot ran on Adolf Faller (Q361509) on September 12th; however, some reference URLs remain (e.g. the VIAF reference on languages spoken, written or signed (P1412) was missed, while the bot correctly adapted other references on the same statement). Would it be possible for you to check if anything went wrong? Thanks! --Thomas Kerboul (BGE) (talk) 09:02, 21 October 2024 (UTC)
- Hi @Thomas Kerboul (BGE), since that VIAF reference already has a VIAF ID (P214), it is currently not changed by the script. The script now only removes a reference URL if the reference doesn't have a database identifier like VIAF ID (P214), Bibliothèque nationale de France ID (P268) etc. or during a merge of references. However, I'm considering requesting bot permissions to remove such redundant reference URLs too. Difool (talk) 09:38, 21 October 2024 (UTC)
- Thanks for the reply! Let me know if/when you ask for this permission, I'll support it. --Thomas Kerboul (BGE) (talk) 09:53, 21 October 2024 (UTC)