Fixed

Not all (667/689) citations from Zotero private group library are available

Adam Rego Johnson 2 years ago, updated by Christian Fritz 2 years ago

Hi, I am attempting to move my organization's BibBase page from one hosted off Mendeley to one hosted off Zotero, using the same library. There are slight discrepancies: some citations that are in the Zotero private group library do not appear on the BibBase page. I believe I am using the correct link path, as I don't think I would be getting anywhere near the full number of citations if I were not. Here is the link to my page:

https://bibbase.org/show?bib=http%3A%2F%2Fbibbase.org%2Fzotero-group%2FGCWealthProject/4501563&msg=embed#

Please let me know any other information I can give that may be helpful. I am also encountering an issue with PDFs (which are cloud-synced in Zotero storage) not being available, in case it is at all related, but I have left a comment about that in another related thread -- the problem in this post is a more specific one, however.

Thanks.

Under review

Hi Adam,

This took me a while to figure out, but eventually I realized that at least in part this seems to be due to duplicate entry keys in the bibtex:

> grep @ 4501563.bib  | sort | uniq -c | sort -nr | head -n 20
      2 @unpublished{piketty_long-run_2010,
      2 @unpublished{pellegrini_what_2016,
      2 @techreport{noauthor_global_2015,
      2 @techreport{noauthor_global_2014,
      2 @techreport{noauthor_global_2013,
      2 @techreport{noauthor_global_2012,
      2 @techreport{noauthor_global_2011,
      2 @techreport{noauthor_global_2010,
      2 @misc{noauthor_oecd_nodate,
      2 @misc{noauthor_new_2020-1,
      2 @misc{noauthor_new_2020,
      2 @article{saez_wealth_2016,
      2 @article{piketty_income_2003,
      1 @unpublished{zucman_missing_2013,
      1 @unpublished{zoutman_effect_2014,
      1 @unpublished{xavier_wealth_2020,
      1 @unpublished{wolff_household_2021,
      1 @unpublished{wolff_household_2017,
      1 @unpublished{waltl_multidimensional_2020,
      1 @unpublished{waldenstrom_national_2015,
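
For anyone who wants to run the same check without grep, here is a rough Python sketch of the idea. It is only an illustration: the regular expression is a simplification of BibTeX syntax, the sample entries are made up, and unlike the grep pipeline above it counts keys regardless of entry type.

```python
import re
from collections import Counter

# Matches the start of a BibTeX entry such as "@article{saez_wealth_2016,".
# This is a simplification and not a full BibTeX parser.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def duplicate_keys(bibtex_text):
    """Return {entry_key: count} for every key that occurs more than once."""
    counts = Counter(key for _type, key in ENTRY_START.findall(bibtex_text))
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample: a published paper and its working-paper version share a key.
sample = """
@article{saez_wealth_2016, title = {Published version}}
@unpublished{saez_wealth_2016, title = {Working paper version}}
@unpublished{zucman_missing_2013, title = {Another paper}}
"""

print(duplicate_keys(sample))
```

To check a real export, read the file contents into a string and pass that to duplicate_keys instead of the sample.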

I see... Some of this looks like accidental duplicates that I can look into, but this also appears to be happening with the references for working papers and their published alternative editions. Given that our library deals in economics, there are a fair number of pairs of very similar, though different, references like this. It is strange, because the .bib output from Mendeley that I fed into Zotero had different entry keys which, it seems, did not contain these duplicates. But exporting a .bib from Zotero now, which is clearly what BibBase is using, I see the entries you've highlighted and their duplicate counterparts appended with "-1".

Thank you so much for looking into this so quickly. Do you have any idea how I could fix this and separate the working papers from their published versions while still using Zotero (i.e., other than manually adjusting a .bib and uploading it, losing the utility of Zotero and its automatic uploading and cloud storage)? I'm not sure what I can do to affect which entry key Zotero uses in its BibTeX output.

Fixed

Hi Adam,

Yes, that makes sense. Regarding the "-1" suffix: Zotero seems to be using different algorithms when exporting to users via the UI and to other services via their API, because the bibtex we receive from them does not have those suffixes. However, it was easy for us to add that feature on our end as well, so we just did. Personally I don't see much of a reason to enforce the uniqueness of bibtex entry keys and it doesn't break anything.
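
To illustrate the kind of suffixing described here, a minimal sketch follows. It is not BibBase's or Zotero's actual code. The function name and the exact numbering scheme are assumptions; the only thing taken from this thread is that a repeated key gets a "-1" style suffix.

```python
def make_keys_unique(keys):
    """Return the keys in order, adding -1, -2, ... to repeated ones.

    Assumption for this sketch: a suffix is skipped if it would collide
    with a key that already exists (e.g. a literal "foo-1" in the input).
    """
    seen = set()
    result = []
    for key in keys:
        candidate = key
        n = 0
        # Keep bumping the suffix until the candidate is unused.
        while candidate in seen:
            n += 1
            candidate = f"{key}-{n}"
        seen.add(candidate)
        result.append(candidate)
    return result


print(make_keys_unique(["piketty_income_2003", "piketty_income_2003"]))
```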

Your page now shows all 689 publications.

Hope that helps! If you like BibBase and want to help us, please consider giving us a shout out on Twitter: https://twitter.com/bibbase.

Thanks,

Christian

Hi Christian,


Thank you so much for fixing this so quickly! Wow. That is very much appreciated. I will definitely give you a shout on Twitter. If there is a way to donate, please let me know -- I can't see one.

Hi Adam,

Awesome, thanks in advance for the shout on Twitter! In terms of donating: no, we don't have a mechanism for that, but you can sign up for one of our premium plans instead if you want. They start at $4/month and you can cancel any time.

Hi Christian,

That's too bad -- I know about the premium plans but am using BibBase as part of an organization, and am not sure we need any of the upgraded features for our specific use. But I will keep it in mind.

Also, apologies if this is an improper use of the thread, but are you aware of the issue I raised in a comment on another thread regarding PDF links being unavailable through the same Zotero-linked BibBase page? I figure you probably are, but it can't hurt to check -- it would be nice to have confirmation that it is being looked at and may be solved sometime soon, as otherwise I may soon need to move back to Mendeley or another method of interacting with BibBase.

Thank you,

Adam

Yes, confirmed, I saw that comment. We haven't yet found the time to investigate it further, so I can't make any promises at this point. There are a number of issues that rank higher in priority right now, either because they are required by premium users or because they affect more users, so I can't quite tell how long it will be until we get to it. If using plain bibtex files, hosted either on your own server, on github, or directly on bibbase, is an option for you, then I highly recommend that. They are best supported and give you the greatest level of control.

Hi Christian,

Got it, that's understandable, thank you. I was actually just about to edit my comment to say that I realized I could purchase a $20 premium subscription on a personal account and then cancel the recurring subscription, as a one-time donation, so I've gone ahead and done that. Thank you for all your help.

Regarding directly uploading .bib files: while the lack of automatic cloud updating isn't too much of an issue, the real reason we have been using Mendeley/Zotero is the easy PDF storage. When upgrading to the professional premium plan I saw a mention of 8GB of storage, but I'm having trouble figuring out how that works. Is it possible to have a BibBase page generated directly from a user-uploaded .bib file that can point to PDFs for the references, hosted by BibBase (or even some other server/service)?

Hi Adam,

Thanks for the contribution!

Regarding your question: Yes, exactly. In the bibtex file you upload (and which you can later edit directly on bibbase), you can include url fields that point to files you've also uploaded. You can specify absolute URLs in those fields, or, for convenience, you can also just list the filename, in which case they will resolve to URLs on the same host and same sub-path as the bib file itself. 


For example, if I have these files in my file manager on BibBase:

[screenshot of the file manager listing, not preserved]

and the cf.bib file contains an entry with the field:

  url = {1909.11604.pdf}

then when I render that bib file it will resolve that filename to the file in my account.
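
The resolution rule described above (absolute URLs are used as given, bare filenames resolve next to the bib file) behaves like standard relative-URL resolution. Here is a small sketch of that rule using Python's standard library. The host and path in BIB_URL are hypothetical placeholders, not real BibBase URLs, and this is not BibBase's implementation.

```python
from urllib.parse import urljoin

# Hypothetical location of the uploaded bib file (placeholder host and path).
BIB_URL = "https://files.example.org/some-user/cf.bib"


def resolve_url_field(bib_url, url_field):
    """Resolve a bibtex url field relative to the bib file's own URL.

    Absolute URLs come back unchanged; a bare filename resolves to the
    same host and sub-path as the bib file.
    """
    return urljoin(bib_url, url_field)


print(resolve_url_field(BIB_URL, "1909.11604.pdf"))
print(resolve_url_field(BIB_URL, "https://arxiv.org/abs/1909.11604"))
```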



Hi Christian,

Thank you for the explanation. This sounds like it would be a workable solution, allowing my group to keep using Zotero and, without too much hassle, manually export a .bib and the relevant PDFs to be uploaded directly to BibBase. I have run into some problems implementing this, however.


I have attempted to create a BibTeX file from my Zotero library in which each reference links to its title.pdf, as in your example. I have uploaded this to my personal account, along with two PDFs as a test, but the links to the PDFs on the BibBase page do not work. In my website editor file manager, I can access another link to the files there, and I have verified they are named the same as they are referenced in the .bib, but I cannot seem to get it to work.


I have tried it without underscores in the PDF titles, replacing them with commas (i.e., author, year, title.pdf, instead of author_year_title.pdf), as I read on one of the documentation pages that these would be removed for something (though when editing the .bib on the website it seems like they are still there), but this did not help either. I have made different .bibs and pages with different variations of the url field in the .bib (i.e., differentiating it from any other URLs in the reference by naming it url_Paper, and then making sure any other URLs were named url_Link), but this also doesn't seem to fix it.


I have made my page public so hopefully you can see it if you need to, here. The two references for which I've uploaded the PDFs are "The Concentration of Personal Wealth in Italy 1995–2016. Acciari, P., Alvaredo, F., & Morelli, S. 2021." and "Intergenerational Wealth Transfers and Wealth Inequality in Rich Countries: What Do We Learn from Gini Decomposition?. Nolan et al. 2021." I'm not sure if these are public as well, but this is the link given to me for Nolan et al. on the website editor, which works, and here is the link that the linked BibBase page tries to direct to, resulting only in "error: file not found".

Also, another issue I have noticed before, but particularly today: on a publications list page, the BibTeX drop-down (or anything else) for a reference returns what came from the specific .bib that page pulls from. When I visit the link to a particular reference (e.g., bibbase.org/network/publication/ . . .), however, it sometimes leads me to an alternate version of the reference, seemingly pulled from one of our other BibBase pages, so a different BibTeX and other details are displayed. This is unwanted, particularly when it pulls from our Mendeley-linked page, which is what we are trying to switch away from because the BibTeX output from that page is very messy for whatever reason. At first I noticed that the references on my original Zotero-linked BibBase page (referenced at the start of this thread) were ultimately being linked to the same references on our Mendeley-linked page, but at this moment they are being linked to one of the pages I created today on my personal account (I can tell because the BibTeX contains the url = title.pdf field).

This makes things very messy and hard to control, and I'm not sure how much of this is resolvable, as I assume it is somewhat fundamental to BibBase's structure. Perhaps this would be fixed if the BibBase API access for the Mendeley account linked to the other page were revoked and my other pages deleted, but I am quite reluctant to do that until I know it would actually fix anything -- the Mendeley-linked page in particular is live on our organization's website.

Apologies for the long message, hopefully this is clear. If you'd like me to split any of this off into another thread please just let me know.

Thank you so much,

Adam

Hi Adam,

Thanks for the elaborate message and attempts to debug. This helped me diagnose this pretty quickly.

Regarding the linked PDFs: You've found a bug! The issue was that we weren't yet handling filenames with spaces and special characters correctly in this context. This is fixed now and I've verified that the two papers you mentioned now download fine from the link on the page.
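
As general background on this class of problem (not a description of the actual fix), filenames containing spaces, commas, or non-ASCII characters have to be percent-encoded before they can appear in a URL path. A minimal sketch using Python's standard library, with a made-up filename:

```python
from urllib.parse import quote, unquote


def encode_filename(filename):
    """Percent-encode a filename so it is safe as one URL path segment."""
    # safe="" also encodes "/", since a filename should stay a single segment.
    return quote(filename, safe="")


name = "Nolan et al, 2021, Wealth Transfers.pdf"  # made-up example filename
encoded = encode_filename(name)

print(encoded)
# Decoding gives back the original name, so nothing is lost.
print(unquote(encoded) == name)
```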

Regarding the per-publication pages on BibBase: yes, you are right. We deduplicate papers by their authors, title, and year and the database keeps getting updated by the last paper our servers saw with that combination. So while your other, Mendeley-based page is still live and being visited by people browsing the web, our database record for the paper keeps getting written by that old version. Once that page is removed from the web, or otherwise made inaccessible to people, the new page you are creating will be the one setting that record.

However, I recognize that this can be frustrating in some situations, like in your current migration. So we've just added a new option "noIndex=1" that you can use in the bibbase URL of your old (Mendeley-based) page to prevent its data from being used to update the database.
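
To make the behavior concrete, here is a toy model of it: records keyed by authors, title, and year, the last page seen wins, and a no-index flag skips the update. This is only a sketch. The class, the field names, and the normalization are all assumptions made for illustration and say nothing about how the real database is built.

```python
def record_key(entry):
    """Build the deduplication key from authors, title, and year."""
    authors = tuple(a.strip().lower() for a in entry["authors"])
    title = " ".join(entry["title"].lower().split())
    return (authors, title, entry["year"])


class PublicationIndex:
    """Toy last-write-wins store; hypothetical, for illustration only."""

    def __init__(self):
        self.records = {}

    def observe(self, entry, source_page, no_index=False):
        # A page rendered with the no-index option never updates the record.
        if no_index:
            return
        self.records[record_key(entry)] = {"entry": entry, "source": source_page}


paper = {"authors": ["Saez, E.", "Zucman, G."], "title": "Wealth Inequality", "year": 2016}

index = PublicationIndex()
index.observe(paper, "mendeley-page")
index.observe(paper, "zotero-page")
index.observe(paper, "mendeley-page", no_index=True)  # ignored

print(index.records[record_key(paper)]["source"])
```

In this model the record stays with the Zotero-based page even though the old page was seen last, which is the effect the option is meant to have during a migration.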

Christian

Hi Christian,

Wow! Thank you for the quick fix and response. I'm glad the detail was helpful. The links do indeed seem to be working, which is great! Thank you for adding an option to get around that problem for this specific use case, that is very appreciated.

To clarify regarding the noIndex=1 option, how exactly is this to be included in the link? For example, in this link for the Mendeley-linked page, https://bibbase.org/service/mendeley/7a51828e-9782-3fa4-8bdb-357443954bfc/group/b49ab5c8-edb7-3544-9fc1-ebbc7f3ac750#, would it be appended directly after the "#", so "#noIndex=1", or with a "?", "/", "&" or something? Sorry I'm not well versed in networking / web pages : ).

And in the links we use to embed onto our Wordpress page a publications list for certain groups of references categorized by keyword, would it be the same?

<script src="https://bibbase.org/service/mendeley/7a51828e-9782-3fa4-8bdb-357443954bfc/group/b49ab5c8-edb7-3544-9fc1-ebbc7f3ac750?jsonp=1&filter=keywords:Determinants&nbspof&nbspWealth&nbspand&nbspWealth&nbspInequality&sort=title"></script>

Appending right after "&sort=title"?

Just want to make sure I have things right, as any of the ideas I suggested seem to return the page fine, so I'm not sure which is correct, if any. Once I know I have the right one I should be able to get that updated on our Wordpress site (and make sure to only visit the page with the correct link for now) which would help make sure everything will work as intended if the Mendeley-linked page is ultimately disconnected.

And just to make sure I understand, if I am able to correctly link to the Mendeley-linked page such that it does not update the database as described, what exactly could I do to try to trigger any other of our pages being the "one" that updates the database? I.e., simply creating a new page, or simply visiting the bibbase.org/network/publications . . . links of a particular reference?

Thank you!

Adam

URL parameters, like the new noIndex one, need to come before the hash (#). They start with a ? and are separated by &. So you'll need: https://bibbase.org/service/mendeley/7a51828e-9782-3fa4-8bdb-357443954bfc/group/b49ab5c8-edb7-3544-9fc1-ebbc7f3ac750?noIndex=1

And yes, if there already are url parameters you can just append it with &noIndex=1.
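
If you ever need to do this for many links at once, the rule can be scripted. Below is a small sketch using Python's standard library. The helper name and the example.org URLs are made up, and it assumes the parameter name and value contain no characters that need escaping.

```python
from urllib.parse import urlsplit, urlunsplit


def add_query_param(url, name, value):
    """Add name=value to the query string, keeping it before any #fragment."""
    parts = urlsplit(url)
    pair = f"{name}={value}"
    # Use "&" if there already are parameters; otherwise this is the first one.
    query = f"{parts.query}&{pair}" if parts.query else pair
    return urlunsplit(parts._replace(query=query))


print(add_query_param("https://example.org/page#top", "noIndex", "1"))
print(add_query_param("https://example.org/page?jsonp=1&sort=title", "noIndex", "1"))
```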

And yes, to trigger an update from a new page, all you need to do is visit it in a browser. The update of the database happens asynchronously (the page itself is rendered from cache), but it should only take a few seconds to a minute depending on the data size.