Completed

Aggregation of Multiple Bibtex Files

Jorge Baier 10 years ago updated by Prof Chaos 10 months ago 14

It would be nice to have support for multiple bibtex files (e.g., bib=<something>&bib1=<something_else>&bib2=..). Imagine each individual in a research group uses bibbase to maintain their publications. Then building a publications page for the whole group would be trivial with this feature.

Answer

Hi Pascal,

Thanks for following up on this! By now we actually support this. It's one of the most popular features of our group plans. In those plans, all users in your organization have their own data sources that they can maintain in whichever way they like -- bibtex file directly on BibBase, a URL to a bibtex file hosted elsewhere, Zotero, Mendeley, DBLP. All those sources are merged, with deduplication, at query time when a query for the organization's publications is made.

Hi Jorge,


Thanks for the suggestion. This is a good idea and shouldn't take me long to implement. Until I get to it you can actually do the inverse (like we did for the KR group in Toronto): take one large bibtex file that is the union of all, and then have each member of the group use the "filter" parameter for only showing publications with their names among the authors, for use on their personal page.


Hi Jorge,

I have not forgotten about this request. I always had in mind to solve this via a central database that simultaneously supports other functions as well. At long last, I've now implemented this. It is now possible to get bibbase pages for any keyword you use in a bibtex entry. For instance:
http://bibbase.org/network/keyword/golog

While this is mainly intended for use with areas of research, you can of course use it for other things as well, incl. the problem you are trying to solve: if everyone in your research group uses a specific term in their keywords, say "ing.puc", then you get a collected view of all of those publications in one page at the corresponding keyword page.

To embed a keyword page into another page, you can simply use URLs like
http://bibbase.org/all/keyword/golog in your page using the same mechanism as for regular bibbase pages (bib=http://bibbase.org/all/keyword/golog).
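For anyone generating these links for several keywords, here is a minimal Python sketch that just assembles the URL patterns mentioned in this thread. The function names are made up for illustration, and it assumes the `show?bib=` form used elsewhere in this thread; it only builds strings and does not contact bibbase.

```python
from urllib.parse import quote, urlencode

BIBBASE = "http://bibbase.org"


def keyword_page(keyword: str) -> str:
    """URL of the browsable keyword page (the /network/ form above)."""
    return f"{BIBBASE}/network/keyword/{quote(keyword, safe='')}"


def keyword_source(keyword: str) -> str:
    """URL to pass as the bib source (the /all/ form above)."""
    return f"{BIBBASE}/all/keyword/{quote(keyword, safe='')}"


def show_url(keyword: str) -> str:
    """URL that uses the keyword source through the bib= parameter.

    ':' and '/' are left unescaped so the result looks like the URLs
    written out in this thread.
    """
    return f"{BIBBASE}/show?" + urlencode({"bib": keyword_source(keyword)}, safe=":/")


if __name__ == "__main__":
    for kw in ("golog", "ing.puc"):
        print(keyword_page(kw))
        print(show_url(kw))
```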

Please let me know if this works for you, and have fun at KR and/or AAAI if you are going!
Hi Christian, 

Maybe I did not fully understand your answer, but does your solution work if you are hosting your own page? I have a public folder in Google Drive that contains many .bib files from different authors. All of them contain a unique keyword to relate them (pisis). How can I load and display all of them using bibbase? For example, the public address is: https://googledrive.com/host/0B92aE0wdpf7TNVhQblE5...

Thanks a lot for your support!!!
Hi Romeo,

yes, it does. In your case, you are looking for these two pages:
http://bibbase.org/network/keyword/pisis
http://bibbase.org/all/keyword/pisis

You need to make sure that each of the individual bib files is being used with bibbase as well -- that is the mechanism by which the database is kept up to date. Every time someone visits a bibbase page that uses one of these bib files as its source, the database will be refreshed. So this is *not* a solution when you simply want to merge bib files. In that case, you should just manually merge them and use them as usual with bibbase. This is specifically a solution for showing an aggregate page from several already existing pages.

In your case, it seems that the second bib file in the drive (pubs_sara) had not yet been used with bibbase, so I just opened it once (http://bibbase.org/show?bib=https://d1055d4f3efc35...) in order to get it into the database. It won't stay up to date with changes to that bib file though unless the file is being used on a publications page where it is being visited somewhat regularly.


Hi Christian,


I have read all entries here several times and tried around quite a lot. But I still don't get the "merge" as asked for by Jorge.

Could you please respond with a clear-to-follow instruction on what to do?

So let's assume, for the sake of simplicity, there is just one URL where the different basefiles are hosted. Let's call it "my-university.com".

Now we have different sub folders in which our many bib-files reside. E.g.,

my-university.com/user1/articles.bib

my-university.com/user1/conferences.bib

my-university.com/user1/allElse.bib

my-university.com/user2/everything.bib

my-university.com/stuff/dissertations.bib

my-university.com/stuff/variousStuff.bib

User 1, although using 3 different files (he/she might have his/her reasons for this!), might want to show all his/her publications merged together. So since that "merge in the URL" (as suggested by Jorge) does not exist, he/she defines a new keyword, say "user1", in the keyword field.

Now what? Should he/she embed https://bibbase.org/show?bib=my-university.com/user1/keywords/user1&theme=default&fullnames=1&msg=embed ? I tried that, this does not work. But what else is it, then?

Similarly, "the University" wants to list all publications, i.e., the fusion of all 6 bib files. Which URL will "the university" use here? There is not even a shared parent folder. Well, the smallest denominator is my-university.com/, but there is no single bib file in here, so what bib URL will "the University" include? (assuming that all files share some "shared" keyword, although this seems like a lot of effort.)

I guess my problem with all your explanations is that your online documentation clearly says that you are supposed to link to one single .bib file, but in your explanations here there is not even one single .bib file in the URL. I think you explained that "my individual calls" to those bib files will cause them to be copied to some central repository by you (which surprised me; I thought you only "reformat" them ad hoc -- why copy the content to some central server?). However, even if we should use such a "bibbase url", which one? You introduced "http://bibbase.org/all/keyword/**the-keyword**" and "http://bibbase.org/network/keyword/**the-keyword**". What do they mean? What does "all" stand for? What is "network"? I used both with my files and my keyword, and it still did not work. (I.e., the page https://bibbase.org/show?bib=https://bibbase.org/all/keyword/**the-keyword**&theme=default&fullnames=1&msg=embed does not list any entries, although I added the respective keyword to the keywords entry and I made sure to open that bib file so that your database gets updated.)

Pascal,

You are looking at a very old thread. The recommended way to solve the problem you seem to be concerned with is indeed to have one large bib file that contains everything and then use filters as described in the options. That is sort of the inverse of the approach you are currently pursuing. Does that work for you? Meaning, is there a field you can easily filter on in your current bib files to break them apart again the way you need them? Otherwise you can add a new field "mytype = {articles}", "mytype = {conferences}", etc. to your files before merging them and then filter on that as needed.

Hey Christian,

First of all, thank you for your response! I remember having read it a *long* time ago, but it seems I never responded to it... I just remembered due to my other question you've just answered. :)

Regarding your question whether it would also be *possible* to use just a single huge bib file: I guess it would? With these additional fields as you suggested, I do not see why that shouldn't be possible *technically*, but it has a few clear drawbacks:


(1) One would lose a clear separation of how the bib entries are organized. It's just 'semantically' sooo much nicer to separate bibtex entries into different files if they do have a clear semantic difference. After all, this is what one does with code as well -- refactoring is a huge part of any successful project. Once one has more than maybe a few dozen entries I think separation might be very beneficial. In my case it's even more severe since I not only have approx. 100 publications of my own, but I also organize publications by several authors. It doesn't feel right mixing them, just from an organizational point of view.


(2) The second downside is simply the workload of the users. The bibbase service only receives "one large list" anyway, so I would assume that technically it wouldn't matter whether this comes from one physical file or whether that "single list" is the result of merging n files together ad hoc (let me know if I'm wrong!), which we'd get with a syntax as suggested by Jorge (e.g., bib1=...&bib2=...). If, however, the user had to merge all these files together, he/she would have to add all these additional bibtex fields to retain the desired functionality. Also, the call itself would be a bit more complicated since we would have to use the additional filter. (Though otherwise we of course have to use additional parameters as well, namely those required for the different bib files; but that still seems much more appropriate semantically.)


(3) Lastly, the current workaround would make the bibtex files less nice (and more error-prone, as all these categories would have to be added for each new entry). The former is "relevant" when recognizing that the bibbase service also allows showing the bibtex entry itself. I argue that this bibtex entry should only contain the information relevant for the website visitor, i.e., everything that is 'semantically' relevant to the paper, but internal stuff like keywords used for filtering shouldn't be accessible to him/her. Thus having them in there is just not very nice. (Though that is clearly the -- by far -- least important argument; the first two are much more important.)


In principle, we (i.e., we users who'd like to have this feature) can just write a program/script that creates such a single bib file from the source files. E.g., say that I'd like to call your service with bib1, bib2, and bib3 in a merged fashion, I could instead write code that creates a new file bib-1-2-3 that (a) merges all files together while (b) adding the required keywords to the entries of the new file (even that can be automated since one can just use folder+filename as keyword). The downside (apart from having to write that code :)) is that this code/script would have to be called after every single change made to either bib1, bib2, or bib3 in order to create an updated bib-1-2-3. It seems easier if that exact process were simply done by the service. Even if bibbase caches the files that it receives, couldn't it then also cache the fusion of inputs? Or would that make any "difference" to the service/underlying database? I don't know how that tool works internally (though I know that somewhere you have quite an elaborate explanation online...), but in the end I'd assume it simply receives one big "list" (file) with content. Wouldn't supporting the concatenation of several input files be a trivial task, or is there indeed a difference? Without knowing the technical details, I'd assume that calling bibbase with bib1=...&bib2=...[...]&bibn=... would be nothing more than calling it with the concatenation of bib1 to bibn -- isn't that the case?

(I'm of course not at all saying that this would be trivial to implement (and maybe others don't even have the demand! though I'd be surprised if others don't organize their works in different files as well); one would still have to change the syntax, though it can probably be extended in a backward-compatible way by just adding *additional* sources as parameters -- to append the cited bib files at runtime for each call.)
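For reference, the merge script described above (concatenate the source files and tag every entry with folder+filename) could look roughly like the following Python sketch. It is only an illustration under assumptions: the helper names `tag_entries` and `merge_files` are made up, the tag goes into the "mytype" field suggested earlier in this thread, and the regular expression only handles entries whose header (`@type{key,`) sits on a single line.

```python
import re
from pathlib import Path

# Header of a regular entry, e.g. "@article{key,".
# @string, @comment and @preamble blocks are deliberately skipped.
ENTRY_HEADER = re.compile(
    r"^([ \t]*@(?!string\b|comment\b|preamble\b)\w+\s*\{\s*[^,\s{}]+\s*,)",
    re.IGNORECASE | re.MULTILINE,
)


def tag_entries(bibtex: str, tag: str, field: str = "mytype") -> str:
    """Insert `field = {tag}` as the first field of every entry."""
    return ENTRY_HEADER.sub(
        lambda m: f"{m.group(1)}\n  {field} = {{{tag}}},", bibtex
    )


def merge_files(paths, out_path, field: str = "mytype") -> None:
    """Concatenate several .bib files, tagging each entry with its origin."""
    parts = []
    for path in map(Path, paths):
        tag = f"{path.parent.name}/{path.stem}"  # folder + filename
        parts.append(tag_entries(path.read_text(encoding="utf-8"), tag, field))
    Path(out_path).write_text("\n\n".join(parts) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical local copies of the files listed earlier in this post.
    merge_files(
        ["user1/articles.bib", "user1/conferences.bib", "user2/everything.bib"],
        "bib-merged.bib",
    )
```

As noted above, such a script has to be re-run after every change to one of the source files.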

Answer

Hi Pascal,

Thanks for following up on this! By now we actually support this. It's one of the most popular features of our group plans. In those plans, all users in your organization have their own data sources that they can maintain in whichever way they like -- bibtex file directly on BibBase, a URL to a bibtex file hosted elsewhere, Zotero, Mendeley, DBLP. All those sources are merged, with deduplication, at query time when a query for the organization's publications is made.

That is great news! Thanks for clarifying/updating. :)

Christian, 

Thanks a lot for your clarification. I assume that since the individual .bib pages get related by their keywords, their locations really do not matter, or do they? What I am trying to say is that if I move my bib pages to a different location, the databases that you build (at bibbase.org) do not get duplicates. Is my assumption right?

Thanks again, 

Romeo
Yes, that's correct on both counts. Locations don't matter and papers don't get duplicated. Papers are uniquely identified on bibbase by their "bibbaseid", which is constructed from author last names, title with whitespace removed, and year. Sure, every once in a while someone publishes the same paper with the same title and coauthors in the same year, so then there is a hash collision, but I think that's not very good practice, so I'm OK with that.
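To make the idea concrete, here is a small Python sketch of identifier-based deduplication. It is not the actual bibbaseid algorithm: the thread only says the id is built from author last names, the title with whitespace removed, and the year, so the separator, the lowercasing, the entry layout, and the names `sketch_id` and `merge_dedup` are all assumptions. The sample entries are invented.

```python
import re


def sketch_id(last_names, title, year) -> str:
    """Hypothetical stand-in for a bibbaseid (exact format is assumed)."""
    authors = "-".join(name.lower() for name in last_names)
    squashed_title = re.sub(r"\s+", "", title).lower()  # drop all whitespace
    return f"{authors}-{squashed_title}-{year}"


def merge_dedup(*sources):
    """Merge lists of entries, keeping the first entry seen for each id."""
    merged = {}
    for entries in sources:
        for entry in entries:
            key = sketch_id(entry["last_names"], entry["title"], entry["year"])
            merged.setdefault(key, entry)
    return list(merged.values())


if __name__ == "__main__":
    # Invented entries: the same paper listed in two members' files.
    file_a = [{"last_names": ["Doe", "Roe"], "title": "An Example Title", "year": 2014}]
    file_b = [
        {"last_names": ["Doe", "Roe"], "title": "An  Example Title", "year": 2014},
        {"last_names": ["Roe"], "title": "Another Title", "year": 2015},
    ]
    for entry in merge_dedup(file_a, file_b):
        print(sketch_id(entry["last_names"], entry["title"], entry["year"]))
```

Because the file location is not part of the id, moving a bib file would not create a second copy under a scheme like this, and two different papers with the same authors, title, and year would collide, as described above.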
Christian, 

Thanks again for your prompt reply. By the way, let me tell you that your work with bibbase is outstanding. Excellent work!!!