Fixed

Improper parsing of author field

Raphael ‘kena’ Poss, 8 years ago; updated by Christian Fritz 5 years ago

I believe BibBase improperly handles author lists of the form


   Author = {FamilyName1, FirstName1 and FamilyName2, FirstName2 and ...}


For example with the entry:

@techreport{poss.13.mg,
   Author = {Poss, {Raphael `kena'}},
   Institution = {University of Amsterdam},
   Month = {March},
   Number = {arXiv:1303.4892v1 [cs.AR]},
   Title = {On whether and how {D-RISC} and {Microgrids} can be kept relevant (self-assessment report)},
   Url = {http://arxiv.org/abs/1303.4892},
   Year = {2013}}


the author name is incorrectly reported as "Poss and Raphael kena" instead of "Poss, R."
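To make the expected result concrete, here is a minimal Python sketch of how a "Last, First" name with a braced first-name group could be reduced to "Poss, R.". It is only an illustration, not BibBase's parser; the helper names `split_top_level` and `abbreviate` are hypothetical, and the sketch assumes that commas inside braces must not split the name.

```python
def split_top_level(text: str, sep: str = ",") -> list[str]:
    """Split text on sep, ignoring separators nested inside {...} groups."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def abbreviate(name: str) -> str:
    """Turn 'Last, First' into 'Last, F.' (hypothetical display format)."""
    parts = split_top_level(name)
    if len(parts) < 2:
        return name.strip()  # no comma: leave the name untouched
    last, first = parts[0], parts[-1]
    initial = first.lstrip("{ ")[:1]  # skip an opening brace, take one letter
    return f"{last}, {initial}." if initial else last


print(abbreviate("Poss, {Raphael `kena'}"))
```

With the entry above, the braced group is treated as a single first-name component, so the nickname never becomes a separate author.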


Answer

Fixed

The new BibTeX parser we deployed recently parses all of these formats correctly. The degenerate case with no "and"s at all is handled by breaking at every second comma. This is fragile for names that include, for instance, a suffix (like ", Jr."), but those cases are rare.
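The "break at every second comma" fallback can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the deployed parser: the function name `split_authors` is hypothetical, the "and" separator is matched case-sensitively, and braces are ignored for brevity.

```python
import re


def split_authors(field: str) -> list[str]:
    """Split a BibTeX author field into individual names."""
    field = field.strip()
    # Normal case: names are separated by the word "and".
    parts = [p.strip() for p in re.split(r"\s+and\s+", field) if p.strip()]
    if len(parts) > 1:
        return parts
    # Fallback: no "and" at all. Assume "Last, First, Last, First, ..."
    # and start a new name after every second comma.
    tokens = [t.strip() for t in field.split(",") if t.strip()]
    if len(tokens) <= 2:
        return [field]  # a single "Last, First" or "First Last" name
    return [", ".join(tokens[i:i + 2]) for i in range(0, len(tokens), 2)]


print(split_authors("Smith, John A and Punchcard, Joe C"))
print(split_authors("Smith, John A, Punchcard, Joe C"))
print(split_authors("Smith, Jr., John, Doe, Jane"))  # suffix breaks the pairing
```

The last call shows the fragility mentioned above: the suffix shifts the pairing, so "Jr." is taken as a first name and the remaining tokens are grouped incorrectly.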

First let me say that bibbase is a wonderful tool!

The bug described here is related to the "ordering of author initials" problem that I posted elsewhere. My understanding of the problem is the following: most of the users so far seem to be from computer science backgrounds and use particular reference databases that output bibtex with the author name format First Last, i.e.

author={John A Smith and Joe C Punchcard}

and bibbase works splendidly when this is the input bibtex format giving output

Smith, J A; Punchcard, J C

However, there are a number of major scientific services, such as ISI Web of Science (one of the most widely used searchable databases), that output bibtex with the author format Last, First, with "and" separation, i.e.

author={Smith, John A and Punchcard, Joe C}

For this second format bibbase does not work. It interprets the comma-separated initials as a separate author, and also reverses the display of initials to give the output

Smith; A J; Punchcard; C J

Is it possible to support input with the bibtex author format Last, First with "and" separation? Presumably the algorithm could branch on the existence of commas, then run a script to reverse the order, and then pass the result into standard bibbase...
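The branching suggested here could look roughly like the following Python sketch. It is only an illustration of the idea, not anything BibBase actually runs; the function names `to_first_last` and `normalize_author_field` are hypothetical, and the sketch assumes names are separated by a plain " and ".

```python
def to_first_last(name: str) -> str:
    """Rewrite 'Last, First' as 'First Last'; leave other names unchanged."""
    if "," not in name:
        return name.strip()  # already in First Last form
    last, first = (part.strip() for part in name.split(",", 1))
    return f"{first} {last}"


def normalize_author_field(field: str) -> str:
    """Normalize every name in an 'and'-separated author field."""
    names = field.split(" and ")
    return " and ".join(to_first_last(n) for n in names)


print(normalize_author_field("Smith, John A and Punchcard, Joe C"))
```

For the Web of Science style input above, this yields the First Last form that is already handled correctly.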

I predict that this would widen the uptake of bibbase by the scientific community enormously.

I note that the output of Scopus would be much harder to accommodate, as they use a rather ambiguous format that lacks the "and" separation:

author={Smith, John A, Punchcard, Joe C}

Presumably this would make a complete mess of bibbase output.

Thanks so much for your consideration!
Ashton,

BibBase is able to parse `author = {Gupta, S. and Fritz, C. and Price, R. and Hoover, R. and de Kleer, J. and Witteveen, C.}` just fine, and even `author = {Gupta, S., Fritz, C., Price, R., Hoover, R., de Kleer, J., Witteveen, C.}` should work. I believe the issue is the missing "." in your bibtex entries. There are actually BibBase users who have a name component that is a single letter (similar to "von", "van", or "de").

I find it surprising that a scientific service would output bibtex entries that omit the dot. But I'll investigate further as to what the right behavior should be. For now, if you add dots, it will most probably work (I don't see why it wouldn't).

It seems that the actual bibtex tool can handle your case correctly though, so I'll try to replicate that behavior in BibBase.


aahhhhh! Solution tested and verified.
Thanks so much for your help!