Find papers authored by a specific number of authors

PubMed has an interface which you can call from a script. It was developed for exactly your class of problem, which cannot be solved from the provided user interface.

This is the main page of NCBI Entrez API: https://www.ncbi.nlm.nih.gov/books/NBK25501/

What you need to do is query PubMed by keyword(s). For example, this is a search for "concrete":

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=100&sort=relevance&term=concrete

Run multiple queries to cover your field; for example, you could also search for "brick" or "cement".
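
As a rough sketch of how the search URLs for several keywords could be generated, the snippet below builds one ESearch URL per keyword and, as an alternative, a single URL that joins the keywords with PubMed's Boolean OR. The helper name build_search_url and the keyword list are made up for this illustration; the URL parameters are the ones from the example URL above.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, retmax=100):
    """Return an ESearch URL for `term` (hypothetical helper, not part of E-utilities)."""
    params = {
        "db": "pubmed",
        "retmode": "json",
        "retmax": retmax,
        "sort": "relevance",
        "term": term,
    }
    # urlencode escapes spaces and special characters in the search term
    return ESEARCH_URL + "?" + urlencode(params)


keywords = ["concrete", "brick", "cement"]  # example keywords from the text

# Option 1: one URL per keyword
urls = [build_search_url(keyword) for keyword in keywords]

# Option 2: a single URL that matches any of the keywords
combined_url = build_search_url(" OR ".join(keywords))
print(combined_url)
```

With separate queries the same publication can show up in more than one result list, so the returned IDs would need to be de-duplicated before counting authors.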

Each query returns a list of publication IDs. For each one, you have to check the number of authors and keep only the publications with a single author. To get the author list of a publication, call ESummary with its ID:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=29510510&retmode=json

Determine the length of the "authors" array in the response, and keep only the publications where it is one.
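
The filtering step could look roughly like the sketch below. It assumes the JSON response has a "result" object keyed by publication ID, where each record carries an "authors" list and a "title"; the function name papers_with_n_authors and the sample data are invented for illustration and are not real PubMed records.

```python
def papers_with_n_authors(summary, n=1):
    """Return (uid, title) pairs for all records with exactly n authors."""
    matches = []
    for uid, record in summary["result"].items():
        # skip entries that are not publication records (e.g. a list of IDs)
        if not isinstance(record, dict):
            continue
        if len(record.get("authors", [])) == n:
            matches.append((uid, record.get("title", "")))
    return matches


# Hand-written, heavily abbreviated stand-in for an ESummary JSON response;
# the IDs, names and titles are invented for this example
sample = {
    "result": {
        "uids": ["111", "222"],
        "111": {"title": "Solo paper", "authors": [{"name": "Doe J"}]},
        "222": {
            "title": "Joint paper",
            "authors": [{"name": "Doe J"}, {"name": "Roe R"}],
        },
    }
}

print(papers_with_n_authors(sample, n=1))  # [('111', 'Solo paper')]
```

Passing a different n covers the more general case of looking for papers with any specific number of authors.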

Based on @Razvan P's hint, I wrote a little Python 3 script which solves your problem:

'''
Created on 01.04.2018

@author: OBu
'''

import requests
import json
from collections import Counter # for histogram

eutils_basepath = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'
DB = 'pubmed'           # please modify for other databases
RETMAX = '100'          # max 100 results - modify if needed, maximum = 100.000
SEARCHTERM = "concrete" # replace with your search term

# Now build the search URL:
search_url = eutils_basepath + 'esearch.fcgi?db=' + DB + \
                               '&retmode=json&retmax=' + RETMAX + \
                               '&sort=relevance&term=' + SEARCHTERM
# for additional search parameters or more complex search terms see examples in
# https://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Searching_a_Database
# or the full doc under
# https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch

s = requests.Session()
r = s.get(search_url)
if r.status_code != 200:
    raise ConnectionError("Search failed with error code " + str(r.status_code))
search_results = json.loads(r.text)

# show some statistics
# the count comes back as a string, so convert before comparing numerically
total_count = int(search_results['esearchresult']['count'])
print(f"{total_count} publications found.")
if int(RETMAX) < total_count:
    print(f"Warning: Only the first {RETMAX} publications are processed.")

# walk through all retrieved ids and fetch detailed publication information
# An alternative solution could use one single query based on the previous search results as shown in
# https://www.ncbi.nlm.nih.gov/books/NBK25500/#_chapter1_Downloading_Document_Summaries_
# This would reduce the server load
histogram = Counter()
for pub_id in search_results['esearchresult']['idlist']:
    #print(f"Fetching {pub_id}", end=" ") # uncomment for a more verbose version
    # Now build the fetch URL:
    fetch_url = eutils_basepath + 'esummary.fcgi?db=' + DB + '&retmode=json&id=' + pub_id
    r = s.get(fetch_url)
    if r.status_code != 200:
        raise ConnectionError(f"Fetching of publication {pub_id} failed with error code {r.status_code}")
#     else: # uncomment for a more verbose version
#         print("...success!") # uncomment for a more verbose version
    fetch_result = json.loads(r.text)
    authors = fetch_result['result'][pub_id]['authors']
    if len(authors) == 1:
        print(f"UID: {pub_id}, author: {authors[0]['name']}, title: {fetch_result['result'][pub_id]['title']}")
    histogram[len(authors)] += 1    

print("Histogram: (number of authors, number of papers with that many authors)")
print(sorted(histogram.items(), key=lambda x: x[0]))

You will need Python 3.6 or above to run this script (please remove the f-strings for earlier versions), and you'll have to install "requests" via pip install requests.

The script searches PubMed for SEARCHTERM; for the search term "concrete" (I like this running gag ;-) ) it produces output like

14125 publications found.
Warning: Only the first 100 publications are processed.
UID: 28844248, author: Baroody AJ, title: The Use of Concrete Experiences in Early Childhood Mathematics Instruction.
UID: 28772472, author: Wang XY, title: Modeling of Hydration, Compressive Strength, and Carbonation of Portland-Limestone Cement (PLC) Concrete.
UID: 29159238, author: Paul SC, title: Data on optimum recycle aggregate content in production of new structural concrete.
UID: 27012788, author: Kovler K, title: The national survey of natural radioactivity in concrete produced in Israel. 
Histogram: (number of authors, number of papers with that many authors)
[(1, 4), (2, 15), (3, 21), (4, 24), (5, 19), (6, 10), (7, 3), (8, 2), (10, 2)]

It should not be too difficult to modify the script for other search tasks...
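
One possible modification, sketched here under some assumptions, is to reduce the number of requests by asking ESummary for several IDs at once as a comma-separated list. This is a simpler variant than the history-server approach in the documentation linked from the script's comments. The names fetch_author_counts, get_json and chunks are hypothetical, the batch size of 100 is an arbitrary choice, and the sketch uses only the standard library instead of "requests".

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def get_json(url):
    """Download `url` and parse the body as JSON (performs a network call)."""
    with urlopen(url) as response:
        return json.load(response)


def chunks(ids, size):
    """Yield consecutive slices of `ids` with at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_author_counts(ids, batch_size=100, get_json=get_json):
    """Return {uid: number of authors}, using one request per batch of IDs."""
    counts = {}
    for batch in chunks(ids, batch_size):
        params = {"db": "pubmed", "retmode": "json", "id": ",".join(batch)}
        # safe="," keeps the commas between the IDs readable in the URL
        url = ESUMMARY_URL + "?" + urlencode(params, safe=",")
        result = get_json(url)["result"]
        for uid in batch:
            # assumes each record sits under its UID, as in the script above
            counts[uid] = len(result.get(uid, {}).get("authors", []))
    return counts
```

Called as fetch_author_counts(search_results['esearchresult']['idlist']), this would need one request for the 100 IDs of the example search instead of 100 requests. The get_json parameter exists only so that the download step can be swapped out, for example when trying the function without network access.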

If there are questions on how to use the script, please ask!
