stpy.summarize_gScholarAlerts
This script automates the process of summarizing publications from Google Scholar Alerts.
It connects to a Gmail account, fetches all or only unread emails from the “gScholarAlerts” label for the last 7 days, extracts publication details, and saves the summary to a CSV file. It also retrieves additional information using the DOI and URL of each publication. The script is designed to work with Google Scholar Alerts, which send notifications about new publications based on user-defined search queries.
The script uses the following libraries:
- imaplib: For connecting to the Gmail IMAP server and fetching emails.
- email: For parsing the email content.
- pandas: For data manipulation and saving to CSV.
- BeautifulSoup: For parsing HTML content.
- requests: For making HTTP requests to fetch publication details from DOI and URL.
- fitz (PyMuPDF): For extracting text from PDF files.
The script is designed to be run as a standalone program, and it requires the following environment variables to be set:
- GMAIL_USERNAME: The Gmail username (email address).
- GMAIL_PASSWORD: The Gmail password (or an app password if 2FA is enabled).
Using an app password is recommended for security reasons if 2FA is enabled on the Gmail account.
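The connection step described above might look roughly like the following sketch. The helper names `build_search_criteria` and `fetch_alert_messages` are hypothetical and are not taken from the module. The sketch assumes Gmail's standard IMAP host `imap.gmail.com` and that the label can be selected as a mailbox by name.

```python
import email
import imaplib
import os
from datetime import date, timedelta

# IMAP date literals use English month abbreviations regardless of locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_search_criteria(days=7, unread_only=True, today=None):
    """Return IMAP SEARCH tokens for messages received in the last `days` days."""
    today = today or date.today()
    since = today - timedelta(days=days)
    since_str = f"{since.day:02d}-{_MONTHS[since.month - 1]}-{since.year}"
    criteria = ["SINCE", since_str]
    if unread_only:
        criteria.insert(0, "UNSEEN")
    return criteria


def fetch_alert_messages(label="gScholarAlerts", days=7, unread_only=True):
    """Fetch messages from a Gmail label (hypothetical helper, not the module's API)."""
    username = os.environ["GMAIL_USERNAME"]
    password = os.environ["GMAIL_PASSWORD"]
    messages = []
    with imaplib.IMAP4_SSL("imap.gmail.com") as mail:
        mail.login(username, password)
        # readonly=True keeps fetched messages from being marked as read.
        mail.select(f'"{label}"', readonly=True)
        _, data = mail.search(None, *build_search_criteria(days, unread_only))
        for num in data[0].split():
            _, parts = mail.fetch(num, "(RFC822)")
            messages.append(email.message_from_bytes(parts[0][1]))
    return messages
```

Keeping the search criteria in a separate pure function makes the date handling easy to check without a live mailbox. Whether the actual script opens the mailbox read-only is not stated in the description above.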
Functions
- Extract publication details from the HTML content of the email.
- Extract publication details based on whether the URL is a PDF or a web page.
- Download a PDF from a URL and extract its text without saving it locally.
- Extract text from an HTML webpage.
- Fetch unread emails from Gmail that carry a specific label and arrived within a given number of days.
- Get more information from DOI and URL.
- Retrieve publication details including abstract using DOI from the CrossRef API.
- Check if the URL points to a PDF file by examining headers.
- Parse fetched emails.
- Summarize publications from Google Scholar Alerts and save to CSV file.
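Two of the steps listed above, checking response headers for a PDF and reading publication details from CrossRef, can be sketched as follows. All function names here are hypothetical and do not reflect the module's actual signatures. The sketch assumes the public CrossRef REST endpoint `https://api.crossref.org/works/<DOI>`, whose JSON response nests the record under a `message` key, and it strips the JATS markup that CrossRef abstracts usually carry.

```python
import re
from urllib.parse import quote


def looks_like_pdf(headers):
    """Return True if the Content-Type header declares a PDF."""
    normalized = {key.lower(): value for key, value in headers.items()}
    media_type = normalized.get("content-type", "").split(";")[0].strip().lower()
    return media_type == "application/pdf"


def is_pdf_url(url, timeout=10):
    """Issue a HEAD request and inspect the headers (hypothetical helper)."""
    import requests  # imported lazily so the pure helpers work without it
    response = requests.head(url, allow_redirects=True, timeout=timeout)
    return looks_like_pdf(response.headers)


def parse_crossref_work(payload):
    """Pick title, journal, authors, and abstract out of a CrossRef response."""
    work = payload.get("message", {})
    abstract = work.get("abstract")
    if abstract:
        # Abstracts are typically JATS XML; drop the tags, keep the text.
        abstract = re.sub(r"<[^>]+>", "", abstract).strip()
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in work.get("author", [])
    ]
    return {
        "title": (work.get("title") or [None])[0],
        "journal": (work.get("container-title") or [None])[0],
        "authors": authors,
        "abstract": abstract,
    }


def fetch_crossref_work(doi, timeout=10):
    """Look up a DOI on CrossRef (hypothetical helper)."""
    import requests
    url = f"https://api.crossref.org/works/{quote(doi)}"
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return parse_crossref_work(response.json())
```

Some servers do not answer HEAD requests with a reliable `Content-Type`, so a header check like this is a heuristic. Many CrossRef records also lack an abstract, in which case the `abstract` field stays `None`.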