Graham Neubig neubig

## openscholar_summary.txt
OpenScholar is a retrieval-augmented language model that assists researchers in synthesizing scientific literature. The system uses a database of 45 million open-access papers to provide citation-backed responses to queries, accurately identifying relevant passages and generating reliable answers across multiple scientific domains. This approach addresses the growing challenge of keeping up with rapidly expanding scientific literature.
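
To make "citation-backed" concrete, here is a toy sketch of the retrieve-then-cite idea. It is not OpenScholar's retrieval pipeline. It ranks a handful of passages by simple word overlap with the query and numbers them, so that a generator could cite them as `[0]`, `[2]`, and so on. The function names and the example passages are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs; a deliberately crude tokenizer
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_with_citations(query: str, passages: List[str], k: int = 2) -> List[Tuple[int, str]]:
    """Return up to k (citation_index, passage) pairs ranked by word overlap with the query."""
    query_counts = Counter(tokenize(query))
    scored = []
    for index, passage in enumerate(passages):
        passage_counts = Counter(tokenize(passage))
        overlap = sum(min(query_counts[word], count) for word, count in passage_counts.items())
        scored.append((overlap, index))
    # Highest overlap first; ties broken by original position
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(index, passages[index]) for overlap, index in scored[:k] if overlap > 0]


def format_evidence(hits: List[Tuple[int, str]]) -> str:
    # Number each passage so a generator can cite it as [index]
    return "\n".join(f"[{index}] {passage}" for index, passage in hits)


# Hypothetical passages standing in for a paper database
passages = [
    "Transformers rely on self-attention.",
    "Graphene conducts electricity well.",
    "Attention weights are normalized with a softmax.",
]
print(format_evidence(retrieve_with_citations("How does attention work?", passages)))
```

A real system would swap the overlap score for a learned retriever and reranker, and would check that every citation in the generated answer points at a passage that actually supports it.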

The researchers developed ScholarQABench, a multi-domain benchmark for evaluating literature search capabilities, with 2,967 expert-written queries and 208 detailed answers across computer science, physics, neuroscience, and biomedicine. In testing, OpenScholar-8B outperformed GPT-4o by 5% and PaperQA2 by 7% in correctness metrics, despite being a smaller, open model.

Citation accuracy is a key strength of OpenScholar. While GPT-4o hallucinates citations 78-90% of the time, OpenScholar achieves citation accuracy on par with human experts. The syst

## dispatch_openai_requests.py
# NOTE:
# You can find an updated, more robust and feature-rich implementation
# in Zeno Build
# - Zeno Build: https://github.com/zeno-ml/zeno-build/
# - Implementation: https://github.com/zeno-ml/zeno-build/blob/main/zeno_build/models/providers/openai_utils.py

import openai
import asyncio
from typing import Any
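
The preview stops at the imports, but the `asyncio` import suggests requests are sent concurrently. A minimal sketch of that pattern, assuming a bounded number of in-flight requests: `asyncio.Semaphore` caps concurrency and `asyncio.gather` collects results in input order. `fake_model` is a hypothetical stand-in for the API call, so the sketch runs without an API key.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def dispatch_requests(
    prompts: List[str],
    call_model: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 8,
) -> List[Any]:
    """Run call_model on every prompt concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(prompt: str) -> Any:
        async with semaphore:
            return await call_model(prompt)

    # asyncio.gather returns results in the same order as the prompts
    return list(await asyncio.gather(*(limited(prompt) for prompt in prompts)))


async def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an API call; replace with a real client request
    await asyncio.sleep(0.01)
    return prompt.upper()


if __name__ == "__main__":
    print(asyncio.run(dispatch_requests(["hello", "world"], fake_model, max_concurrency=2)))
```

With the `openai>=1.0` client, `call_model` could wrap `AsyncOpenAI().chat.completions.create(...)`. The Zeno Build implementation linked above is the more complete reference.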

## get_citations.py
import requests
import sys
import time

sleep_time = 20
def query_api(url, session):
  global sleep_time
  time.sleep(sleep_time / 1000.0)
  r = session.get(url)
  while r.status_code == 429:
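
The preview cuts off inside the retry loop for HTTP 429 ("Too Many Requests"). Below is a self-contained sketch of one common way to finish such a loop, doubling the wait after every 429. The doubling policy, the `max_retries` cap, and the injectable `sleep` are assumptions, not the gist's code. Also unlike the gist, which keeps `sleep_time` as a global that persists across calls, this sketch resets the delay on each call.

```python
import time
from typing import Any, Callable


def query_with_backoff(
    get: Callable[[str], Any],
    url: str,
    delay_ms: float = 20.0,
    max_retries: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call get(url), doubling the wait (in milliseconds) after every HTTP 429 response."""
    sleep(delay_ms / 1000.0)
    response = get(url)
    retries = 0
    while response.status_code == 429 and retries < max_retries:
        delay_ms *= 2  # exponential backoff (assumed policy)
        sleep(delay_ms / 1000.0)
        response = get(url)
        retries += 1
    return response
```

With `requests`, pass the bound method of a session, e.g. `query_with_backoff(session.get, url)` inside `with requests.Session() as session:`. Tests can pass a fake `get` and a `sleep` that just records the delays.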

## orid_to_s2_papers.py
import openreview
import argparse
import requests
import time
import sys
import csv
import json
from tqdm import tqdm  # Progress bar

# This is a utility script to get a CSV of papers from Semantic Scholar given OpenReview IDs
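
The preview does not show the CSV layout, so the following is only a sketch of the CSV-writing step such a script needs. It uses `csv.DictWriter` with a hypothetical set of column names. Fields a record lacks are written as empty strings, and extra keys are silently dropped.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Hypothetical column names; the gist's actual CSV layout is not shown in the preview
FIELDS = ["openreview_id", "s2_paper_id", "title", "year", "citation_count"]


def write_papers_csv(papers: Iterable[Dict[str, object]], out: TextIO) -> int:
    """Write one CSV row per paper record and return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    rows = 0
    for paper in papers:
        writer.writerow(paper)  # missing fields become empty strings
        rows += 1
    return rows


# Hypothetical record; "venue" is not in FIELDS, so it is ignored
buffer = io.StringIO()
write_papers_csv(
    [{"openreview_id": "abc123", "title": "Example Paper", "year": 2024, "venue": "ignored"}],
    buffer,
)
print(buffer.getvalue())
```

When writing to a real file, open it with `open("papers.csv", "w", newline="")`, as the `csv` module documentation recommends, to avoid blank lines between rows on some platforms.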

## HelloAction.java
package edu.cmu.empty;

import com.intellij.openapi.actionSystem.AnAction;
import com.intellij.openapi.actionSystem.AnActionEvent;
import com.intellij.openapi.project.Project;
import com.intellij.openapi.ui.Messages;
import com.intellij.openapi.ui.popup.JBPopupFactory;
import com.intellij.openapi.ui.popup.ListPopup;
import com.intellij.openapi.ui.popup.PopupStep;
import com.intellij.openapi.ui.popup.util.BaseListPopupStep;

## identify_japanese_pronouns.py
import sys
import re
from collections import defaultdict

# This is a script to identify pronouns in Japanese
# It requires data segmented by KyTea (http://www.phontron.com/kytea/)
#
# If you have raw Japanese text (with no spaces), use this script like:
#  cat japanese.txt | kytea | python identify_japanese_pronouns.py > japanese_with_pronouns.txt
#
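
As a rough sketch of how pronouns can be picked out of KyTea output, the code below assumes KyTea's default Japanese model. In that model each token is written as `word/POS/reading`, tokens are separated by spaces, and pronouns carry the POS tag 代名詞. The code only counts pronouns and does not mark them in the text, and its naive splitting ignores any escaping of special characters. Check the tag name against the model you use.

```python
from collections import Counter
from typing import Iterable, List

# Assumption: KyTea's default model tags pronouns as 代名詞 ("pronoun")
PRONOUN_TAG = "代名詞"


def find_pronouns(line: str) -> List[str]:
    """Return the surface forms of all tokens tagged as pronouns in one KyTea output line."""
    pronouns = []
    for token in line.split():
        fields = token.split("/")  # word/POS/reading
        if len(fields) >= 2 and fields[1] == PRONOUN_TAG:
            pronouns.append(fields[0])
    return pronouns


def count_pronouns(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        counts.update(find_pronouns(line))
    return counts


example = "私/代名詞/わたし は/助詞/は 学生/名詞/がくせい です/助動詞/です"
print(find_pronouns(example))  # → ['私']
```

In a pipeline like the one above, `count_pronouns(sys.stdin)` would read the output of `kytea` directly.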

## best-paper-deadline.py
#### Script to calculate the best paper deadline from the population of the Earth, using some not-completely-arbitrary assumptions
# by Graham Neubig

# Results are:
# UTC 8:00 deadline, utility is 1476.1150000000002
# UTC 9:00 deadline, utility is 1438.7800000000002
# UTC 14:00 deadline, utility is 1385.2949999999998
# UTC 15:00 deadline, utility is 1345.945
# UTC 13:00 deadline, utility is 1291.4950000000003
# UTC 7:00 deadline, utility is 1287.1649999999997
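
The preview shows only the results, not the method. Here is one plausible way to compute a ranking like this: score each UTC hour by summing, over time zones, the zone's population times how acceptable that deadline is locally. The populations, the whole-hour offsets, and the `local_utility` function are all hypothetical, so this sketch does not reproduce the numbers listed above.

```python
from typing import Dict, List

# Hypothetical population in millions per whole-hour UTC offset; NOT the gist's data
POPULATION: Dict[int, float] = {-8: 50.0, -5: 150.0, 0: 70.0, 1: 300.0, 8: 1500.0, 9: 180.0}


def local_utility(local_hour: int) -> float:
    """Hypothetical utility of a deadline falling at this local hour (0-23)."""
    if 9 <= local_hour < 21:
        return 1.0  # waking hours
    if local_hour >= 21 or local_hour < 2:
        return 0.5  # late evening: tolerable
    return 0.0      # middle of the night


def deadline_utility(utc_hour: int, population: Dict[int, float] = POPULATION) -> float:
    # Population-weighted utility of a deadline at utc_hour
    return sum(people * local_utility((utc_hour + offset) % 24) for offset, people in population.items())


def rank_deadlines(population: Dict[int, float] = POPULATION) -> List[int]:
    # All 24 UTC hours, best first; ties keep ascending hour order because sorted() is stable
    return sorted(range(24), key=lambda hour: deadline_utility(hour, population), reverse=True)


for hour in rank_deadlines()[:3]:
    print(f"UTC {hour}:00 deadline, utility is {deadline_utility(hour)}")
```

Python's `%` always returns a non-negative result for a positive divisor, so negative offsets wrap correctly (e.g. `(3 - 8) % 24 == 19`). Half-hour zones would need fractional hours.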

## petstory.yaml
openapi: "3.0.0"
info:
  version: 1.0.0
  title: Swagger Petstore
  license:
    name: MIT
servers:
  - url: http://petstore.swagger.io/v1
paths:
  /pets:
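
One detail worth knowing when calling an API described by this spec: in OpenAPI 3.0, a path key such as `/pets` is appended to the server URL rather than resolved like a relative link. Standard URL joining therefore gives the wrong answer here. A small sketch follows; the helper name is hypothetical, it ignores server variables, and stripping a trailing slash from the server URL is a pragmatic choice of this sketch.

```python
from urllib.parse import urljoin


def operation_url(server_url: str, path: str) -> str:
    """Build a full request URL from an OpenAPI 3.0 server URL and a path key.

    OpenAPI appends the path to the server URL instead of resolving it as a
    relative reference, so the "/v1" prefix in the server URL is kept.
    """
    if not path.startswith("/"):
        raise ValueError("OpenAPI path keys must begin with '/'")
    return server_url.rstrip("/") + path


server = "http://petstore.swagger.io/v1"
print(operation_url(server, "/pets"))  # → http://petstore.swagger.io/v1/pets
print(urljoin(server, "/pets"))        # → http://petstore.swagger.io/pets (drops /v1)
```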

## dynet-tagger.py
"""
DyNet implementation of a sequence labeler (POS tagger).
This is a translation of this PyTorch tagger: https://gist.github.com/hal3/8c170c4400576eb8d0a8bd94ab231232

Basic architecture:
 - take words
 - run through bidirectional GRU
 - predict labels one word at a time (left to right), using a recurrent neural network "decoder"
The decoder updates hidden state based on:
 - most recent word
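
The left-to-right decoding loop described in the docstring can be sketched in plain Python, without DyNet. In this sketch, `encodings` stands in for the bidirectional GRU outputs, and `step` is a hypothetical stand-in for the recurrent decoder cell: it takes the previous state and the current word's encoding and returns label scores plus a new state. The docstring's list of decoder inputs is cut off after "most recent word", so the sketch feeds only that. The real tagger may use more inputs.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

State = TypeVar("State")


def greedy_label(
    encodings: Sequence[Sequence[float]],
    labels: Sequence[str],
    step: Callable[[State, Sequence[float]], Tuple[Sequence[float], State]],
    init_state: State,
) -> List[str]:
    """Predict one label per word, left to right, threading a decoder state through."""
    state = init_state
    predicted: List[str] = []
    for encoding in encodings:
        scores, state = step(state, encoding)  # update hidden state, score every label
        best = max(range(len(labels)), key=lambda i: scores[i])
        predicted.append(labels[best])
    return predicted


def toy_step(state: float, encoding: Sequence[float]) -> Tuple[List[float], float]:
    # Hypothetical "RNN": decay the old state and add the first feature
    new_state = 0.5 * state + encoding[0]
    return [new_state, encoding[1]], new_state


print(greedy_label([[1.0, 0.0], [0.0, 2.0]], ["NOUN", "VERB"], toy_step, 0.0))  # → ['NOUN', 'VERB']
```

Greedy argmax at each position is the simplest choice. Beam search is a common alternative when later labels should be able to revise earlier decisions.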

## linalg-calcfunction.py
import numpy as np
import sys

################# Explanation ##################
# This is a function to calculate house prices h(x) = -40 + 0.25x
# The first term (-40) is the base price, and "x" is the number of square feet in the house
################################################

# Set up the function
my_function = np.array([-40, 0.25])
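
Storing the function as the coefficient vector `[-40, 0.25]` means that evaluating h(x) is a dot product with `[1, x]`. The leading 1 multiplies the base price. A sketch of that step follows; the preview ends before the gist's own evaluation code, so the helper names are hypothetical. Stacking one `[1, x]` row per house turns many evaluations into a single matrix-vector product.

```python
import numpy as np

# Coefficients from the gist: h(x) = -40 + 0.25x
my_function = np.array([-40, 0.25])


def house_price(square_feet: float) -> float:
    # Prepend a 1 so the base price -40 is multiplied by 1
    return float(np.dot(my_function, np.array([1.0, square_feet])))


def house_prices(square_feet) -> np.ndarray:
    x = np.asarray(square_feet, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # one row [1, x] per house
    return design @ my_function


print(house_price(2000))  # → 460.0
print(house_prices([1000, 2000, 3000]))
```
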