What is Open Syllabus?

Open Syllabus is a non-profit research organization that collects and analyzes millions of syllabi to support novel teaching and learning applications. We help instructors develop classes, libraries manage collections, and presses develop books. We support students and lifelong learners in their exploration of topics and fields. We help faculty understand the impact of their work on teaching and support efforts to adopt open textbooks. We make course transfer easier and help educators align teaching with workforce needs. We also challenge faculty and universities to work together to steward this important data resource.

Open Syllabus currently has a corpus of twenty-one million English-language syllabi from 140 countries. We use machine learning and other techniques to extract citations, dates, fields, and other metadata from these documents. The resulting data is made available through (currently) three online tools:

Analytics provides free and open exploration of most of the data, while reserving the most recent data and advanced capabilities for schools, publishers, and other institutional subscribers. You can try the 'full' Analytics by signing up for a free trial account.
The Course Matcher predicts 'course equivalence' across the catalogs of hundreds of schools as a way to support the course transfer process for students and college staff.
The Co-Assignment Galaxy is a massive plot of the top million titles in the Open Syllabus dataset, grouped by how often they are assigned together. It is the closest thing available to a unified representation of the project of higher education.

Open Syllabus was founded at The American Assembly, a public policy institute associated with Columbia University. It has been independent since 2019.

Tell me more. What is a syllabus?

For our purposes, a syllabus is a document that provides a detailed description of a class or class section beyond what one would find in a course catalog (we have also begun to collect catalog data). A wide variety of documents meet this criterion, including reading lists and descriptions produced by curricular portals. The resulting archive is very heterogeneous. Not all syllabi contain all of the major elements. Around 50% of syllabi contain assigned readings. A bit over 50% list learning outcomes.

All of the syllabi in the current collection are English-language documents, including those from universities where English is not the primary teaching language. Eventually we will create workflows for the large but currently hidden non-English portions of the collection.

We have no means at present of mapping syllabi back to class size or enrollment. A MOOC and a seminar are treated identically. Nor do we know how much coverage the collection provides overall or how representative it is (although it is now large enough to permit the construction of representative subsets using various criteria). Our best estimates point to around 6% of the US curricular universe over the past several years.

How does Open Syllabus get its syllabi?

Primarily by crawling publicly-accessible university websites. We currently update the syllabus dataset twice per year. For a variety of reasons, collection size always lags the current year. Faculty contributions also make up a small but significant portion of the collection. More on sharing syllabi here.

The future of the collection depends on institutional contributions. Open Syllabus services do more for schools that contribute syllabi. As those services grow, the incentives for participation grow.

What are ranks, counts, and scores?

Citation counts (how often titles are assigned across the collection) appear throughout Analytics.

If a title appears on a syllabus, it ‘counts.’ If it appears 10 times on a syllabus, it counts only once. If it appears in ‘suggested reading’ or some other secondary list, it still counts. We track distinctions between primary and secondary reading, when available, but don't currently use this information in Analytics.
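
The counting rule above can be sketched in a few lines of Python. This is an illustration only, not the Open Syllabus pipeline: the function name and the sample titles are hypothetical, and each syllabus is assumed to have already been reduced to a list of matched title identifiers.

```python
from collections import Counter

def citation_counts(syllabi):
    """Count, for each title, the number of syllabi it appears on.

    `syllabi` is an iterable of lists of title identifiers, one list
    per syllabus (a hypothetical input shape for this sketch).
    """
    counts = Counter()
    for titles in syllabi:
        # set() collapses repeats, so a title cited 10 times on one
        # syllabus still contributes exactly 1 to its count.
        counts.update(set(titles))
    return counts

# Hypothetical sample data: three syllabi with repeated citations.
syllabi = [
    ["Title A", "Title A", "Title B"],
    ["Title A"],
    ["Title C", "Title B", "Title B"],
]
counts = citation_counts(syllabi)
print(counts["Title A"], counts["Title B"], counts["Title C"])
```

Primary and secondary readings are deliberately not distinguished here, mirroring the statement that this information is not currently used in Analytics.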

A title’s ‘Rank’ is calculated with respect to the 9.3 million unique titles identified in the collection. The most frequently taught title in version 2.11 of the dataset (A Manual for Writers of Term Papers, Theses, and Dissertations) is ranked #1.

A title’s ‘Score’ is another representation of rank converted to a 1-100 scale (using a dense ranking of appearance counts, converting to a percentile, and shaving off the decimal places). At present, the top twenty-eight titles have a score of 100, while low-count titles in the long tail of the distribution have scores of 1 or 2. The point of the 'score' is to provide a number that is easier to interpret than the raw rankings and counts.
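
One possible reading of the score description above is sketched below in Python. It is a hypothetical reconstruction, not the Open Syllabus implementation: the function name, the ascending direction of the dense ranking, the floor of 1, and the sample counts are all assumptions, and the published scores may well be computed differently.

```python
def scores_from_counts(counts):
    """Map each title's appearance count to a 1-100 score.

    Steps (as assumed for this sketch):
      1. dense-rank the distinct counts, lowest count = rank 1;
      2. convert the rank to a percentile of the highest rank;
      3. drop the decimal places (integer division);
      4. clamp to a minimum of 1 (an assumption, to stay on a 1-100 scale).
    """
    distinct = sorted(set(counts.values()))
    dense_rank = {count: i + 1 for i, count in enumerate(distinct)}
    top_rank = len(distinct)
    return {
        title: max(1, 100 * dense_rank[count] // top_rank)
        for title, count in counts.items()
    }

# Hypothetical counts; titles "a" and "b" tie for the highest count.
counts = {"a": 50, "b": 50, "c": 10, "d": 3, "e": 1}
scores = scores_from_counts(counts)
print(scores)
```

Because the ranking is dense, tied titles share a rank and therefore a score, which is consistent with several titles sharing a score of 100.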

These numbers contribute to an evolving discussion about the role of publication metrics in measuring faculty performance. To date, this conversation has focused on citation in research journals (aka, journal impact factor) as a basis for measuring and comparing research output. We think syllabus counts are a useful addition because they privilege types of work that are commonly underrepresented in journal citation, including more synthetic, accessible, and public-facing forms of work that often represent a large part of faculty writing. In short, we give faculty a way to gain recognition for scholarship with classroom applications. In the process, we create incentives to improve teaching materials.

How are counts calculated?

Analytics relies on a master catalog to identify titles within the syllabus collection. Currently, this catalog is a combination of The Library of Congress, Open Library, OpenAlex, and open access databases such as the Directory of Open Access Books and Open Textbook Library.

Analytics identifies citations by looking for citation-like patterns in the raw text of the syllabi. The title and author elements are then compared against catalog records. This process is accurate a bit over 90% of the time compared to human labeling. Some of the remaining 8-9% are cleaned through rule-based and hand-composed blacklists. The rest are missed.

Why don’t we show results for X?

There are many possible reasons but here are the most likely:

X is not assigned on the syllabi currently in the collection.
Citations of X don't conform to our matching model. Because we rely on 'Title - Author Last Name' as the identifier, we struggle with certain kinds of citations and categories of work. We won't find, for example, movies cited by title and date rather than title and director. Textbooks that change authors across editions can also be a problem. In addition, faculty citation practices on syllabi are often inconsistent.
X is not in the master bibliographical catalog that we use to identify titles. We have nearly 200 million records and it's still incomplete -- especially for but not limited to non-English-language titles and titles published outside the US and UK.
X was improperly merged with another title in building the master catalog. With several hundred million total records, some title/author combinations appear hundreds of times. The process of collapsing large numbers of variants and potential duplicates into single records is imperfect. Different titles can be merged into one record and, by the same token, variations on titles and author names can fragment a title across multiple records. Both are uncommon but it isn't hard to find examples when browsing Analytics.
The original catalog data for X is ambiguous or incorrect. This is common. Records sometimes fail to list all of the authors for a title, or list editors or translators in the author field, or have other erroneous information. Our catalog build process rejects most outlier mistakes, but some get past.

We’ve worked to minimize these problems but if you spend time with Analytics, you will see them.
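
To make the 'Title - Author Last Name' point concrete, here is a minimal Python sketch of a matching key. The normalization rules, the function name `match_key`, and the example title are hypothetical and are not taken from the Open Syllabus code; the sketch only shows why a change of author between editions produces a different identifier.

```python
import re
import unicodedata

def _normalize(text):
    """Lowercase, strip accents, and collapse punctuation to spaces."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", " ", ascii_text.lower()).strip()

def match_key(title, author_last_name):
    """Build a hypothetical 'title|author last name' identifier."""
    return f"{_normalize(title)}|{_normalize(author_last_name)}"

# Formatting differences collapse to the same key...
k1 = match_key("Intro to Widgets", "Smith")
k2 = match_key("intro to widgets.", "SMITH")
# ...but a new lead author on a later edition does not.
k3 = match_key("Intro to Widgets", "Jones")
print(k1 == k2, k1 == k3)
```

A citation that supplies a date instead of a director or author has no second key component at all, which is why such works go unmatched under this kind of scheme.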

What about date, location, and field information?

The dates of classes are obtained by analyzing the date strings that appear in the syllabus text or the source URL for the document. This process is around 90% accurate, which means that erroneous dates will appear with some frequency. Some schools, too, use date formats that we have difficulty parsing accurately. And some syllabi don't list dates.

Fields are challenging because there is a great deal of variation in how different educational systems and intellectual traditions divide human knowledge. Our classifiers identify 62 fields derived from the Department of Education’s Classification of Instructional Programs (CIP 2015). This process is not perfect -- especially with syllabi obtained from non-Anglophone universities.

Institutional attribution is based on a mixed analysis of URLs and e-mail strings found in the documents, which are then mapped to a combination of the Research Organization Registry (ROR) and IPEDS data. These methods resolve institutional location for around 94% of the syllabi in the collection. What does it tend to miss? Vocational schools outside the US.

What about people and publishers?

Unlike titles, schools, fields, and countries, authors and publishers do not have unique records in the Open Syllabus database. An author search simply returns hits on a particular name. These results can be nearly unique for people with rare names, but are predictably less reliable for common names. Additionally, our source catalogs often contain multiple versions of the same person’s name, and often duplicates based on the different citation conventions around initials. Our efforts to reconcile these variations are imperfect. Stable author identities remain one of the major challenges of library science. We can’t solve that problem, but we will try to adopt emerging solutions, such as ORCID (go on, get an ORCID ID).

Publisher data is similar to author data but introduces some unique issues. The quality of publisher data in the source catalogs is generally terrible, with no consistent representation of publisher names, ownership structures, or roles in publication. However, the finite number of publishers makes the data easier to clean. We aggressively clean much of it, making relatively complete publisher records possible in many cases (at the expense of some of the complexity embedded in the records).

We also list academic journals and media outlets in the publisher section.

In practice, a title record can be composed of dozens of underlying records, with different publishers, publication years, and even authors. Where we have multiple records for a single title, we show (1) the most frequent (i.e., modal) title and subtitle; (2) the modal author or authors; (3) the dominant publisher (i.e., representing 70% or more of the records); and (4) the earliest publication date from among all the records. The result is a useful indicative record, rather than a ‘true’ representation of the title and edition assigned.
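
The four display rules above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the production code: the record fields (`title`, `authors`, `publisher`, `year`), the function name, and the sample records are hypothetical, and tie-breaking between equally frequent values is not specified in the text.

```python
from collections import Counter

def indicative_record(records, publisher_threshold=0.7):
    """Collapse several catalog records for one title into a display record."""
    titles = Counter(r["title"] for r in records)
    authors = Counter(tuple(r["authors"]) for r in records)
    publishers = Counter(r["publisher"] for r in records)

    top_publisher, publisher_hits = publishers.most_common(1)[0]
    share = publisher_hits / len(records)

    return {
        # (1) modal title and subtitle
        "title": titles.most_common(1)[0][0],
        # (2) modal author or authors
        "authors": list(authors.most_common(1)[0][0]),
        # (3) publisher shown only if it reaches the 70% threshold
        "publisher": top_publisher if share >= publisher_threshold else None,
        # (4) earliest publication year among all records
        "year": min(r["year"] for r in records),
    }

# Hypothetical underlying records for a single title.
records = [
    {"title": "Example Title", "authors": ["Smith"], "publisher": "Press One", "year": 1998},
    {"title": "Example Title", "authors": ["Smith"], "publisher": "Press One", "year": 2004},
    {"title": "Example Title: A Subtitle", "authors": ["Smith", "Jones"], "publisher": "Press One", "year": 2010},
    {"title": "Example Title", "authors": ["Smith"], "publisher": "Press Two", "year": 1995},
]
record = indicative_record(records)
print(record)
```

Note that the result mixes fields from different underlying records: here the earliest year comes from a record whose publisher is not the one displayed, which is why the output is indicative rather than a description of any single edition.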

Does Open Syllabus show syllabi?

Yes -- in anonymized, abbreviated form in Analytics and the Course Matcher for logged-in users. These views reproduce only the descriptive content of the syllabus: the description, learning outcomes, and assigned titles.

We do not display syllabi from countries or territories where the disclosure of teaching choices could put faculty at risk. Our list of excluded countries and territories is based on the Academic Freedom score in Freedom House country reports. Syllabi from countries with a score of two or lower (out of four) are not displayed.

How is Open Syllabus funded?

Open Syllabus has been supported by The Arcadia Fund, The ECMC Foundation, The Sloan Foundation, The Hewlett Foundation, and The Templeton Foundation. The project also received a Catalyst Grant from Digital Science in 2018.

Open Syllabus Analytics is part of a shift toward a subscription-based model of support, aimed at schools, publishers, and other educational service providers. We also license book rankings and other anonymized data to publishers.

Can I access the underlying data?

We provide limited, anonymized versions of the OS dataset under some circumstances for academic research. Research leads must be based at a college or university and be able to secure the support of their schools for a 'research use agreement.' If you'd like to inquire about access, write us at [email protected]. If we are slow to follow up, apologies. We receive more of these requests than we can manage.
