Re-classifying knowledge: moving from BIC to Thema
Thu 08 Feb 2024
A little background
Why are we doing this? The subjects of the books and chapters in the OAPEN Library and the Directory of Open Access Books (DOAB) are described in two ways: through keywords and a classification code. Keywords are flexible, but also language-dependent and unstructured. In contrast, a classification is independent of language and highly structured.
When OAPEN was launched in 2010, we chose to use the BIC classification, which was used by many publishers, most of them UK-based. However, the BIC classification was deprecated in February 2024 and has been succeeded by the Thema classification.
What does it mean for the OAPEN Library and DOAB?
For us, changing the classification is no simple task. There are several aspects to consider. Firstly, several data-related and technical changes must be made to our platforms. Secondly, we need to change our data ingestion, which has consequences for publishers and for us. Lastly, there is the actual conversion of all existing BIC classification codes to the new Thema classification codes.
Starting with the changes to our platforms: we need to update our data model, the input forms for adding books and chapters, the ONIX metadata import, our metadata export feeds and the search API. The change also affects the OAI harvesting of books from Göttingen University Press, FWF and Knowledge Unlatched. In cooperation with our technical partner Atmire, these changes have now been set up in a test environment.
What is true for a tango also applies to data ingestion: it takes two partners. On the one hand, publishers need to be aware that they should provide us with Thema classification codes – preferably starting from now. It’s great that many publishers are already doing this. On the other hand, we as the other partner have taken steps to automatically convert BIC to Thema. This is based on a conversion table we have developed internally, which you can download here.
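To give an idea of what a table-based conversion involves, here is a minimal sketch in Python. It is not our production workflow: the two-column layout (bic, thema), the function names and the sample rows are assumptions made for illustration, and the rows are placeholders rather than entries taken from the actual conversion table.

```python
import csv
import io

# Hypothetical two-column table (bic,thema). The layout of the real
# conversion table may differ; these rows are illustrative placeholders.
SAMPLE_TABLE = """bic,thema
HB,NH
JP,JP
PB,PB
"""


def load_table(text):
    """Read a BIC -> Thema mapping from CSV text with 'bic' and 'thema' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["bic"].strip(): row["thema"].strip() for row in reader}


def convert_codes(bic_codes, table):
    """Convert a title's BIC codes to Thema codes.

    Returns (thema_codes, unmapped_bic_codes). Order is preserved,
    duplicate Thema codes are dropped, and BIC codes without a table
    entry are collected so they can be reviewed by hand.
    """
    thema, unmapped = [], []
    for code in bic_codes:
        target = table.get(code)
        if target is None:
            unmapped.append(code)
        elif target not in thema:
            thema.append(target)
    return thema, unmapped


table = load_table(SAMPLE_TABLE)
thema_codes, unmapped_codes = convert_codes(["HB", "JP", "XX"], table)
print(thema_codes, unmapped_codes)
```

Collecting the unmapped codes separately, instead of silently dropping them, is a deliberate choice in this sketch: it makes it easy to see which titles still need attention after an automatic pass.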
The last part of the puzzle is the actual conversion from BIC to Thema in the data ingestion workflow. In OAPEN, there are currently over 32,000 books and chapters. In DOAB, the number of titles is now over 78,000. All of them have one or more classification codes. So, this is a huge task. To prepare for this, we are currently working on a conversion in our test environment. When we are certain we have evaluated this thoroughly, we can do it in our production environment.
When will we move to Thema?
At this moment, the test environment is all set up. Converting the existing BIC codes to Thema codes there will take three to four weeks, including extensive testing and examination of the results. Once we are sure about the conversion in the test environment, we can apply it in the OAPEN Library production (live) environment. This means that the production conversion will take place in March 2024. As March approaches, we will communicate the exact date and time.
Looking at the timeline, the conversion will take place after the deprecation of BIC. That is not a problem. Until the change, we will make sure that the correct classification is added to each book, whether the publisher has provided us with BIC or Thema codes. For us, it is more important to do this conversion correctly, even if it takes more time.
If you have questions, feel free to contact us ([email protected])! This conversion is a complex matter for everybody involved, and if there is any way we can help, we would gladly do that.