JOSSCast #1: Eva Maxfield Brown on Speakerbox – Open Source Speaker Identification for Political Science

Subscribe Now: Apple, Spotify, YouTube, RSS

In the first episode of Open Source for Researchers, hosts Arfon and Abby sit down with Eva Maxfield Brown to discuss Speakerbox, an open source speaker identification tool.

Originally part of the Council Data Project, Speakerbox was used to train models to identify city council members speaking in transcripts, starting with cities like Seattle. Speakerbox can run on your laptop, making this a cost-effective solution for many civic hackers, citizen scientists, and now… podcasters!

From the advantages of fine-tuning pre-trained models for personalized speaker identification to the concept of few-shot learning, Eva walks us through her solution. Want to know how this open source project came to life? Tune in to hear about Eva’s journey with Speakerbox and publishing in JOSS!

Transcript

[00:00:00] Arfon: Welcome to Open Source for Researchers, a podcast showcasing open source software built by researchers for researchers.

My name is Arfon.

[00:00:11] Abby: And I’m Abby.

[00:00:12] Arfon: And we’re your hosts. Every week, we interview an author who’s published in the Journal of Open Source Software. This week, we’re going to be talking to Eva Maxfield Brown about their software, Speakerbox: few-shot learning for speaker identification with transformers.

[00:00:27] Abby: Yeah. And I think this was a great interview to kick off with. Eva was really excited to talk about her work. And I thought it was very applicable for podcasting!

[00:00:35] Arfon: Absolutely. And, I don’t think I said it at the start, but this is our first ever episode, so we’re really excited to start with this really interesting piece of software.

[00:00:44] Abby: Awesome, I guess we can just dive right in.

[00:00:46] Arfon: Yeah, let’s do it.

Welcome to the podcast, Eva.

[00:00:49] Eva: Hi, how are you?

[00:00:50] Arfon: Great. Thanks for coming on.


[00:00:52] Abby: Yeah, glad to have you in one of our early episodes.

But do you want to tell us a little bit about yourself just to kick us off?

[00:00:58] Eva: Yeah, sure. My name is Eva Maxfield Brown. I’m a PhD student at the University of Washington Information School. My research primarily focuses on the science of scientific software more generally, but I also like to believe that I practice building scientific software. So I have experience building software for computational biology or microscopy and also political and information science.

The project that we’ll be talking about today falls into that political and information science capacity of building, not just studying.

[00:01:27] Arfon: Awesome.

Tell us about Speakerbox. Tell us about the project. This is a paper that was reviewed in JOSS about a year ago now, or a little bit over, counting from when it was initially submitted. I’m just curious, if you could tell us a little bit about the project, why you started it, and what kinds of problems you’re trying to solve with Speakerbox.

[00:01:43] Eva: Yeah. I’ll have to back up for a little bit. Speakerbox is part of a larger ecosystem of tools that fall under this project that we were calling Council Data Project. This was a system that basically was like: oh, what’s really hard for political scientists is studying local government, and one of the reasons why it’s hard to study local government is that most of the stuff you have to do is qualitative.

It’s very hard to get transcripts. It’s very hard to get audio. It’s very hard to get all these different things in a standardized format for all these different councils. So, Council Data Project started a long time ago and it tried to lay the groundwork of getting all those transcripts and systematizing it in a way.

But one of the longest requested features of that project was being able to say who said each sentence in a transcript: this sentence was said by Council Member A, this sentence was said by Council Member B, and so on. And so there’s a problem here, right?

We could spend the time and resources to build and train individual speaker identification models for each city council. But that’s a lot of time and resources that we don’t really have access to, both as researchers and just as people interested in contributing to the local government area.

And so Speakerbox really came out of that problem of how do you quickly annotate and train individual speaker identification models for application across a transcript, across a timestamped transcript.

[00:03:00] Abby: Yeah, reading this paper I got really excited as a nascent podcaster. I could see how this is immediately applicable, because I have these audio files, and can we identify who’s speaking in them? Anyway, I could see how this could be useful.

[00:03:14] Arfon: Yeah, maybe we could, maybe we should try it

[00:03:17] Abby: Yeah, we can. On this episode.

[00:03:19] Eva: I think one of the early GitHub issues that we got after publishing the paper was from someone who was recording their research lab’s meetings and basically wanting to train a model just to tag their own lab members. And I was like, I guess that’s a use case. Sure. That makes sense.

[00:03:35] Abby: We can train it to tag ourselves as hosts and identify a mysterious third person each episode.

[00:03:41] Arfon: Yeah, there you go

I was going to say: so, the Council Data Project, could you say a bit more about who the users are, who the target audience is? Is it people studying government and how councils work? Who is your audience?

Who would use Speakerbox?

[00:03:55] Eva: Yeah. So that’s a really good question. So I would say the larger ecosystem of Council Data Project would fall into political science, especially the subdomain of urban or city level, municipal level political scholarship. Then there’s also information science questions.

And I think there’s a lot of really interesting information science questions, such as how do ideas and misinformation and disinformation enter city councils and so on. So I think those are two very core audiences of that research, but there’s also the public side.

So there are just general users, right? Council Data Project comes with a website and searchable index, so you can search through all the meetings that we’ve transcribed and everything. So we do have general citizen users. We also have people at the cities that we support, like city employees, as users as well. And journalists.

So there are a couple of different people that are interested in it from the Council Data Project angle. But specifically for Speakerbox, as we just talked about, I think there are a number of cases where this is just useful. So that was one of the reasons why we split it out as its own thing. Certain tools for Council Data Project will likely be hard-coded towards our infrastructure or towards our technology stack. But Speakerbox, as a general toolbox for quickly training a speaker identification model, has a broader application set.

So things like podcasting, things still in the domain of journalism or in the domain of political science, sure, but also in music or in anything else that kind of deals with this very fast audio labeling type of dilemma.

[00:05:21] Abby: Yeah, another thing that really struck me reading the paper, and also just hearing you talk about the Council Data Project plus Speakerbox: there are a lot of open source and open data tools and resources that you’re using and drawing from.

Did that really affect why you’re doing this openly? Can you just tell me about the whole ecosystem?

[00:05:38] Eva: Yeah, that’s a good question. I think from the start we positioned Council Data Project in the genre or domain of that prior era of open data and civic collaboration. We originally envisioned that a different group would be able to deploy all of our tools and technology for their city council, wherever they are.

That would just reduce the cost on us. And so that was partly the reason why moving to the open source model was really nice. All the documentation can be available; if they need to make an edit for their own city, whatever it is, they can contribute back. There are also existing projects prior to us in the history of this civic information landscape that followed this pattern, but I think we’ve shifted it to our own needs a little bit. That said, the fast pace of machine learning and natural language processing tools is just so quick that you can expect a new version of some package to come out and there’s going to be a breaking change.

And to be honest, at this point, I work on a lot of different projects and it’s really nice to push it open source just because I know that there are other people, whether they’re related to Council Data Project or not, that are also going to use it and need it and can contribute to helping fix and maintain and sustain the project itself.

[00:06:47] Arfon: So are there really obvious alternatives? I understand why you would want to publish this; there’s lots of opportunity for pushing things into the public domain. But are there tools that other people were using previously that you found deficient, or that maybe use, I don’t know, proprietary languages or something?

What were the motivations for publishing a new tool specifically?

[00:07:07] Eva: I think there definitely are proprietary technologies that allow you to do this. Google is probably the most famous. I think they have a whole suite of, give us your dataset and we will try to train a model for it.

Other companies as well. I think Microsoft probably has their own version of that, Amazon as well. So there are definitely proprietary versions of this. What I particularly wanted to do was, one, focus specifically on our core user base, which is that civic technology type of person. They don’t typically have access to a lot of money to pay Google or Amazon or whatever, but they might have access to a laptop with a decent CPU, or they might even have a desktop that has a GPU or something.

And so I wanted to make it possible to quickly annotate, like, to demonstrate exactly how to annotate this data, and also to demonstrate using a consumer-grade GPU. I think the GPU that I used in the example documentation for all of this is five or six years old at this point.

And show that it still works totally fine, and it trains in under an hour. Everything can be done in, let’s say, a weekend of time. And I think that was really needed. I’ve already heard from other people, whether it’s other research labs or people trying to deploy all of our Council Data Project infrastructure, that they are able to quickly iterate and train these models just on their own local machine.

[00:08:33] Arfon: Yeah, that’s really cool. You mentioned civic technology, I think I heard you say the civic hacking space before. I think the idea of publishing tools for communities that don’t have tons of resources and aren’t necessarily able to pay for services is really important.

And I was fortunate to be in Chicago when the civic hacking scene was really starting up in the late…

[00:08:54] Eva: Oh,

[00:08:55] Arfon: …2000s. Yeah, about 2009, 2010. That was a really vibrant community, and I think still is, and there’s just lots…

[00:09:01] Eva: Yeah. Chi Hack Night is…

[00:09:03] Arfon: Chi Hack Night, there you go, yeah. Lots of open civic tech. It’s a really cool space. Yeah, cool, okay.

Okay, so I can run this tool on a relatively modest GPU, but what’s actually happening under the hood?

So there’s this transformer thing, there’s some learning that happens. What does it look like to actually train Speakerbox for your project, for your dataset?

[00:09:27] Eva: Yeah, fortunately for us, Transformers is, I think, two things. So when people say the word Transformers, it’s a couple of things all in one. There’s Transformers the architecture, which I think we can maybe shove aside and say it’s the, quote, foundation of many contemporary machine learning and natural language processing models.

It’s very good at what it does, and it works based off of trying to look for patterns across the entire sequence, right? So looking for whole sequences of data and saying, where are these things similar? So that’s great. But there’s also Transformers, the larger Python package and ecosystem, which is run by a company called Hugging Face.

And that’s, I think, where most people’s minds go now when they say, oh, I’m using Transformers. Specifically in my case, and in the Speakerbox case, we built it off of Transformers because there are tons of existing models that we can use to make this process of speaker identification or speaker classification much, much easier.

Specifically, there are models already pre-trained and available on, I think it’s called the VoxCeleb dataset, I forget the exact name. But the general idea is that people have already trained a transformer model to identify something like 1000 different people’s voices.

And to my understanding, they’re all celebrities, right? So you can imagine this training process as: okay, given a bunch of different audio samples of celebrities’ voices, can we identify which celebrity’s voice each one is? In our case, we don’t care about the celebrities, right? If you want to train a model for your lab, or you want to train it for a city council meeting, whatever it is, you care about the people in your meeting.

And so the process there is called fine-tuning, where really the main task is that you’re taking off that last layer of the model, and instead of saying, I want you to pick one out of a thousand celebrities, you say, I want you to pick one of the three people in this call, right?

And by doing so, you can quickly give it some extra samples of data. So we can give it, let’s say, 20 samples of Arfon speaking, 20 samples of Abby speaking, and 20 samples of me speaking. And hopefully, just because of the existing pre-training, it’s learned the basics of how people speak and how people converse.

And all we’re really doing is saying: don’t worry about the basics of how people speak and how people converse; really focus on the qualities of this person’s voice, and the qualities of this person’s voice, and so on. Hopefully that answers the question.
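For readers who want to see the shape of that head swap in code, here is a minimal sketch using the Hugging Face transformers API. The checkpoint name and the three-speaker label set are illustrative assumptions, not necessarily what Speakerbox itself uses.

```python
# Sketch only: swap a 1000+-way speaker-ID head for a 3-speaker head.
# The checkpoint name below is an assumption for illustration.
from transformers import AutoModelForAudioClassification

speakers = ["arfon", "abby", "eva"]

model = AutoModelForAudioClassification.from_pretrained(
    "superb/wav2vec2-base-superb-sid",  # pre-trained speaker-ID checkpoint (assumed)
    num_labels=len(speakers),           # new classification head: 3 speakers
    label2id={s: i for i, s in enumerate(speakers)},
    id2label={i: s for i, s in enumerate(speakers)},
    ignore_mismatched_sizes=True,       # allow replacing the final layer
)
# From here, fine-tune on a handful of labeled clips per speaker
# (e.g., with transformers.Trainer); the body of the model already
# knows how speech sounds, so a short run on a consumer GPU suffices.
```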

[00:11:50] Arfon: Is that what we mean by the few-shot learning? So you give a few examples, effectively, labeled “this is Abby speaking” or “this is Eva speaking”. Is that what we mean by few-shot?

[00:12:03] Eva: Exactly. In our case we’re doing a little bit of a hack. We say few-shot learning, but it’s not really, because it’s very common for few-shot learning to literally be five examples, or something less than ten. But because audio samples can become really long, like I’m answering this question in a one-minute response, we can split that audio up into many smaller chunks that act as potentially better, smaller examples.

And so you may give us five examples, but in the process of actually training the model in Speakerbox, we might split that up into many smaller ones. So we’re kind of cheating, but…
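That chunking trick is easy to picture in code. Here is an illustrative sketch; Speakerbox’s actual chunk length and splitting logic may differ.

```python
# Illustrative: turn one long labeled answer into many short training examples.
import numpy as np

def chunk_waveform(waveform: np.ndarray, sample_rate: int, chunk_seconds: float = 2.0):
    """Split a mono waveform into fixed-length chunks, dropping the short tail."""
    chunk_len = int(sample_rate * chunk_seconds)
    n_chunks = len(waveform) // chunk_len
    return [waveform[i * chunk_len : (i + 1) * chunk_len] for i in range(n_chunks)]

# One 60-second answer at 16 kHz becomes 30 two-second examples,
# all carrying the same speaker label.
answer = np.zeros(16_000 * 60)
print(len(chunk_waveform(answer, 16_000)))  # 30
```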

[00:12:38] Arfon: Interesting. Thanks.

[00:12:40] Abby: Yeah, that’s awesome. And I do really like the Hugging Face platform, where there are so many models that you can tweak. Are you sharing back to the Hugging Face platform at all? Or…

[00:12:50] Eva: I haven’t, I’m not. We have used some of the models trained from Speakerbox in our own work with Council Data Project, specifically for the Seattle City Council, because that’s my home council. We’ve trained a model to identify city council members for Seattle, and we’ve applied it across, I think, almost 300 transcripts.

But that was just purely for analysis. We haven’t pushed that model up or anything, and we probably should. That should probably even be part of the process or part of the pipeline. Yeah, to come. I’m sure other people who may have used the system might have or something, but,

[00:13:20] Abby: You did publish in JOSS, the Journal of Open Source Software. Can you talk a bit about why you decided to publish there, and how the experience was?

[00:13:27] Eva: Yeah. So to be honest, I love JOSS. This is my second article in JOSS; we’ve also published one for the larger ecosystem of Council Data Project as well. I think after the first time publishing with JOSS, I was just really, really impressed with the reviewing process. More so than some other journals, right?

I think the requirement of having reviewers try to use the package you’re pushing, saying this is ready for use or this is useful in X domain, is such a good thing to do. Simply because, one, we know that it’s available, we know it’s documented, but also we know that it works. I’ve just been frustrated by the number of times I’ve seen a paper on arXiv or something that links to code, but there’s no real documentation or anything.

And so, to me, I just like to keep supporting what JOSS does, because I’ve read a number of papers from JOSS where I’m like, that tool seems great, I’m going to go use it, right? It just seems like the default place in my mind now: if I want to come off as a trusted developer of software, a trusted developer of research software, that’s the place to go.

[00:14:35] Arfon: Yeah, and it’s funny you say having people actually use your software. This was one of the founding questions a few of us were asking: it was possible to publish a paper about software in some disciplines, but the question I was asking people on Twitter at the time (I’m not really very active there now) was, when you managed to publish a paper about software, did anybody look at the software?

And people were like, nope. Universally, nobody ever said yes. And I was like, huh, that seems really weird to me, that you would publish software but nobody would look at the software. I’m glad that resonates for you and provides value.

[00:15:17] Eva: I was gonna add, I think the open review process is very, very nice. Because at least when I was writing the first paper that we wrote for JOSS, I could go and look at other review processes and say, okay, what are they expecting? What might we need to change ahead of time, et cetera? I could just prepare myself. Especially as, now, a PhD student, sometimes publishing is scary, and being able to look at a review process drastically lowers that concern or that worry as well.

[00:15:47] Arfon: Yeah, definitely. I mean, it’s good and bad with open review. It’s good for transparency, and people are generally very well behaved. It’s bad because, amongst other things, it can be intimidating; it could be exclusionary to some folks who don’t feel like they could participate in that kind of review.

But on balance, I feel like it’s the right decision for this journal. I definitely think it’s an important part of what JOSS does.

I was going to ask: it sounds like you’d eyeballed some reviews before you submitted, but was there anything particularly important about the review for Speakerbox? A particularly good contribution from the reviewers, anything surprising, anything particularly memorable from that review?

[00:16:33] Eva: Speakerbox, because it’s designed to act in a workflow manner, isn’t so much a library of, here’s a single function, or here’s two or three nice functions to use for your processing. It’s very much designed as a workflow, right?

Where you say: I have some set of data; I need to annotate that data first; then I need to prep it for training; then I finally need to train the model; and I might need to evaluate it as well. Because it’s laid out in that workflow style, the reviewers were having trouble approaching it like a normal package, where you say, I’m just going to go to the readme, look at a quick start, and use it.

And honestly, the biggest piece was just that the reviewers kept saying, I’m a little confused as to how to actually use this. Ultimately I very quickly made a video demonstrating exactly how to use it, and I think that was the best contribution: just going back and forth and saying, okay, I see your confusion here, I see your confusion here, and ultimately being like, okay, I’m going to take everything that they just said, make a video, and then take the things that I did in the video and add them to the readme, right?

By literally forcing myself to do it on a toy dataset, I could add all the little extra bits in the places that were possibly confusing for them as well. Using my own data versus using some random other data, there are always going to be things that you forget to add to the readme or whatever it is.

[00:17:46] Abby: That’s awesome. And one thing I do like about the JOSS review system is just how it adds so many best practices for open source and makes it so much easier for others to run it and to contribute. So is Speakerbox open for contribution if people wanted to jump in? Can they?

[00:18:02] Eva: Yeah, totally. I think there are some existing minor little bugs available to work on. There are also probably tons of optimizations or just UX improvements that could definitely happen. So yes, totally open for contribution, totally happy to review PRs or respond to bugs and stuff.

[00:18:20] Abby: That’s great, especially since you published this a while ago, that you’re still being responsive there on GitHub. But what skills do people need if they want to jump in?

[00:18:28] Eva: I think there are maybe two areas. One is a very good background in, let’s say, data processing and machine learning in Python; those two things together as a single trait would be nice. Or, on the opposite end of things, if you just want to try and train your own model, just use the system as is. You might experience bugs or something, and just making a documentation post, like saying, “I experienced this bug, here’s how I got around it”, or posting a GitHub issue saying there’s this current bug, for someone else to pick up. Truly trying to fix stuff if you have a machine learning background, or just trying to build a model, are both totally welcome things.

[00:19:06] Abby: That’s awesome. Yeah, and I love how you’re treating using the model as a contribution, and just documenting what you run into.

[00:19:12] Arfon: So I’m gonna put my product manager hat on for a minute and ask you: what questions should we have asked you today that we haven’t yet?

[00:19:21] Eva: Woo.

[00:19:21] Arfon: Okay if there isn’t a question, but it’s the magic wand question for, what, could this product do if we if it could do anything? What should we have asked you as hosts that we didn’t, that we didn’t check in with you about today?

[00:19:35] Eva: I think it’s not so much a question, maybe more, what’s the feature that you dream of?

[00:19:39] Arfon: Yeah,

[00:19:40] Eva: If it’s truly a product manager hat on, then I think it’s the feature we’ve previously talked a lot about. The workflow itself is pretty simple, and there are pretty good packages nowadays for shipping a webpage via Python. It’d be very nice to just have the entire workflow as a nice user experience process: you launch the server from your terminal and then you can do everything in the webpage, especially for the users that aren’t as comfortable in the terminal, right?

That would be very, very nice, and something that I just never got around to; it’s not my priority. But it’s something that we’ve thought about, as it would definitely reduce the barrier of use or the barrier of effort or whatever.

[00:20:21] Abby: Yeah. Oh, that’s really cool. Especially in the civic tech space, I can see that being really game-changing for a lot of cities.

[00:20:28] Arfon: Yeah.

[00:20:28] Eva: Yeah. I think civics and journalists as well. But yeah.

[00:20:32] Arfon: Yeah. Very cool. Cool.

Hey, Eva, this has been a great conversation. Thanks for telling us about Speakerbox. Just to close this out, I was curious: how can people follow your work and keep up to date with what you’re working on? Is there any place you would want people to go to keep track of what you’re working on?

[00:20:49] Eva: Yeah, my website is evamaxfield.github.io. I will occasionally post updates on papers and stuff there. I think Twitter might be the best place, though; you can find me on Twitter at @EvaMaxfieldB as well.

[00:21:02] Arfon: Fantastic. Thanks for being part of the podcast. It’s been really fun to learn about your software, and thanks for your time today.

[00:21:09] Eva: yeah, thanks for having me.

[00:21:10] Abby: Thank you so much for listening to Open Source for Researchers. We love to showcase open source software built by researchers for researchers, so you can hear more by subscribing in your favorite podcast app. Open Source for Researchers is produced and hosted by Arfon Smith and me, Abby Cabunoc Mayes, edited by Abby, and the music is CC-BY Boxcat Games.



Introducing JOSSCast: Open Source for Researchers 🎉

Subscribe Now: Apple, Spotify, YouTube, RSS

We’re thrilled to announce the launch of “JOSSCast: Open Source for Researchers” - a podcast exploring new ways open source can accelerate your work. Hosted by Arfon Smith and Abby Cabunoc Mayes, each episode features an interview with different authors of published papers in JOSS.

There are 3 episodes available for you to listen to today! This includes “#1: Eva Maxfield Brown on Speakerbox – Open Source Speaker Identification for Political Science” and “#2: Astronomy in the Open – Dr. Taylor James Bell on Eureka!” along with a special episode #0 with hosts Arfon and Abby.

Tune in to learn about the latest developments in research software engineering, open science, and how they’re changing the way research is conducted.

New episodes every other Thursday.

Subscribe Now: Apple, Spotify, YouTube, RSS

Call for editors

Once again, we’re looking to grow our editorial team at JOSS!

Since our launch in May 2016, our existing editorial team has handled nearly 3000 submissions (2182 published at the time of writing, 265 under review) and the demand from the community continues to be strong. JOSS now consistently publishes a little over one paper per day, and we see no sign of this demand dropping.

New editors at JOSS are asked to make a minimum 1-year commitment, with additional years possible by mutual consent. As some of our existing editorial team are reaching the end of their term with JOSS, the time is right to bring on another cohort of editors.

Background on JOSS

If you think you might be interested, take a look at our editorial guide, which describes the editorial workflow at JOSS, and also some of the reviews for recently accepted papers. Between these two, you should be able to get a good overview of what editing for JOSS looks like.

Further background about JOSS can be found in our PeerJ CS paper, which summarizes our first year, and our Editor-in-Chief’s original blog post, which announced the journal and describes some of our core motivations for starting the journal.

More recently we’ve also written in detail about the costs related to running JOSS and scaling our editorial processes, and we’ve talked about the collaborative peer review that JOSS promotes.

How to apply

Firstly, we especially welcome applications from prospective editors who will contribute to the diversity (e.g., ethnic, gender, disciplinary, and geographical) of our board.

✨✨✨ If you’re interested in applying please fill in this short form by 6th November 2023. ✨✨✨

Who can apply

Applicants should have significant experience in open source software and software engineering, as well as expertise in at least one subject/disciplinary area. Prior experience with peer review and open science practices is also beneficial.

We are seeking new editors across all research disciplines. The JOSS editorial team has a diverse background and there is no requirement for JOSS editors to be working in academia.

Selection process

The JOSS editorial team will review your applications and make their recommendations. Highly-ranked candidates will then have a short (~30 minute) phone call/video conference interview with the editor(s)-in-chief. Successful candidates will then join the JOSS editorial team for a probationary period of 3 months before becoming a full member of the editorial team. You will get an onboarding “buddy” from the experienced editors to help you out during that time.

Dan Foreman-Mackey, Olivia Guest, Daniel S. Katz, Kevin M. Moerman, Kyle E. Niemeyer, Arfon M. Smith, Kristen Thyng

JOSS publishes 2000th paper

Arfon M. Smith

This week JOSS reached a big milestone – publishing our 2000th paper! It also happens to be our 7th birthday, and we thought we’d take this opportunity to review our submission stats from the last few years, discuss some of the changes to JOSS we’ve made of late, and reflect on some of the challenges we have faced as a journal.

Submission summary

Everything discussed here is derived from the amazing work of one of our editors (thanks Charlotte!), who created our submission analytics page, which is rebuilt nightly from data served by the JOSS API. If you want to dig more into this analysis, the source code is available for you to do so.
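If you’d like to poke at the raw numbers yourself, a short script along these lines will pull the published-paper records. The endpoint and response shape shown here are assumptions, so check the JOSS API documentation for the current details:

```python
# Sketch: page through the JOSS published-papers feed.
# The endpoint and response shape are assumptions; consult the JOSS API docs.
import requests

papers = []
page = 1
while True:
    resp = requests.get(
        "https://joss.theoj.org/papers/published.json",
        params={"page": page},
        timeout=30,
    )
    resp.raise_for_status()
    batch = resp.json()
    if not batch:  # an empty page means we've read everything
        break
    papers.extend(batch)
    page += 1

print(f"Fetched {len(papers)} published papers")
```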

High level publication stats

2000 papers over 7 years means, on average, we’ve published roughly 285 papers per year. A flat average doesn’t tell the whole story, of course, as our throughput was substantially lower in the early days:

Year 1 (May 2016 – May 2017): 57
Year 2 (May 2017 – May 2018): 138
Year 3 (May 2018 – May 2019): 254
Year 4 (May 2019 – May 2020): 345
Year 5 (May 2020 – May 2021): 338
Year 6 (May 2021 – May 2022): 369
Year 7 (May 2022 – May 2023): 366

Note that JOSS closed for submissions between March 2020 and May 2020 due to the COVID-19 pandemic. This likely accounts for the drop in publications in year 5 (May 2020 – May 2021). Looking at the high-level breakdown, it seems like we’ve reached some kind of plateau of around one paper published per day. If we were a business looking to grow revenue this lack of year over year growth might be of concern. However, as a volunteer-run journal, this is OK with most of us :-)

Submission scope and rejections

In July 2020 we introduced a test for ‘substantial scholarly effort’ for all new submissions. You can read more about the motivations for this in our blog post, but this clearly had an effect on our rejection rate, both at the pre-review stage and during/post review.


We now reject between 15% and 30% of papers at the pre-review stage, and around 5% during the actual review process itself (note this is in line with our goal of providing a constructive review process, improving submissions such that they can be accepted).

Authorship

JOSS is no longer dominated by single-author submissions. In fact, we are seeing some evidence of more authors per submission, with the fraction of submissions with more than 5 authors now approaching 25%.

[Figure: authors per submission]

Citations

More than 200 papers have been cited > 20 times. About a third have never been cited. A few papers have been cited a lot: Welcome to the Tidyverse currently has nearly 6000 citations.

Papers in more than 6000 different venues have cited a JOSS paper, with the most common being bioRxiv, JOSS, Scientific Reports, Monthly Notices of the Royal Astronomical Society, and The Astrophysical Journal.

Editor and Reviewer statistics

2093 unique individuals have contributed reviews for the 2000 papers published in JOSS, including 8 amazing individuals who have contributed more than 10 reviews each!

JOSS currently has 77 editors, 6 of whom are track Editors-in-Chief, and one Editor-in-Chief. 112 editors have served in total on the editorial board.

Unfortunately, our reviews are getting slower. We’re not really sure why, but this has been a noticeable change from our earlier days. The average time a submission now spends in review is approaching four months, whereas pre-COVID it was under three.

Software statistics

JOSS reviews are primarily about the software, and so it would be remiss of us not to talk about that. Python is still the #1 language for JOSS submissions, used at least in part in well over half of published papers (~1200 out of 2000). R is #2 at 445 submissions, and C++ is #3 (although of course C++ and C may be used together with another language).

Anecdotally, we’re seeing an increase in the number of authors submitting to JOSS in order to declare their software ‘ready for use’ or to mark a v1.0 release. This is also potentially supported by the peak close to 0 days between the repository creation date and the submission to JOSS.


The MIT, GPL (v3), and BSD 3-Clause licences are still used for the majority of submissions.


Investments in tooling and infrastructure

JOSS has benefited from the kind support of the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, and a number of small development grants from NumFOCUS. This support has enabled JOSS to invest further in our infrastructure and tooling, highlights of which we describe below:

Editorialbot for all of your GitHub-based review needs

Our former bot Whedon has now become Editorialbot, which is much more than a rename. Editorialbot is now its own open source framework that can be used to manage review-like interactions on GitHub (i.e., not just publishing workflows). Currently, Editorialbot is used by rOpenSci, JOSS, and SciPy Proceedings, with more coming soon. Thanks to Juanjo Bazán for all of his great work here!

As part of this work we’ve also extracted all of the legacy capabilities from the Whedon codebase to run as a series of GitHub Actions workflows.

Investments in our Pandoc-based publishing pipeline

JOSS has always used Pandoc to produce our PDF papers and Crossref XML metadata, but the way we used it was… hacky. Over the past few years we’ve been fortunate to work directly with one of the Pandoc core team (Albert Krewinkel) to implement a number of improvements to how we use Pandoc within the JOSS publishing system, and also to contribute new capabilities to Pandoc itself, including improved ConTeXt support, support for JATS outputs, and more sophisticated handling of author names in the Pandoc frontmatter.

Editorial tooling improvements

Last but not least, we’ve also made a bunch of small (and not so small) changes to the way that we handle submissions as an editorial team, including implementing tracks (and appointing a track EiC for each) and a reviewer management tool for searching for appropriate reviewers and tracking reviewer assignments.

Thank you!

Over these seven years of operations, JOSS simply wouldn’t work without the dedicated volunteer editors and reviewers (and of course the authors submitting their work).

References

Höllig J, Kulbach C, Thoma S. TSInterpret: A Python Package for the Interpretability of Time Series Classification. Journal of Open Source Software. 2023;8(85):5220. doi:10.21105/joss.05220

Smith AM. Minimum publishable unit. Published online July 7, 2020. doi:10.59349/dfy0f-3y061

Wickham H, Averick M, Bryan J, et al. Welcome to the Tidyverse. Journal of Open Source Software. 2019;4(43):1686. doi:10.21105/joss.01686

Call for editors

Dan Foreman-Mackey, Olivia Guest, Daniel S. Katz, Kevin M. Moerman, Kyle Niemeyer, Arfon M. Smith, George K. Thiruvathukal, Kristen Thyng

Once again, we’re looking to grow our editorial team at JOSS!

Since our launch in May 2016, our existing editorial team has handled over 2000 submissions (1838 published at the time of writing, 215 under review) and the demand from the community continues to be strong. JOSS now consistently publishes a little over one paper per day, and we see no sign of this demand dropping.

New editors at JOSS are asked to make a minimum 1-year commitment, with additional years possible by mutual consent. As some of our existing editorial team are reaching the end of their term with JOSS, the time is right to bring on another cohort of editors.

Background on JOSS

If you think you might be interested, take a look at our editorial guide, which describes the editorial workflow at JOSS, and also some of the reviews for recently accepted papers. Between these two, you should be able to get a good overview of what editing for JOSS looks like.

Further background about JOSS can be found in our PeerJ CS paper, which summarizes our first year, and our Editor-in-Chief’s original blog post, which announced the journal and describes some of our core motivations for starting the journal.

More recently we’ve also written in detail about the costs related to running JOSS and scaling our editorial processes, and we’ve talked about the collaborative peer review that JOSS promotes.

How to apply

Firstly, we especially welcome applications from prospective editors who will contribute to the diversity (ethnic, gender, disciplinary, and geographical) of our board.

✨✨✨ If you’re interested in applying please fill in this short form by 22nd December 2022. ✨✨✨

Who can apply

We welcome applications from potential editors with significant experience in one or more of the following areas: open source software, open science, software engineering, peer-review.

The JOSS editorial team has a diverse background and there is no requirement for JOSS editors to be working in academia. Unfortunately individuals enrolled in a PhD program are not eligible to serve on the JOSS editorial team.

Selection process

The JOSS editorial team will review your applications and make their recommendations. Highly-ranked candidates will then have a short (~30 minute) phone call/video conference interview with the editor(s)-in-chief. Successful candidates will then join the JOSS editorial team for a probationary period of 3 months before becoming a full member of the editorial team. You will get an onboarding “buddy” from the experienced editors to help you out during that time.

References

Smith AM, Niemeyer KE, Katz DS, et al. Journal of Open Source Software (JOSS): design and first-year review. PeerJ Computer Science. 2018;4:e147. doi:10.7717/peerj-cs.147

Katz DS, Barba LA, Niemeyer K, Smith AM. Cost models for running an online open journal. Published online June 4, 2019. doi:10.59349/g4fz2-1cr36

Katz DS, Barba LA, Niemeyer K, Smith AM. Scaling the Journal of Open Source Software (JOSS). Published online July 8, 2019. doi:10.59349/gsrcb-qsd74

Call for editors: Astronomy & Astrophysics

Dan Foreman-Mackey

JOSS is continuing to grow, and we are looking to add more editors with expertise in the area of astronomy & astrophysics.

Since our launch in May 2016, our existing editorial team has handled nearly 1900 submissions (1684 published at the time of writing, 205 under review) and the demand from the community continues to grow. In particular, we have seen an increase in the number of astronomy & astrophysics submissions, beyond the capacity of our current editorial team.

Editors at JOSS make a minimum 1-year commitment, with additional years possible by mutual consent. With some of our existing editorial team reaching the end of their terms with JOSS and this increase in submissions, the time is right to bring new editors on board.

Background on JOSS

If you think you might be interested, take a look at our editorial guide, which describes the editorial workflow at JOSS, and also some of the reviews for recently accepted papers. Between these two, you should be able to get a good overview of what editing for JOSS looks like.

Further background about JOSS can be found in our PeerJ CS paper, which summarizes our first year, and our Editor-in-Chief’s original blog post, which announced the journal and describes some of our core motivations for starting the journal.

More recently we’ve also written in detail about our commitment to the Principles of Open Scholarly Infrastructure, the costs related to running JOSS, and scaling our editorial processes, and we’ve talked about the collaborative peer review that JOSS promotes.

Of specific interest to this call, we also have a collaboration with the American Astronomical Society (AAS) Journals to provide a parallel review for submissions with a significant software component.

Who can apply

We welcome applications from potential editors with research experience in astronomy and astrophysics, including, but not limited to, open source software development for astrophysical simulations, data reduction, or statistical methods.

Members of the JOSS editorial team have diverse backgrounds, and we welcome JOSS editors from academia, government, and industry. We especially welcome applications from prospective editors who will contribute to the diversity (ethnic, gender, disciplinary, and geographical) of our board. We also value having a range of junior and senior editors.

How to apply

✨✨✨ To apply please fill in this short form by 31 July 2022. ✨✨✨

Selection process

The JOSS editorial team will review your applications and make recommendations. Highly-ranked candidates will then have a short (~30 minute) phone call/video conference interview with a current editor. Successful candidates will then join the JOSS editorial team for a probationary period of 3 months before becoming a full member of the editorial team. You will get an onboarding “buddy” from the experienced editors to help you out during that time.

References

Smith A. Announcing The Journal of Open Source Software - Arfon Smith. Published online May 5, 2016. Accessed July 12, 2022. https://www.arfon.org/announcing-the-journal-of-open-source-software

Smith AM, Niemeyer KE, Katz DS, et al. Journal of Open Source Software (JOSS): design and first-year review. PeerJ Computer Science. 2018;4:e147. doi:10.7717/peerj-cs.147

Katz DS, Smith AM, Niemeyer K, Huff K, Barba LA. JOSS’s Commitment to the Principles of Open Scholarly Infrastructure. Published online February 14, 2021. doi:10.59349/m5h23-pjs71

Katz DS, Barba LA, Niemeyer K, Smith AM. Cost models for running an online open journal. Published online June 4, 2019. doi:10.59349/g4fz2-1cr36

Katz DS, Barba LA, Niemeyer K, Smith AM. Scaling the Journal of Open Source Software (JOSS). Published online July 8, 2019. doi:10.59349/gsrcb-qsd74

Smith AM. A new collaboration with AAS publishing. Published online December 19, 2018. doi:10.59349/wj1gg-tsg49

Call for editors

Arfon M. Smith

JOSS is continuing to grow, and we are looking to add more editors again. We’re especially interested in recruiting editors with expertise in bioinformatics, neuroinformatics/neuroimaging, material science, ecology, machine learning & data science, and the social sciences.

Since our launch in May 2016, our existing editorial team has handled over 1800 submissions (1200 published at the time of writing, 170 under review) and the demand from the community continues to grow. The last three months have been our busiest yet, with JOSS publishing more than one paper per day, and we see no sign of this demand dropping.

Editors at JOSS make a minimum 1-year commitment, with additional years possible by mutual consent. With some of our existing editorial team reaching the end of their terms with JOSS and this increase in submissions, the time is right to bring on another cohort of editors.

Editing for a journal during a pandemic

After a pause on submissions in early 2020, JOSS has been open for submissions during most of the pandemic. We recognize that making time for volunteer commitments such as JOSS is especially challenging at this time. We are taking steps to reduce the load on authors, editors, and reviewers, and continually striving to find the right balance between accommodating the very real challenges many of us now face in our daily lives and providing a service to the research software community.

Editing for JOSS is a regular task, but not one that takes a huge amount of time. JOSS editors are most effective if they are able to check in on their submissions a couple of times per week. Our goal is that a JOSS editor handles about three submissions at any one time, making for about 25 submissions per year.

Background on JOSS

If you think you might be interested, take a look at our editorial guide, which describes the editorial workflow at JOSS, and also some of the reviews for recently accepted papers. Between these two, you should be able to get a good overview of what editing for JOSS looks like.

Further background about JOSS can be found in our PeerJ CS paper, which summarizes our first year, and our Editor-in-Chief’s original blog post, which announced the journal and describes some of our core motivations for starting the journal.

More recently we’ve also written in detail about our commitment to the Principles of Open Scholarly Infrastructure, the costs related to running JOSS, and scaling our editorial processes, and we’ve talked about the collaborative peer review that JOSS promotes.

Who can apply

We welcome applications from potential editors with significant experience in one or more of the following areas: open source software, open science, software engineering, peer-review, noting again that editors with expertise in bioinformatics, neuroinformatics/neuroimaging, material science, ecology, machine learning & data science, and the social sciences are most needed.

Members of the JOSS editorial team have diverse backgrounds and we welcome JOSS editors from academia, government, and industry. We especially welcome applications from prospective editors who will contribute to the diversity (ethnic, gender, disciplinary, and geographical) of our board. We also value having a range of junior and senior editors.

How to apply

✨✨✨ To apply please fill in this short form by 23 April 2021. ✨✨✨

Selection process

The JOSS editorial team will review your applications and make recommendations. Highly-ranked candidates will then have a short (~30 minute) phone call/video conference interview with the editor(s)-in-chief. Successful candidates will then join the JOSS editorial team for a probationary period of 3 months before becoming a full member of the editorial team. You will get an onboarding “buddy” from the experienced editors to help you out during that time.

Thanks to our editors who are stepping down

A few of our editors are completing terms and stepping down from editorial duties at JOSS. Lorena A Barba (@labarba), Kathryn Huff (@katyhuff), Karthik Ram (@karthik), and Bruce E. Wilson (@usethedata) have been amazing editors to have on the team and we will miss them very much!

References

Smith A. Announcing The Journal of Open Source Software - Arfon Smith. Published online May 5, 2016. Accessed July 12, 2022. https://www.arfon.org/announcing-the-journal-of-open-source-software

Smith AM. Reopening JOSS. Published online May 18, 2020. doi:10.59349/4tz9w-yq369

Smith AM, Niemeyer KE, Katz DS, et al. Journal of Open Source Software (JOSS): design and first-year review. PeerJ Computer Science. 2018;4:e147. doi:10.7717/peerj-cs.147

Katz DS, Smith AM, Niemeyer K, Huff K, Barba LA. JOSS’s Commitment to the Principles of Open Scholarly Infrastructure. Published online February 14, 2021. doi:10.59349/m5h23-pjs71

Katz DS, Barba LA, Niemeyer K, Smith AM. Cost models for running an online open journal. Published online June 4, 2019. doi:10.59349/g4fz2-1cr36

Katz DS, Barba LA, Niemeyer K, Smith AM. Scaling the Journal of Open Source Software (JOSS). Published online July 8, 2019. doi:10.59349/gsrcb-qsd74

JOSS's Commitment to the Principles of Open Scholarly Infrastructure

Daniel S. Katz, Arfon M. Smith, Kyle Niemeyer, Kathryn Huff, Lorena A. Barba

The Journal of Open Source Software (JOSS) is committed to the Principles of Open Scholarly Infrastructure. Here we summarize our status, followed by a more detailed discussion of how we meet each principle, where we do not, and what is still work in progress.

This document was assembled by Daniel S. Katz, Arfon Smith, Kyle E. Niemeyer, Kathryn D. Huff, and Lorena A. Barba, and reviewed and approved by the active JOSS editorial board and topic editors, and the Open Journals Steering Council.

Summary

Governance
💛 Coverage across the research enterprise
💛 Stakeholder Governed
💚 Non-discriminatory membership
💚 Transparent operations
💚 Cannot lobby
💛 Living will
💚 Formal incentives to fulfil mission & wind-down

Sustainability
💚 Time-limited funds are used only for time-limited activities
💛 Goal to generate surplus
💛 Goal to create contingency fund to support operations for 12 months
💚 Mission-consistent revenue generation
💚 Revenue based on services, not data

Insurance
💚 Open source
💚 Open data (within constraints of privacy laws)
💚 Available data (within constraints of privacy laws)
💚 Patent non-assertion

(💚 = good, 💛 = less good)

Discussion

Governance

💛 Coverage across the research enterprise

  • It is increasingly clear that research transcends disciplines, geography, institutions and stakeholders. The infrastructure that supports it needs to do the same.

Research software is essential to all types of research, and in response, JOSS’s coverage includes research software in any discipline, from any place, and from any institution.

The scope of what we publish is only limited by a few guidelines. JOSS publications must be research software of sufficient scholarly effort, which means that some software essential to research is excluded because it is either not research software (e.g., a C compiler) or too small (e.g., a few hundred lines of Python that implement an existing tool or provide a wrapper to access a fine-grained data source).

JOSS strives for broad coverage of research disciplines in its editorial board, to better serve a broad community of authors.

💛 Stakeholder Governed

  • A board-governed organisation drawn from the stakeholder community builds more confidence that the organisation will take decisions driven by community consensus and consideration of different interests.

Open Journals is fiscally sponsored by NumFOCUS, and has a documented governance structure. The steering council, being a small group, is limited in its representation, in terms of geographic, ethnic, gender, and organizational diversity. The editorial board members mostly represent North America and Europe, are mostly white, are mostly male, and are mostly hands-on researchers, primarily from universities and national laboratories.

💚 Non-discriminatory membership

  • We see the best option as an “opt-in” approach with a principle of non-discrimination where any stakeholder group may express an interest and should be welcome. The process of representation in day to day governance must also be inclusive with governance that reflects the demographics of the membership.

Additions to the editorial board, which is the first layer of governance, are made via selections from responses to open calls and are non-discriminatory.

💚 Transparent operations

  • Achieving trust in the selection of representatives to governance groups will be best achieved through transparent processes and operations in general (within the constraints of privacy laws).

JOSS is publicly transparent, much more so than most other journals. A few issues are not publicly open, but these are generally open to all editors, including initial discussions about potential changes to the journal and discussions about the scope of submissions that may or may not be accepted for review. Even though some of these discussions may not occur in the open, we do publish the decisions themselves, along with an explanation.

💚 Cannot lobby

  • The community, not infrastructure organisations, should collectively drive regulatory change. An infrastructure organisation’s role is to provide a base for others to work on and should depend on its community to support the creation of a legislative environment that affects it.

The vast majority of time and effort at JOSS is operational, with improvements to JOSS next, and finally some publicity about JOSS, which could be considered lobbying. However, this lobbying is mostly done by the JOSS editors because it is important to them personally, not to JOSS. The fact that they are also JOSS editors is a consequence of their feelings about the importance of recognizing contributions to research software, which also leads them to talk about this with others.

💛 Living will

  • A powerful way to create trust is to publicly describe a plan addressing the condition under which an organisation would be wound down, how this would happen, and how any ongoing assets could be archived and preserved when passed to a successor organisation. Any such organisation would need to honour this same set of principles.

As discussed below, there are circumstances in which we would consider the mission of JOSS fulfilled and the journal no longer necessary. While we have not documented a plan for winding down JOSS, we believe that the core assets associated with the journal (software, article metadata, papers) are appropriately preserved as part of our ongoing operations. The articles published in JOSS are persistently archived such that an end of the journal would not affect the scholarly record.

💚 Formal incentives to fulfil mission & wind-down

  • Infrastructures exist for a specific purpose and that purpose can be radically simplified or even rendered unnecessary by technological or social change. If it is possible the organisation (and staff) should have direct incentives to deliver on the mission and wind down.

JOSS views itself as a temporary solution to provide a means for software developers and maintainers to receive credit for their work, and to have this work (research software) improved by the process of open peer review. We look forward to a time when software papers are not needed, when software is directly recognized and cited, and when software peer review (potentially using a future version of our criteria and processes) is more widespread. JOSS is volunteer-run as a service to the community, and most of the volunteers will be happy when a solution like JOSS is no longer needed, because software has found a more direct avenue to be valued and counted in the scholarly record.

Sustainability

💚 Time-limited funds are used only for time-limited activities

  • Day to day operations should be supported by day to day sustainable revenue sources. Grant dependency for funding operations makes them fragile and more easily distracted from building core infrastructure.

JOSS does not depend on grants for regular operations, but has attracted grant funding for specific activities to improve the tooling that facilitates running JOSS. We make effective use of time-limited funds such as grants to support enhancements to our services.

💛 Goal to generate surplus

  • Organisations which define sustainability based merely on recovering costs are brittle and stagnant. It is not enough to merely survive, it has to be able to adapt and change. To weather economic, social and technological volatility, they need financial resources beyond immediate operating costs.

As described in our blog post on the topic, our operational costs are deliberately very low. We currently do not generate a surplus and have no plans to. We also do not employ any staff and so “economic, social and technological volatility” would be expected to have limited impact on JOSS.

💛 Goal to create contingency fund to support operations for 12 months

  • A high priority should be generating a contingency fund that can support a complete, orderly wind down (12 months in most cases). This fund should be separate from those allocated to covering operating risk and investment in development.

We have sufficient funds available today to support our operations for substantially longer than 12 months. Allocating some of these to a formal contingency fund is something we are considering but have not yet done. As a fiscally sponsored project of NumFOCUS, JOSS can receive donations from individuals, and we can kick off a fund-raising campaign at short notice. JOSS can also apply for NumFOCUS Small Development Grants, which are awarded several times per year.

💚 Mission-consistent revenue generation

  • Potential revenue sources should be considered for consistency with the organisational mission and not run counter to the aims of the organisation. For instance…

JOSS revenue comes from three sources: a small amount from donations, a small amount from the American Astronomical Society (AAS) as fees for reviews of the software linked to AAS publications, and a larger amount from grants to the journal related to demonstrating its effectiveness, promoting the importance of research software, and recognizing research software’s contributors. These sources of revenue are fully consistent with the mission of JOSS.

💚 Revenue based on services, not data

  • Data related to the running of the research enterprise should be a community property. Appropriate revenue sources might include value-added services, consulting, API Service Level Agreements or membership fees.

JOSS receives no revenue for its data, which is completely open, but rather receives revenue for its services and for its community impact.

Insurance

💚 Open source

  • All software required to run the infrastructure should be available under an open source license. This does not include other software that may be involved with running the organisation.

All of JOSS’ tools are open source and available on GitHub under the Open Journals organization. This includes the JOSS website, our editorial bot Whedon, and the document production toolchain. Some of the collaboration tools we use as an editorial team are not open (e.g., GitHub, Slack, Google Docs), but these are not critical to the functioning of the journal and could be replaced by open alternatives.

💚 Open data (within constraints of privacy laws)

  • For an infrastructure to be forked it will be necessary to replicate all relevant data. The CC0 waiver is best practice in making data legally available. Privacy and data protection laws will limit the extent to which this is possible

Our papers and the (Crossref DOI) metadata associated with them are available on GitHub, with an open license. We deposit open citations with Crossref, and archive papers and our reviews with Portico.

💚 Available data (within constraints of privacy laws)

  • It is not enough that the data be made “open” if there is not a practical way to actually obtain it. Underlying data should be made easily available via periodic data dumps.

As noted above, our papers and the (Crossref DOI) metadata associated with them are available on GitHub, with an open license. These data are easily accessible to anyone motivated to make use of them.

We could potentially create data exports of the JOSS web application database; however, this would simply be an alternative representation of the data already available.
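As a concrete illustration of how accessible this metadata is, anyone can retrieve the open Crossref record for a JOSS paper with a few lines of Python. This is a minimal sketch using the public Crossref REST API; the DOI is one example JOSS paper, and the fields shown are standard Crossref response fields:

```python
# Minimal sketch: fetch the open Crossref metadata for a JOSS paper.
# Uses only the Python standard library.
import json
import urllib.request

doi = "10.21105/joss.02260"  # an example JOSS paper DOI
url = f"https://api.crossref.org/works/{doi}"

with urllib.request.urlopen(url) as response:
    record = json.load(response)["message"]

print(record["title"][0])                    # paper title
print(record["container-title"][0])          # journal name
print(record.get("is-referenced-by-count"))  # citations known to Crossref
```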

💚 Patent non-assertion

  • The organisation should commit to a patent non-assertion covenant. The organisation may obtain patents to protect its own operations, but not use them to prevent the community from replicating the infrastructure.

JOSS has no interest in patents, other than resisting the creation of patents that might prevent us from operating freely.

References

POSI. The Principles of Open Scholarly Infrastructure. Accessed February 21, 2021. https://openscholarlyinfrastructure.org/

1000 papers published in JOSS

Arfon M. Smith

Today we reached a huge milestone at JOSS – we published our 1000th paper! JOSS is a developer friendly, free-to-publish, open-access journal for research software packages. Publishing 1000 papers (and reviewing the corresponding 1000 software packages) over the past ~4 years has been no small feat. This achievement has been possible thanks to the efforts of our journal team and community of reviewers who have all given their time to make JOSS a success. We take this opportunity to review some of what we’ve learnt over the past four years and outline some plans for the future.

A brief recap

Much has been written1 on the topic of research software, and the challenges that individuals face in receiving credit for their work. Software is critical for modern research, yet people who invest time in writing high-quality tools often aren’t well rewarded for it. The scholarly metrics used to measure the “impact” of a researcher’s work do a poor job of capturing software contributions (among other outputs).

JOSS was created as a workaround to some of the challenges of supporting software development in academia. Launched in May 2016, JOSS provides a simple, reliable process for receiving academic career credit (through citation of software papers) for writing open source research software. Authors write and submit a short article (usually under 1000 words) about their software, and JOSS reviews both the paper and the software2, assessing a variety of qualities including functionality, (re)usability, and documentation3.

In establishing JOSS, we wanted the editorial experience to be very different from a traditional journal, and developer friendly – short papers authored in Markdown, review process on GitHub, open process and documentation – while at the same time following best practices in publishing: depositing first-class metadata and open citations with Crossref, archiving papers and reviews with Portico, leaving copyright for the JOSS papers with authors, and more.

We describe the journal this way: JOSS is an open access (Diamond OA) journal for reviewing open source research software. With a heavy focus on automation, our open and collaborative peer review process is designed to improve the quality of the software submitted and happens in the open on GitHub.

Some lessons learned publishing our first 1000 papers

JOSS is meeting a need of the research community

When starting JOSS, it was unclear whether demand from the research community would be sufficient. A few years in, we can safely conclude that JOSS is meeting a real need of the academic community.

It took us a little under a year to publish our 100th paper, and an additional 8 months to reach our 200th; in all, reaching 1000 papers has taken a little over four years. Before pausing submissions for two months starting in early March 2020 (to give our volunteers some relief during the pandemic), we were projecting to reach this milestone sometime in June this year. Over the same period we’ve grown our editorial board from an initial group of 10 to more than 50 today.


People are reading and citing JOSS papers

Well over half of all JOSS papers have been cited, and many have been cited hundreds of times.

While not designed for this purpose, JOSS is proving to be a useful resource for people interested in discovering new research software: JOSS currently receives ~10,000 monthly visitors to the journal website and provides language-based (e.g., https://joss.theoj.org/papers/in/Python) and topic-based (e.g., https://joss.theoj.org/papers/tagged/Exoplanets) search filters and feeds.
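For anyone who wants to consume these listings programmatically, here is a minimal sketch that reads the journal’s Atom feed using only the Python standard library. The feed URL is an assumption based on the feeds mentioned above, and the filtered pages may expose similar feeds:

```python
# Sketch: list recent JOSS papers from the journal's Atom feed.
# The feed URL below is an assumption, not a documented endpoint.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://joss.theoj.org/papers/published.atom"  # assumed location
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

with urllib.request.urlopen(FEED_URL) as response:
    feed = ET.parse(response)

for entry in feed.getroot().findall("atom:entry", ATOM):
    title = entry.findtext("atom:title", namespaces=ATOM)
    link = entry.find("atom:link", ATOM)
    print(title, "-", link.get("href") if link is not None else "")
```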

People are key

Every journal relies upon the expertise, knowledge, and time of its reviewers, and JOSS is no different. 935 individuals have contributed reviews for our first 1000 papers, many having reviewed multiple times.

💖💖💖 THANK YOU 💖💖💖

Like many journals, as the number of submissions grows, JOSS has had to scale its human processes. Over the past few years we’ve added many more editors and associate editors-in-chief to enable papers to be handled efficiently by the editorial team. Simultaneously, we’ve developed our editorial robot Whedon from being an occasional assistant during the review process to being the backbone of the whole JOSS editorial process.

Automation is important

A big part of keeping our costs low is automating common editorial tasks wherever possible. The primary interface for editors managing JOSS submissions is a GitHub issue, with our Whedon bot assisting with a broad collection of routine editorial tasks. Aside from reviewers, editors, and authors reading the papers, no additional copy editing is done before a paper is published: the Pandoc-generated proofs are the final version. PDF proofs and Crossref metadata are generated automatically by Whedon and, when the time comes, deposited with Crossref and published automatically too.


When starting JOSS, we thought that automation could be a big part of how things would work if the journal became successful. 1000 papers in, we believe it has been an absolutely critical part of our operations. We call this chatops-driven publishing.
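To give a flavor of what this looks like in practice, the exchange below is an illustrative sketch of the kinds of commands an editor might type as comments on a GitHub review issue. The exact command set is whatever the bot reports via its help command, so treat these as indicative rather than definitive:

```
@whedon commands                           # ask the bot what it can do
@whedon generate pdf                       # build the paper proof with Pandoc
@whedon assign @some-reviewer as reviewer  # record a reviewer on the issue
@whedon set v1.2.3 as version              # record the reviewed software version
@whedon accept                             # final checks, metadata deposit, publish
```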

Financial support

JOSS is committed to providing a high-quality service to the community at no cost for authors or readers (Diamond/Platinum Open Access). We’re transparent about our operating costs and have written about cost models for operating an online open journal.

While JOSS’ operating costs are modest, we’ve benefited from the support of a number of organizations including NumFOCUS (the ‘Open Journals’ organization is a sponsored project of NumFOCUS), the Gordon and Betty Moore Foundation, and the Alfred P. Sloan Foundation. It’s also possible to donate to JOSS if you would like to support us financially.

To the future!

With 1000 papers published and a further ~170 papers under review, JOSS is busier than ever, and there’s little sign of demand from the community slowing.

Over the next year or so, we’re going to be investing resources in a number of key areas to enable JOSS to scale further, improve the experience for all parties, and help others reuse the infrastructure we’ve developed for JOSS. We’ve captured much of what we want to achieve in this public roadmap. All of this will be possible thanks to a new grant from the Alfred P. Sloan Foundation. Some highlights include:

Smarter reviewer assignment and management: Finding reviewers for a JOSS submission is still one of the most time-intensive aspects of the JOSS editorial process (though a large fraction of those we ask to review tend to accept, as they are excited by our mission). We think there’s lots of opportunity for substantially improving the success rate of finding potential reviewers through automation. Making sure we’re not overloading our best reviewers will also be an important aspect of this work.

A major refactor of our editorial bot Whedon: Whedon is a critical part of our infrastructure but has become hard to maintain, and almost impossible for other projects to reuse. We’re planning to rework Whedon into a general framework with a set of reusable modules for common editorial tasks.

Investments in open source: JOSS relies upon a small number of open source projects such as Pandoc and pandoc-citeproc to produce scholarly manuscripts (PDFs) and metadata outputs (e.g., Crossref and JATS). We’re going to work with the Pandoc core team to generalize some of the work we’ve done for JOSS into Pandoc core.

For many of us on the editorial team JOSS is a labor of love, and it has been quite a ride growing JOSS from an experimental new journal to a venue that is publishing close to 500 papers per year. For those of you who have helped us on this journey by submitting a paper to JOSS or volunteering to review, thank you ⚡🚀💥.

References

Anzt H, Cojean T, Chen YC, et al. Ginkgo: A high performance numerical linear algebra library. JOSS. 2020;5(52):2260. doi:10.21105/joss.02260

Katz DS, Barba LA, Niemeyer K, Smith AM. Scaling the Journal of Open Source Software (JOSS). Published online July 8, 2019. doi:10.59349/gsrcb-qsd74

Smith AM. Call for editors. Published online December 21, 2018. doi:10.59349/546tr-5p719

Katz DS, Barba LA, Niemeyer K, Smith AM. Cost models for running an online open journal. Published online June 4, 2019. doi:10.59349/g4fz2-1cr36

Smith A. Chatops-Driven Publishing. Published online February 28, 2019. Accessed August 31, 2020. https://www.arfon.org/chatops-driven-publishing

Jiménez RC, Kuzak M, Alhamdoosh M, et al. Four simple recommendations to encourage best practices in research software. F1000Res. 2017;6:876. doi:10.12688/f1000research.11407.1

Cohen J, Katz DS, Barker M, Chue Hong N, Haines R, Jay C. The Four Pillars of Research Software Engineering. IEEE Softw. 2021;38(1):97-105. doi:10.1109/MS.2020.2973362

Katz DS, Druskat S, Haines R, Jay C, Struck A. The State of Sustainable Research Software: Learning from the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1). JORS. 2019;7(1):11. doi:10.5334/jors.242

  1. For example: https://doi.org/10.12688/f1000research.11407.1 · https://doi.org/10.1109/MS.2020.2973362 · http://doi.org/10.5334/jors.242 

  2. Something hardly any other journals do. 

  3. The JOSS review structure was originally derived from the process developed by the rOpenSci community for their software review. 

Minimum publishable unit

Arfon M. Smith

tl;dr – JOSS is introducing new submission criteria whereby submissions under 1000 lines of code will automatically be flagged as potentially out of scope, and those under 300 lines desk-rejected. This blog post describes some of the motivations behind this decision.

Sometime in 2020, JOSS will publish its 1000th paper – an incredible achievement by a volunteer team of editors and reviewers.

Since its inception a little over four years ago, the primary goal of JOSS has always been to provide credit for authors of research software. Quoting from the original blog post announcing JOSS:

The primary purpose of a JOSS paper is to enable citation credit to be given to authors of research software.

One challenge we’ve always struggled with as an editorial team is defining clear guidelines for submissions allowed (and not allowed) in JOSS. Our current submission criteria are available online and include language about software having to be a significant contribution, feature complete, and having an obvious research application. In these criteria we also explicitly exclude a category of software we generally call “minor utilities”.

The challenge of defining a unit of publication credit

We think of JOSS as essentially granting “1 publication credit” for each software package that is reviewed and published in the journal. In empirical research, a publication is often the result of years of work. In engineering research, rarely does a paper represent less than one year of work. Other fields may vary, but let’s say that a scientific paper resulting from work measured in just months is rare or exceptional.

Since the earliest days of the journal, there has been a range of views within the editorial team on what level of effort we should require from authors for a submission to be allowed in JOSS: some editors feel that every useful piece of software should be considered, while others believe that the “bar” for publishing in JOSS should be higher than it currently is.

Building trust in JOSS

As an editorial team, we want JOSS papers to count the same as any other publication in the CV of researchers who write software. Since career credit is the stated primary reason for starting JOSS, the mission of the journal is at risk if this isn’t true.

In reality this means that our editorial policy requires us to balance two competing needs:

  1. Providing an excellent service to authors by offering peer-review of their software and the opportunity to receive career credit for their work.
  2. Building the trust of an existing academic culture to accept a JOSS paper as equal to any peer-reviewed journal paper.

These two aspects are in tension with each other: while we would dearly love to publish any and all research software in JOSS, regardless of size, scale, and level of effort to implement, building and maintaining the community’s trust relies on us ensuring that our published authors can continue to expect JOSS papers to “count” in their future merit reviews and promotions.

Updates to our submission requirements and scope-checking procedures

Over the past couple of years, as the number of submissions to JOSS has grown, we’ve found that our existing submission criteria and protocol for rejecting papers as out of scope have been taking a significant fraction of our editorial team’s time. With a volunteer team of editors, it’s essential that we use their time carefully, and an update to our procedures for handling scope assessments is long overdue.

Going forward we’re going to adopt the following new process:

Automatically flagging small submissions

As part of the pre-review process, incoming submissions that are under 1000 lines of code1 (LOC) will be automatically flagged as potentially out of scope by the EiC on rotation.

Submissions under 300 lines2 of code will be desk-rejected with no further review.
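For authors wondering roughly where a submission falls, dedicated counters such as cloc report per-language line counts. Purely as a hypothetical illustration of the idea (and not the tooling JOSS itself uses), a naive count of non-blank, non-comment lines in a Python codebase might look like this:

```python
# Naive sketch: count non-blank, non-comment lines in a Python package.
# Illustrative only; real tools (e.g., cloc) handle strings, block
# comments, and many languages far more robustly.
from pathlib import Path

def count_loc(root: str) -> int:
    total = 0
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                total += 1
    return total

print(count_loc("src"))  # e.g., compare against the 300/1000 LOC thresholds
```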

Mandatory “Statement of need” section in JOSS papers

While a “Statement of need” section has always been encouraged, it is now mandatory: a clear statement of need is extremely valuable in helping the editorial team understand the rationale for the development of the software.

Gauging the scholarly content of the software as part of the review

Reviewers will be asked if the software under review is a “substantial scholarly effort” and guidelines will be provided on how they can make that assessment.

A streamlined editorial review process

Rather than discussing each potentially out-of-scope paper in a separate thread on our editorial mailing list, JOSS is going to move to a weekly review of such papers by our editorial team. Topic editors will be asked to review papers flagged as potentially out of scope in their area and help the EiC team make a decision.

References

Smith A. Announcing The Journal of Open Source Software - Arfon Smith. Published online May 5, 2016. Accessed July 7, 2020. https://www.arfon.org/announcing-the-journal-of-open-source-software

  1. In a high-level language such as Python or R. More verbose languages such as Java, C++, or Fortran will require more LOC. 

  2. We realize that introducing numerical thresholds may encourage some authors to unnecessarily “pad” their submissions with additional lines of code to meet our thresholds. As reviewers are already asked to judge the standard of the implementation as part of their review, we expect that situations like these will usually be flagged during the review.