Transcript of Panton Discussion 4: Iain Hrynaszkiewicz, BioMed Central
In this interview:
- Full Transcript:
- My background
- My role at BioMed Central
- The role of Open Data at BioMed Central
- What is needed to make Open Data happen in BioMed Central?
- The annual BMC Open Data awards
- Other Open Data advocates in the publishing industry
- Technical issues of data between disciplines
- Does one size fit all for data? Or should we subdivide disciplines?
- What are the roles of the scientist? the funder? the publisher? the institution?
- Should each discipline have its own central repository?
- How do you see the Panton Principles?
- What are the barriers to Open Data?
- Business models and funding for publishing data
- Can we avoid fragmentation (e.g. of definitions) for Open Data?
- Would it make sense to have a central organisation for Open Data?
- Creating or capturing machine-readable data.
- Examples of where Open Data works well
- What is the influence of Government Open Data initiatives?
- What would you like to see in 5 years’ time?
- What are the greatest challenges facing the Open Data movement?
Full Transcript:
My background
Laura Newman (OKFN): OK, Iain, it’s really great to be here with you at BMC today. I wonder if you could tell us a little bit about yourself?
Iain Hrynaszkiewicz (BioMed Central): So I’m Iain Hrynaszkiewicz otherwise known as Iain H. for quite obvious reasons! I’m a journal publisher at BioMed Central.
Laura: And could you tell us a bit about what you did before you came here?
Iain: I did my first degree, which was in Microbiology, at the University of Sheffield, and then I did a master’s in Print Journalism, also at the University of Sheffield. I always wanted to work in science publishing or science communication. As probably a lot of graduates will know, it’s quite a competitive industry and quite hard to get into. So my first proper job after graduating (although I had various call centre positions to pay my way through university!) was for a contract medical information consultancy based in North Yorkshire, which involved providing standard responses to doctors, consultants and patients about selected products. I didn’t particularly enjoy it that much, but it was a good way to start employment whilst I was looking to move into scientific publishing. My first job in publishing was here at BioMed Central in 2006.
My role at BioMed Central
Iain: I’m responsible for a selection of BioMed Central’s journal portfolio; journals that are loosely connected in that they’re all trying to do something that goes beyond just publishing Open Access papers. So journals that are for example trying to publish data driven papers, or journals that have their own repositories for data so that they can link data to publications. In addition to that, I’m also responsible for overseeing some of our new publishing product developments, for example our threaded publications initiative and our database of medical case reports, and also I lead all of our Open Data based projects and initiatives in publishing as well.
The role of Open Data at BioMed Central
Laura: At BioMed Central, you’ve been championing the need for definition and implementation of Open Data. What led you to this?
Iain: BioMed Central has been around since 2000, and was the first purely Open Access publisher, and has grown very much over that time and has been quite successful in terms of growth in papers and growth in journals. Increasing transparency in scientific research and scholarly communication is really at the core of our strategy, and in the last couple of years in particular, we have very much been moving our focus – or extending our focus rather – from papers and also on to data, recognising that more scientists are doing data intensive research, and thinking about what they need to do with their data as much as their publications. The Panton Principles being published in early 2010 was definitely an important event. My colleague Bryan Vickery, who is our Chief Operating Officer, I think was one of the first few people to endorse the Principles. I recall he came back from an editorial board meeting of the Journal of Cheminformatics (which is published by Chemistry Central, a sister publisher to BioMed Central), and he was quite excited by the ideas within the Panton Principles, and these helped us to crystallise some of our thoughts and strategies about how we can better enable Open Data in our publications.
What is needed to make Open Data happen in BioMed Central?
Iain: I should point out that we are still working on this and we haven’t achieved everything that we need to. There are a number of different initiatives that we’re working on. We are trying to develop guidance and best practice because we’ve found that in different fields, people need examples of data sharing done well. There are specific problems in specific fields, so I personally have done quite a bit of work in the clinical trial arena, where patient privacy and confidentiality is obviously an important issue. So we’ve worked with editors of relevant journals to develop guidance and best practice for sharing clinical research data for example. We’ve also looked at linking data to publications, so firstly finding where authors can put their data, if it’s not in additional data files with their article, and then providing them with the right information and the right ways of doing that. Another thing that is very important and we’re still currently looking at is the right licenses for data. So all of BioMed Central’s research content is available under a Creative Commons attribution license, but we definitely recognise that to fully put Panton Principles into practice, we need Open Knowledge compliant licenses – Creative Commons CC0 etc. for data – and that’s something that we are still working towards and are very keen to do.
The annual BMC Open Data awards
Laura: You run the annual BMC Open Data Awards. What are the short term aims of these and what do you hope they might achieve in the longer term as well?
Iain: We know that data sharing isn’t always easy, and also there is a problem with data sharing and publication which comes up over and over again: people don’t get any credit or recognition for doing it. Whilst an award or a prize is just a small measure, it is nice to be able to select authors or identify people who are trying to go that extra mile in making their data available, making it reusable or standardising it. We’ve run the awards for two years so far and we’ll definitely be doing it again in 2012 for papers published in 2011. The first winner was a researcher in malaria and the second winner in evolutionary biology. It’s not a universal solution to the problem of giving people credit, but we certainly see it as something worth doing.
Other Open Data advocates in the publishing industry
Laura: Are you aware of or in touch with other Open Data advocates working in the publishing industry?
Iain: I think I would have to leave it up to them as to whether or not they called themselves an ‘advocate’, but certainly I’m in touch with other people that do specifically work in publishing. In summer 2011 BioMed Central convened a Publishing Open Data Working Group meeting with three specific goals, one of which was looking at licensing of research data. That group was set up to bring together different publishers, different authors, different editors, different funders and librarians as well. People from publishers who came to that meeting included Theo Bloom from PLoS, Trish Groves from the BMJ and Ruth Wilson from Nature.
Technical issues of data between disciplines
Laura: We know that data publication is relatively straightforward in some areas, and technically a lot more complicated in others. What kind of technical issues need to be addressed?
Iain: I think it would be quite difficult for me to go into detail for different scientific domains. We certainly at BMC recognise the importance of data standards, agreed-upon formats for making data from specific types of scientific experiments available so that they can be reused readily or harvested by others. A group in Oxford called BioSharing are doing some really great work in that area in terms of cataloguing the different data standards in different Life Science domains, and in fact one of our journals, BMC Research Notes, has explicitly partnered with BioSharing to help them develop their resource, and also to offer people that contribute data standards the opportunity to publish exemplary data sets in the journal. More generally I think there are some potentially helpful developments that would be good to see in publishing, for example around published additional data files. You’ll probably hear the term ‘supplementary material’ used elsewhere, but we call them Additional Files at BioMed Central because we don’t see them as supplements, we see them as integral to the published article. From feedback we’ve received, I think it certainly would be useful to make those additional files more filterable, discoverable and searchable, so if you did want to find all files of a certain type about a certain experiment from a certain journal, then that could well be useful for people wanting to do secondary research from our publications. There would be technical developments that could be useful for making that happen.
Does one size fit all for data? Or should we subdivide disciplines?
Laura: Given the differences in these different disciplines, do you think that it’s more helpful to subdivide them, or do you think that we need some kind of overview one-size-fits-all approach?
Iain: One size definitely doesn’t fit all. Certainly for a publisher like BioMed Central, which is ultimately paid for the service of publication, it’s all about providing a service to our authors and to our editors, providing them with tools that they actually want to use. It’s a case I think of working with the communities that do want innovative ways of publishing, of sharing data, and then trying to find the right way of enabling them to do that.
What are the roles of the scientist? the funder? the publisher? the institution?
Laura: In the publishing process, what do you think the roles of the scientist, the funder, the publisher and the institution are?
Iain: OK, so I’ll probably have a lot more to say about the publisher in terms of experience, but I suppose for scientists, they certainly need to listen to and take on education regarding good data management and good data archiving practices. Of course they do need the tools and the support, financial or technical or otherwise, in order to do that.
Funders definitely have an important role to play. I think what I have written about before and come back to is that we need multiple stakeholders contributing to achieve more data sharing in science, so funders certainly have a role to play in terms of providing policies, mandates, and also checking their adherence so that if there is a data sharing policy, then finding ways to check whether or not it is being followed.
For publishers I think there are a lot of potentially useful roles, and particularly Open Access publishers. An important part of the role of publishers is to disseminate knowledge and information and make it permanently available. Particularly if you’re a publisher that’s serving different scientists in different domains, in our case across biology and medicine, then we’re in a good position to share knowledge and best practice between different fields. So if we know that scientists in genomics have got a really useful policy or a really useful way of making data more readily available, then we’re in a really good position to share that knowledge with other scientists.
Should each discipline have its own central repository?
Laura: I know some domains – and in particular I’m thinking here of BioScience, proteins and gene data – have centrally funded repositories. Do you think that each discipline needs its own domain repository?
Iain: I think it’s up to the discipline to decide if they do need a domain repository. I think that there are obviously benefits for domain specific repositories, because you will have domain experts that could perhaps provide additional services, for example peer review of that data, data curation, so all based on the idea that they have common expertise in the specific field. But I don’t think that means that there isn’t potentially a role for institutional repositories as well, because not every domain does have a repository. There are good examples of institutions with useful repositories and policies; Edinburgh University for example, they’ve got the Edinburgh DataShare Project which I think is worth looking at.
How do you see the Panton Principles?
Laura: You’ve always been very clear about the need to label publications and data with licensing rights, and you’ve been a keen supporter of the Panton Principles. Tell me how you and your colleagues see the Panton Principles.
Iain: BioMed Central has been an early supporter of the Panton Principles for Open Data in science as I mentioned earlier. We certainly do agree that we need to work towards placing data explicitly in the public domain. What we do need to be aware of is that there will be concerns, there may well be objections – commonly for example people may be concerned that if they place their data in the public domain then they won’t get any credit, people won’t cite them because there isn’t that legal requirement. And the response to that, which is a very good one, is that cultural norms of citation should replace any legal requirements. So we do need to make sure that if we are working towards changing author perceptions or awareness about making data more open, we need to do it in close consultation. That was one reason why we formed our Publishing Open Data Working Group. We put forward in August 2010, when we released a draft Open Data statement, that it would be a way forward in the future to set a date and say ‘all submissions to BioMed Central, all the data files that are included with the submission, if and when published, would be available with a Creative Commons CC0 license’, but in order to do that we need to make sure that we tease out any possible concerns, and if necessary provide alternative options for authors or funding institutions where they may not be able to comply with such a policy. But we’re certainly keen to move towards Open Data as default for published additional data files or potentially even tabular data within our content.
What are the barriers to Open Data?
Laura: You’ve begun to touch there upon some of the barriers and problems facing open data. I wonder if you could just outline the main cultural barriers that you’ve encountered preventing people from making data open.
Iain: The concern that we commonly hear is the lack of credit, the lack of motivation for doing so. People don’t get recognised for sharing and publishing their data so why should they do it? I think that’s often heard. And there are a number of solutions to that. I’ve mentioned the Open Data Award as one way of looking at that, but it is far from universal. There are mechanisms to enable people to get more credit for data sharing. Journals that publish data papers, for example, are one such way of doing that, explicitly enabling people to publish a data set with the paper being more of a short peer-reviewed wraparound for the data set.
Business models and funding for publishing data
Laura: Data publishing costs money, whether implicitly or explicitly. What models do you see being useful for this, and what do you think the priorities are?
Iain: I think that data needs to be permanently available, so efforts of DataCite for example are really really important to make people confident that if they publish a data set by assigning it a digital object identifier – which is one of the main activities of DataCite – people know that the data will be permanently available, so I think that’s really important. Publishers for example may increasingly want to link their data to publications. In terms of actually how data repositories are operating, there are different sustainability models or business models that are out there. So I’m aware that a lot of data repositories are funded in perhaps a more cyclical sort of way, in terms of they are reapplying for funding periodically, but the Dryad Repository for example I’m aware does have a long term sustainability plan, and I think it would be worth considering that the successful business model from a number of Open Access publishers where there is a fee for publication might be a useful way to ensure long-term sustainability of published data files.
Can we avoid fragmentation (e.g. of definitions) for Open Data?
Laura: At the moment Open Access I think it would be fair to say isn’t a single philosophy, and as a result there’s a certain amount of confusion, and perhaps even ignorance. Do you think we can avoid this kind of fragmentation in Open Data?
Iain: One thing that I find quite interesting and also quite frustrating about Open Access is that there are a number of definitions of what Open Access means. Even amongst the Open Access community there are different definitions of Open Access and it’s obvious, as someone that works for an Open Access publisher that uses Creative Commons attribution licenses, that there is still a lot of confusion between open content and freely available content. So I think that the Open Data community, the Open Data movement, whatever we call it, certainly could learn from that in terms of getting together and supporting a common definition and a common standard, I think that is important.
Would it make sense to have a central organisation for Open Data?
Laura: Do you think it would make sense to have an organisation – as OSI is for software perhaps – overseeing open data?
Iain: I’m not sure we need another organisation. I think there is a lot of good work already being done at the Science Commons and Creative Commons, at the Open Knowledge Foundation. I think what we need is just better alignment of existing groups, all working to the same definitions and standards. I think we want to avoid any tendency that we’re constantly reinventing the wheel, and really try to identify the areas that aren’t or haven’t yet been looked at, and then work on things that are going to have lasting change.
Creating or capturing machine-readable data.
Laura: At the moment a lot of data is locked in PDFs, where it’s pretty hard for machines to read it. How do you think we can change this? What role do you think for example the author, the editor, or funders and publishers have in this?
Iain: So for unlocking data that’s in publications?
Laura: Or preventing it from being locked up in the first place.
Iain: I think people need better guidance, again, on what is the right thing to do with particular types of data or particular types of experiment. For example, we do still have submissions of tables as PDF files, as additional PDF files, which obviously isn’t the optimum way to do that. Obviously if they were CSV files that would be much more useful. So I do think there is a role for editors and publishers, and it’s a long-term goal of BioMed Central to provide fairly comprehensive guidance on what are the right sorts of principles to follow when publishing data. So for example, if I have a movie, this is the right format for it; if I have certain tabular data, then this is the right format for it to make it readily available.
Examples of where Open Data works well
Laura: Do you have any examples of where you’ve seen Open Data working particularly well in science?
Iain: My experience is obviously in the Life Sciences. The obvious example is in Genomics. There are very well established policies for data being deposited in very well supported and very accessible data repositories, for genetic sequence data for example. And what I think is interesting about that community is that a number of things seem to have come together in leading towards this culture of sharing. So there was a very large project that needed to be done, the sequencing of the human genome, and there was no way that could have been achieved just by one researcher or one lab or one company, so there was this need to work together to achieve something big. And then obviously the technical infrastructure to make that happen, the repositories to put the data in. And then journals, funders, all signing up to policies and requiring authors to make data available as a condition of publication. So you’ve got all those different things that are needed to make a particular type of data available. I mean there are other examples elsewhere. There are individual papers in some of BioMed Central’s journals which have been pleasing to see over the last few years. There’s one really good example in clinical trials. Earlier this year the journal Trials published, as a CSV file, the individual patient data for the 19,000 patients in a very large stroke trial, and of course they’ve made sure that data was all anonymous. There are certainly good case studies. It would be good to see broader and more far-reaching examples of open data in Life Sciences.
What is the influence of Government Open Data initiatives?
Peter Murray-Rust: How do you think the increasing emphasis on Open Data from governments will affect data in science?
Iain: I would be very much guessing at this. I think it’s certainly good for visibility, that governments are getting behind the idea of access to data. For example I thought it was very interesting that in the summer when we had the UK government peer review enquiry, that the headline or the lead part of the article that was reporting from that wasn’t even about peer review, it was about full access to research data. And there are other governments outside the UK, in the US for example, where Open Science is getting on to the agenda, so I think it’s certainly good for visibility and building support for these initiatives.
What would you like to see in 5 years’ time?
Peter: Looking five years ahead, what would you like to see?
Iain: So I think we need to see, as I alluded to before, lots of different stakeholders playing their part, so funders, institutions, authors, researchers themselves, and publishers and journals and editors. So I do think we need more policies, mandates for data sharing, at the funder level and at the institution level. I think that would be very helpful to see in terms of actually seeing results and making more data available. In terms of what publishers can provide or enhancements to the literature I think it would be good and reasonable to expect more data linked to publications, so data from repositories that assign permanent identifiers to data sets, we’d like to see more of that happening. I think we can also expect to see more journals that are focussing on data publication. So we have a journal at BioMed Central which is launching soon called GigaScience, which is what’s called a “big data” journal, and is very innovative because as well as having a journal, it also has its own cloud computing repository for very large data sets. So I think we can expect to see much more in terms of data publication in the next five years.
What are the greatest challenges facing the Open Data movement?
Peter: What are the greatest challenges facing the Open Data movement?
Iain: In terms of challenges facing the Open Data movement, rather than challenges or barriers against data sharing, I think it needs to avoid a perception that it is some radical movement, that it’s transient or otherwise not really an implementable idea. Because of course it’s not really a new idea. Data sharing, reproducible research is definitely not a new idea, it’s arguably one of the tenets of science. And I think – and these weren’t my words – I know that Egon Willighagen commented on a BioMed Central blog once and said that Open Data isn’t the goal, it’s just a “means to do science better”. So I think the Open Data movement needs to make sure that there’s no perception, erroneous or otherwise, that it is something that is transient and will not be here tomorrow. Also I think it’s important that the Open Data movement avoids any tendency to preach to the choir. So I think there are lots of good people doing lots of great work in Open Science and Open Data, and I see and go to a lot of good meetings based around Open Science and Open Data. I think it’s important that Open Data gets onto the agenda of all the domain specific meetings and forums that happen, so whether that’s neuroscience or whether that’s genetics or another area in science. So I think making sure that it becomes a prominent part of all scientific circles is a challenge that needs to be addressed.
Peter: Well Iain, thank you very much.
