MAXYS Ai Peer Review – An Artificial Intelligence Guide to Scholarly Peer Review
I. Introduction: Understanding the Mandate of Peer Review
A. Defining Peer Review: Purpose, Goals, and Significance in Scholarly Communication
Peer review serves as the cornerstone of modern scholarly communication, acting as a critical process for evaluating research and ideas before widespread dissemination.[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] Fundamentally, it involves the assessment of scholarly work—such as journal manuscripts, grant proposals, academic books, or conference submissions—by independent experts, or “peers,” within the same field.[1, 3, 5, 6, 7, 10, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24] This evaluation aims to determine the work’s quality, validity, significance, originality, and suitability for its intended purpose, whether publication or funding.[5, 6, 7, 13, 15, 17, 24]
The practice functions as a vital quality control mechanism, filtering research to ensure its robustness and adherence to disciplinary standards.[3, 5, 6, 7, 8, 10, 14, 15, 17, 19, 23] It is also a form of self-regulation by the academic and professional communities, intended to maintain standards and credibility.[18, 23, 25] While its roots trace back to the 17th and 18th centuries, peer review became systematically embedded in the scholarly process, particularly after World War II, coinciding with increased government funding for research.[20, 25, 26]
The goals of peer review are multifaceted:
- Quality Assurance and Gatekeeping: Its primary function is to act as a filter, ensuring that only high-quality research is disseminated.[3, 5, 6, 7, 8, 10, 14, 15, 17, 19, 24, 27, 28, 29] This involves determining the validity, significance, and originality of the study [5] and preventing the publication of irrelevant findings, unwarranted claims, unacceptable interpretations, or personal views lacking rigorous support.[5, 14, 19] It encourages authors to meet the accepted high standards of their discipline.[5, 14]
- Improvement of Scholarly Work: Beyond gatekeeping, peer review aims to improve the quality of manuscripts deemed suitable for publication.[5, 6, 7, 14, 16, 17, 26, 29, 30, 31, 32] Reviewers provide constructive criticism and suggestions that help authors refine their arguments, strengthen conclusions, enhance clarity, improve methodology, and identify errors or omissions.[5, 14, 16, 26, 30, 32] Research indicates that a vast majority of authors feel their papers are improved through this process.[7] At its best, it is a collaborative dialogue.[7, 14]
- Validation and Credibility: The scrutiny by experts lends credibility and trustworthiness to the published work.[1, 3, 5, 6, 7, 11, 14, 16, 21, 33, 34] The “peer-reviewed” label signifies that the research has undergone rigorous evaluation and meets community standards, which is crucial as scientific knowledge is cumulative and builds upon prior validated work.[5, 14] Acceptance by the academic community often hinges on peer-reviewed publication.[4, 5]
- Maintaining Integrity: Peer review supports and maintains integrity and authenticity in the advancement of science.[1, 5, 14, 35] While not primarily designed to detect fraud, it can sometimes identify suspicious cases or ethical breaches [14] and encourages adherence to ethical guidelines.[14]
The significance of peer review cannot be overstated; it is widely regarded as essential to academic quality.[8, 23] However, it is crucial to recognize that peer review is not merely a technical checklist applied mechanistically. It operates as a complex socio-technical system embedded within academic culture.[1, 4, 5, 19, 25, 32, 36] This system relies heavily on social constructs like trust between authors, reviewers, and editors; the shared expertise and tacit knowledge within a disciplinary community; adherence to community norms and ethical responsibilities; and the practice of constructive, critical dialogue.[1, 3, 4, 5, 7, 14, 16, 32, 36, 37] The process involves subjective judgment alongside objective checks, interpretation within a specific context, and a collaborative element aimed at improving the work.[7, 14, 16] Understanding this social dimension is paramount when considering the integration of non-human agents like Artificial Intelligence (AI) into the process. AI can potentially execute the technical aspects of review, but replicating the trust, nuanced judgment, and community validation inherent in the human-driven system presents a profound challenge.
B. The AI’s Role: Augmenting, Not Replacing, Human Expertise
Artificial Intelligence is rapidly entering the sphere of scholarly communication, with tools being developed and deployed to assist in various stages of the publication process, including peer review.[38, 39, 40, 41, 42, 43, 44, 45, 46, 47] AI offers significant potential benefits, particularly in addressing some of the known challenges of traditional peer review, such as inefficiency, reviewer burden, and inconsistency.[3, 17, 48, 25, 35, 49, 50, 51, 52, 53]
AI systems can enhance efficiency and speed by processing large volumes of submissions and performing specific checks much faster than human reviewers.[38, 39, 40, 41, 42, 44, 45, 47, 54, 55, 56, 57] They can apply predefined criteria consistently across manuscripts, potentially reducing variability stemming from human subjectivity in certain tasks.[44, 55, 57] Specific areas where AI demonstrates capability include: automated checks for language quality and grammar [44, 45, 54, 58, 59, 60, 61]; adherence to formatting guidelines [43, 44, 45]; plagiarism detection [40, 41, 42, 43, 44, 45, 54]; reference checking and validation [40, 41, 62, 63, 64, 65, 66]; checking for data consistency between text, tables, and figures [67, 68, 69, 70, 71]; verifying adherence to reporting guidelines like CONSORT or PRISMA [72, 73, 74, 75, 76]; identifying potential statistical errors [40, 47, 54, 57]; and assisting in reviewer selection.[42, 43, 47, 54, 55, 57, 77]
Despite these capabilities, current AI technology possesses significant limitations that prevent it from fully replicating the human peer review process.[38, 39, 46, 54, 56, 78, 79, 80] AI systems lack the deep subject-matter expertise, nuanced understanding, and critical thinking skills of human experts. They struggle to evaluate the conceptual novelty and intellectual significance of research beyond surface-level comparisons.[38, 41, 54, 56, 78, 80] Assessing the appropriateness of complex or innovative methodologies, interpreting results within the broader scientific context, and making sophisticated ethical judgments remain firmly in the human domain.[38, 46, 56, 78, 81] AI cannot grasp the subtleties of argumentation or the potential real-world implications of findings in the way a human expert can.[38, 54, 78]
Therefore, the guiding principle for integrating AI into peer review must be one of augmentation, not replacement. AI should be conceptualized and deployed as a powerful support tool or assistant that enhances the capabilities of human reviewers and editors.[38, 39, 42, 46, 56, 78, 79, 80, 81, 82, 83] The goal is to leverage AI’s strengths in speed and consistency for specific, well-defined tasks, thereby freeing human experts to focus on the more complex, evaluative aspects that require deep understanding and critical judgment.[40, 41, 56, 82] Final decisions regarding manuscript acceptance, revision, or rejection must remain with human editors, informed by both human and AI-generated input.[39, 42, 54, 78, 80, 81, 83]
This augmentation approach naturally leads to a distinction between the types of tasks AI is suited for. Given AI’s proficiency in pattern recognition and rule application [44, 45, 67, 70] contrasted with its weakness in contextual understanding and value judgment [38, 54, 56, 78], its most reliable and ethically sound role lies in verification rather than evaluation. AI can effectively verify compliance with formatting rules, check for plagiarism, confirm data consistency across different parts of a manuscript, and check adherence to reporting guideline checklists. These are largely objective, rule-based tasks. Conversely, evaluating the intellectual merit – the significance of the research question, the novelty of the findings, the appropriateness of a chosen methodology for a specific context, the validity of the interpretation – requires a level of understanding and judgment that AI currently lacks. Consequently, AI systems designed for peer review should be programmed primarily for these verification tasks, presenting their findings objectively and deferring the evaluative judgments to human experts.
C. Context Matters: Adapting Review to Document Type
The purpose, criteria, process, and ethical considerations of peer review are not monolithic; they vary significantly depending on the type of scholarly work being assessed and the context of its dissemination.[8, 9, 48, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109] A generic approach to AI-driven peer review would therefore be ineffective and potentially detrimental. The AI system must be capable of adapting its analysis and output based on the specific context.
Key contexts and their distinguishing features include:
- Journal Articles: The most common context. Evaluation focuses on originality, scientific significance, methodological rigor, validity of conclusions, clarity of presentation, and suitability for the specific journal’s scope and readership.[1, 7, 22, 84, 33, 110] The process typically involves an initial editorial assessment (potentially leading to “desk rejection”) followed by review by external, often anonymous, experts.[7, 20, 33]
- Grant Proposals: The goal is to allocate limited funding to the most promising research. Evaluation criteria include scientific merit, potential impact, innovation, feasibility of the approach, investigator qualifications, institutional environment, budget justification, and alignment with the funder’s mission and priorities.[48, 111, 85, 92, 94, 112, 113] The process often involves multiple stages, quantitative scoring rubrics, panel discussions by a committee, and extremely strict confidentiality protocols.[111, 85, 92, 112]
- Academic Books: Review often begins at the proposal stage, assessing the concept, scope, intended audience, market need, and author’s expertise.[89, 93, 96, 114] Reviewers evaluate the contribution to the field, engagement with existing scholarship, structure (table of contents), and writing quality (based on sample chapters).[86, 89, 93, 96, 104, 114] Unlike most journal reviews, book reviewers often know the author’s identity [93, 96], and final approval frequently involves a university press editorial board.[93]
- Conference Submissions (Abstracts/Papers): The primary goal is often program selection and structuring. Evaluation emphasizes relevance to the conference theme, originality, clarity, methodological soundness (for full papers), and potential interest or value to attendees.[9, 91, 103, 106, 115, 116] The process is characterized by tight deadlines, often uses double-blind review, and employs scoring criteria to rank submissions.[9, 91, 103, 106]
- Clinical Practice Guidelines (CPGs): These aim to optimize patient care based on the best available evidence.[87, 90, 98, 105, 108, 109] The process mandates a systematic review of existing literature, formal assessment of evidence quality (e.g., using GRADE), balancing benefits and harms, and formulating graded recommendations.[87, 90, 97, 108] Peer review involves a broad range of stakeholders, including clinical experts, methodologists, and patient representatives.[87, 90, 97]
- Faculty Promotion & Tenure (P&T): This internal institutional process involves peer evaluation of a faculty member’s performance in teaching, research/scholarship, and service against established departmental and university standards.[88, 95, 99, 100, 101, 102, 107] Evidence includes course materials, student evaluations, teaching observations, publications, grant activity, and service records. Review is typically conducted by internal committees of tenured faculty.[95, 100, 102]
- Other Applications: Peer review principles are also applied in various professional settings (e.g., law, engineering, medicine for quality assurance), government policy evaluation, and pedagogical contexts (student peer assessment).[8, 16, 23]
The clear differentiation in criteria (e.g., market fit for books, funder alignment for grants, evidence synthesis for CPGs) and process steps across these contexts demonstrates that a single, generic AI review model would be inadequate. Applying journal article criteria to a grant proposal, for instance, would miss crucial evaluation dimensions like budget justification or investigator capacity. Therefore, for AI to be genuinely useful in peer review, it must possess contextual adaptability. This necessitates the development of AI systems that can either be configured with context-specific rule sets and criteria or incorporate distinct modules trained for different review scenarios. The AI must understand the specific goals and requirements of the review task at hand to provide relevant and meaningful assistance.
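To make the notion of configurable, context-specific rule sets concrete, the sketch below shows one minimal way such review profiles might be represented in code. It is a simplified illustration under stated assumptions, not a prescribed implementation: the profile names, criteria lists, and check identifiers are chosen for readability, and a production system would derive them from the actual instructions, rubrics, and guidelines of the journal, funder, press, or guideline body in question.

```python
from dataclasses import dataclass

@dataclass
class ReviewProfile:
    """Configuration for one review context; all field values are illustrative."""
    context: str
    evaluative_criteria: list[str]   # judged by human reviewers and editors
    automated_checks: list[str]      # verifiable tasks the AI assistant may run
    confidentiality_level: str = "standard"

REVIEW_PROFILES = {
    "journal_article": ReviewProfile(
        context="journal_article",
        evaluative_criteria=["originality", "significance", "methodological soundness",
                             "validity of conclusions"],
        automated_checks=["scope_match", "formatting", "plagiarism",
                          "reference_validation", "reporting_guidelines"],
    ),
    "grant_proposal": ReviewProfile(
        context="grant_proposal",
        evaluative_criteria=["scientific merit", "impact", "innovation", "feasibility",
                             "investigator qualifications", "budget justification"],
        automated_checks=["funder_scope_match", "budget_arithmetic", "formatting",
                          "completeness"],
        confidentiality_level="strict",
    ),
    "clinical_guideline": ReviewProfile(
        context="clinical_guideline",
        evaluative_criteria=["evidence synthesis quality", "benefit-harm balance",
                             "recommendation grading"],
        automated_checks=["systematic_review_reported", "evidence_grading_present",
                          "conflict_of_interest_statements"],
    ),
}

def checks_for(context: str) -> list[str]:
    """Return the automated checks the AI assistant should run for a given context."""
    return REVIEW_PROFILES[context].automated_checks
```

In practice, such profiles would also encode which outputs are flags for human attention versus hard submission requirements, keeping the evaluative criteria explicitly outside the AI's remit.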
II. Deconstructing the Peer Review Process: A Step-by-Step Algorithmic Approach
To effectively guide an AI system in performing or assisting with peer review, the process must be broken down into logical, potentially automatable steps. While the specifics vary by context (as discussed in Section I.C), the review of a journal article serves as a useful base case. This section outlines these steps, indicating where AI can contribute and where human oversight is essential.
A. Initial Manuscript Triage (Simulating Desk Assessment): Scope, Formatting, Basic Requirements
The first stage in many peer review workflows, particularly for journals, is an initial assessment by the editorial office or Editor-in-Chief (EIC).[7, 17, 20, 33, 117] The primary goal is to quickly filter out submissions that are clearly unsuitable for the journal (e.g., out of scope, incomplete, severely flawed) before investing the time and resources of external peer reviewers.[7, 17, 20, 26, 33] This “desk assessment” can lead to a “desk rejection”.[7, 17, 20, 26, 33]
AI has a high potential to automate many aspects of this initial triage, leveraging its strengths in rule-based checking and text analysis.[43, 44, 45, 46, 47, 57] Key checks an AI can perform at this stage include:
- Scope and Relevance Check: Using Natural Language Processing (NLP), the AI can analyze the manuscript’s title, abstract, and keywords, comparing them against the journal’s published aims and scope description. It can generate a relevance score or flag manuscripts that appear significantly outside the journal’s focus.[7, 17, 20, 106] A sketch of one such relevance check appears after this list.
- Formatting and Guideline Compliance: The AI can verify adherence to the journal’s specific instructions for authors regarding word count, manuscript structure (e.g., presence of required sections), citation style (e.g., APA, MLA), and formatting of figures and tables.[7, 44, 45, 117] Tools like Penelope.ai are designed for such compliance checks.[45]
- Completeness Check: The AI can confirm the presence of essential manuscript components, such as an abstract, author affiliations, keywords, figures/tables (if applicable), funding declarations, conflict of interest statements, and ethical approval details.[7, 84, 45, 46]
- Basic Language Quality Assessment: AI-powered grammar and style checkers (e.g., Grammarly, Trinka) can identify significant grammatical errors, spelling mistakes, and clarity issues that might render the manuscript difficult to understand or unprofessional.[7, 84, 44, 45, 58, 59, 60, 61] A threshold can be set to flag papers with extremely poor language quality.
- Initial Plagiarism Scan: A preliminary check using plagiarism detection software (e.g., iThenticate, Turnitin) can flag manuscripts with substantial text overlap with existing publications, indicating potential plagiarism or duplicate submission.[14, 45, 47, 118]
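As an illustration of the scope and relevance check in the list above, the following sketch compares a manuscript’s title and abstract against a journal’s published aims-and-scope text using TF-IDF cosine similarity. This is a deliberately simple lexical approach; real systems would more likely use trained semantic embeddings, and the 0.15 threshold is purely illustrative. Consistent with the screening-assistant role described below, the output is a flag for editorial attention, not a rejection decision.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def scope_relevance_score(manuscript_text: str, journal_scope: str) -> float:
    """Cosine similarity between the manuscript (title + abstract) and the journal's
    stated aims and scope, on a 0-1 scale. Higher means closer lexical overlap."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([manuscript_text, journal_scope])
    return float(cosine_similarity(matrix[0], matrix[1])[0, 0])

def scope_flag(manuscript_text: str, journal_scope: str, threshold: float = 0.15) -> dict:
    """Return a structured flag for the human editor rather than an autonomous decision."""
    score = scope_relevance_score(manuscript_text, journal_scope)
    return {
        "check": "scope_and_relevance",
        "score": round(score, 3),
        "flag": score < threshold,   # low overlap -> bring to the editor's attention
        "note": "Low lexical overlap with journal scope; requires human judgment.",
    }
```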
The AI’s output at this stage should be a structured report summarizing the compliance checks and flagging any issues identified. This automation can significantly enhance efficiency, reducing the burden on editorial staff and speeding up the process for authors whose papers are clearly unsuitable.[7, 20, 44] However, while AI excels at these checks, relying solely on AI for desk rejection decisions carries risks. Assessing scope fit or borderline language quality often involves nuance that AI might miss.[38, 78] Therefore, the AI should primarily function as a screening assistant, providing flags and scores to inform human editorial judgment. Decisions to reject a manuscript at this stage, especially based on subjective criteria like scope or quality thresholds, should involve human validation to ensure fairness and avoid prematurely dismissing potentially valuable contributions. The AI’s role is to streamline the process, not to make autonomous rejection decisions without oversight.
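Complementing the relevance check, the sketch below illustrates how the completeness check and the structured triage report described above might be combined. The required-statement list and regular expressions are illustrative assumptions; production tools would rely on more robust section parsing and on publisher-specific submission requirements.

```python
import re
from dataclasses import dataclass, asdict

# Illustrative set of components a triage pass might look for.
REQUIRED_STATEMENTS = {
    "abstract": r"\babstract\b",
    "keywords": r"\bkeywords?\b",
    "conflict_of_interest": r"\b(conflicts? of interest|competing interests?)\b",
    "funding": r"\bfunding\b",
    "ethical_approval": r"\b(ethics (committee|approval)|institutional review board|irb)\b",
}

@dataclass
class TriageItem:
    check: str
    passed: bool
    detail: str

def completeness_report(manuscript_text: str) -> list[dict]:
    """Flag missing components for the editor; flags are prompts for human review,
    not grounds for automatic desk rejection."""
    lowered = manuscript_text.lower()
    report = []
    for name, pattern in REQUIRED_STATEMENTS.items():
        found = re.search(pattern, lowered) is not None
        report.append(TriageItem(
            check=f"completeness:{name}",
            passed=found,
            detail="present" if found else "no matching statement found",
        ))
    return [asdict(item) for item in report]
```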
B. Core Evaluation Phase: A Multi-faceted Analysis
If a manuscript passes the initial triage, it proceeds to the core evaluation phase, typically involving in-depth review by two or more external peer reviewers.[7, 33, 119] This phase aims to rigorously assess the scientific or scholarly merit of the work based on multiple established criteria.[1, 3, 5, 7, 8, 14, 16, 17, 22, 84, 33, 110, 116] An AI system assisting in this phase must be programmed to analyze the manuscript across these dimensions, recognizing its capabilities and limitations for each.
**Table 1: Core Peer Review Evaluation Criteria for AI Analysis (Journal Article Context)**
Criterion | Description | Potential AI Role & Limitations | Relevant AI Techniques & Tools | Citations |
---|---|---|---|---|
1. Originality & Novelty | Does the research present new findings, methods, or perspectives? How does it advance the current state of knowledge in the field? Is the research question novel and significant? | Limited Evaluation: AI can perform literature searches to identify highly similar existing work (potential redundancy check) but struggles to evaluate conceptual novelty, theoretical contribution, or the significance of the advance. This requires deep domain knowledge and contextual understanding. | Semantic analysis, topic modeling, literature retrieval systems (e.g., Semantic Scholar, Scite), comparison with existing databases. | [5, 14, 22, 84, 33, 110] |
2. Significance & Impact | How important is the research question? What is the potential impact of the findings on the field, practice, or society? Is the work likely to be widely cited or influence future research? | Very Limited Evaluation: AI cannot reliably assess the broader impact or importance of research, which involves value judgments and predicting future influence. It might provide metrics on related literature citations but cannot evaluate inherent significance. | Citation analysis (limited applicability), potentially analyzing discussion/conclusion sections for claims of impact. | [5, 14, 22, 84, 33, 110, 112] |
3. Methodological Soundness | Is the study design appropriate for the research question? Are the methods clearly described and replicable? Are data collection and analysis techniques valid and correctly applied? Are limitations acknowledged? | Verification & Limited Evaluation: AI can verify adherence to reporting guidelines (CONSORT, PRISMA), check for statistical errors, identify inconsistencies in method description, and check for appropriate statistical tests based on data types described. It cannot judge the appropriateness of a chosen method for a specific complex problem or evaluate novel methodologies without prior training data. Human expertise is crucial for methodological judgment. | NLP for checklist compliance, statistical analysis tools (Statcheck, StatReviewer), data consistency checks, rule-based systems. | [5, 14, 16, 22, 84, 33, 40, 47, 57, 72, 73, 74, 75, 76] |
4. Data Presentation & Analysis | Are the results clearly presented (text, tables, figures)? Is the statistical analysis appropriate and correctly interpreted? Do the data support the conclusions? | Verification & Limited Evaluation: AI can check for consistency between data in text, tables, and figures; verify statistical calculations (if raw data provided); flag potential misinterpretations (e.g., claiming significance when p > 0.05). It cannot assess the nuanced interpretation of results or judge the clinical/practical significance of statistical findings. | Data extraction, consistency checking algorithms, statistical analysis tools, NLP for analyzing interpretive statements. | [14, 22, 84, 33, 40, 47, 57, 67, 68, 69, 70, 71] |
5. Validity of Conclusions | Are the conclusions justified by the data presented? Are alternative interpretations considered? Are claims appropriately cautious and not overstated? | Limited Evaluation: AI can flag strong or unsupported claims by comparing conclusion statements against presented results/data using NLP. It struggles with assessing the reasonableness of interpretation, considering alternative explanations, or judging the strength of evidence required for a given claim. | NLP for claim extraction and comparison, sentiment analysis (for overstatement), argumentation mining (potential). | [5, 14, 22, 84, 33, 110] |
6. Clarity, Organization & Readability | Is the manuscript well-written, clearly structured, and easy to understand? Is the language precise and unambiguous? Is the flow logical? | Strong Verification & Assistance: AI excels at identifying grammatical errors, spelling mistakes, awkward phrasing, jargon, and issues with flow/cohesion. It can assess readability scores (e.g., Flesch-Kincaid). AI can suggest improvements for clarity. | Grammar/style checkers (Grammarly, Trinka), readability metrics, text summarization tools (for structure analysis). | [14, 22, 84, 33, 44, 45, 58, 59, 60, 61] |
7. Literature Review & Referencing | Is the relevant background literature adequately cited? Are references current and appropriate? Are citations formatted correctly? Does the work engage with existing scholarship? | Strong Verification & Limited Evaluation: AI can check reference formatting, verify citation existence (DOI lookups), detect missing citations (based on mentions), and identify potentially relevant uncited papers. It cannot judge the appropriateness or completeness of the literature review in terms of intellectual engagement. | Reference management software integration, DOI/metadata lookup tools, citation analysis tools (Scite), NLP for identifying mentions. | [14, 22, 84, 33, 40, 41, 62, 63, 64, 65, 66] |
8. Ethical Considerations | Are ethical approvals (IRB, animal care) mentioned and appropriate? Are conflicts of interest declared? Is patient privacy protected (if applicable)? Are there signs of data manipulation or questionable research practices? | Verification & Flagging: AI can check for the presence of required ethical statements and COI disclosures. It can flag statistically improbable data patterns (potential manipulation) or signs of image manipulation (if trained). It cannot make definitive ethical judgments, which require human interpretation and investigation. | NLP for statement checking, statistical anomaly detection, image analysis tools (potential). | [14, 84, 33, 40, 46, 120] |
9. Adherence to Reporting Guidelines | For specific study types (e.g., RCTs, systematic reviews), does the manuscript follow established reporting guidelines (e.g., CONSORT, PRISMA)? | Strong Verification: AI can effectively check manuscript sections against checklist items from standard reporting guidelines, identifying missing elements. | NLP-based checklist verification tools (e.g., tools integrated with Equator Network guidelines). | [72, 73, 74, 75, 76, 46] |
As the table illustrates, AI’s current capabilities align strongly with verification tasks: checking compliance, consistency, formatting, language, and basic statistical reporting. It is significantly weaker in evaluative tasks that require deep understanding, contextual judgment, and assessment of intellectual merit (originality, significance, methodological appropriateness, conclusion validity, ethical nuance).
Therefore, in the core evaluation phase, the AI’s role should be primarily to:
- Perform automated checks: Run through the verifiable criteria (e.g., reporting guidelines, stats check, reference validation, language check, data consistency).
- Highlight potential issues: Flag areas of concern based on its checks (e.g., “CONSORT item 5b missing,” “Potential statistical inconsistency between Table 2 and text,” “High similarity score in Introduction,” “Readability score below threshold”). A sketch of one such statistical consistency check appears after this list.
- Provide supporting evidence: Link its findings directly to relevant sections of the manuscript.
- Generate a structured report: Summarize its findings objectively for the human reviewer.
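As a concrete instance of the “stats check” and statistical-inconsistency flags mentioned in the list above, the sketch below recomputes two-tailed p-values from reported t statistics in APA-style strings, in the spirit of tools such as Statcheck.[67, 68] The regular expression, the restriction to t tests, and the rounding tolerance are simplifications; a flagged mismatch is a prompt for human verification, not proof of error.

```python
import re
from scipy import stats

# Matches APA-style reports such as "t(28) = 2.20, p = .04"
T_TEST_PATTERN = re.compile(
    r"t\s*\(\s*(?P<df>\d+(?:\.\d+)?)\s*\)\s*=\s*(?P<t>-?\d+(?:\.\d+)?)"
    r"\s*,\s*p\s*=\s*(?P<p>\d*\.\d+|\d+)",
    re.IGNORECASE,
)

def check_t_tests(text: str, tolerance: float = 0.005) -> list[dict]:
    """Recompute two-tailed p-values and flag mismatches with the reported values."""
    findings = []
    for match in T_TEST_PATTERN.finditer(text):
        df = float(match.group("df"))
        t_value = float(match.group("t"))
        reported_p = float(match.group("p"))
        recomputed_p = 2 * stats.t.sf(abs(t_value), df)
        findings.append({
            "reported": match.group(0),
            "recomputed_p": round(recomputed_p, 4),
            "flag": abs(recomputed_p - reported_p) > tolerance,
        })
    return findings

# Example: the first report is consistent, the second is not.
print(check_t_tests("t(28) = 2.20, p = .04 and t(28) = 1.10, p = .01"))
```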
The human reviewer then uses this AI-generated report as input, alongside their own critical reading and domain expertise, to perform the deeper evaluative assessment. The AI handles the more tedious, rule-based checks, freeing the human expert to focus on the critical thinking aspects: judging the research’s value, the logic of the arguments, the suitability of the methods, and the importance of the conclusions. This synergistic approach leverages the strengths of both AI and human intelligence.
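Row 6 of Table 1 identifies readability scoring as a task where automated checks are strong. The following self-contained sketch computes the Flesch Reading Ease score using a naive vowel-group syllable counter; dedicated readability libraries and commercial language checkers use more sophisticated methods, and the flag threshold shown is illustrative.

```python
import re

def _count_syllables(word: str) -> int:
    """Naive syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    Higher scores indicate easier text; dense scholarly prose typically scores low."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(_count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

def readability_flag(text: str, minimum: float = 10.0) -> dict:
    """Flag extremely low readability for human attention; the threshold is illustrative."""
    score = flesch_reading_ease(text)
    return {"check": "readability", "flesch_reading_ease": round(score, 1),
            "flag": score < minimum}
```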
C. Synthesizing Findings and Formulating Recommendations
After the detailed analysis (combining human assessment and AI-generated checks), the reviewer must synthesize their findings into a coherent critique and provide a clear recommendation to the editor.[7, 14, 16, 22, 84, 33] This involves summarizing the manuscript’s strengths and weaknesses, offering constructive suggestions for improvement, and justifying the overall recommendation (e.g., accept, minor revision, major revision, reject).[7, 14, 16, 22, 84, 33]
AI can play a supporting role in this synthesis and recommendation phase, but its contribution is more limited and requires careful oversight:
- Structuring the Review: AI can help structure the reviewer’s comments by providing a standardized template based on the evaluation criteria (Section II.B).[46] It can organize the points identified during the analysis phase under the appropriate headings (e.g., Originality, Methodology, Clarity).
- Summarizing AI Checks: The AI can automatically generate summaries of its verification findings (e.g., “The manuscript failed checks for reporting guideline adherence [Items 5b, 12a] and data consistency [Figure 3 vs. text]. Language analysis identified 15 major grammatical errors.”). This ensures these objective points are included in the final review.
- Drafting Standard Phrases (Use with Caution): For common, objective issues identified by AI (e.g., formatting errors, missing standard sections), the AI could potentially draft standard sentences for the review (e.g., “The manuscript does not adhere to the journal’s specified citation format. Please revise according to the Instructions for Authors.”). However, this must be used cautiously and always reviewed by the human. Over-reliance on templated phrases can lead to generic, unhelpful reviews.
- Consistency Check within the Review: An AI could potentially check the reviewer’s own comments for internal consistency (e.g., ensuring the final recommendation aligns with the severity of the weaknesses identified in the body of the review).
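To illustrate the internal consistency check in the last item above, the sketch below compares a reviewer’s overall recommendation against the issues recorded in the body of the review. The severity taxonomy and the rules are illustrative assumptions; any warning produced is a prompt for the reviewer to reconsider, never an automatic correction.

```python
from dataclasses import dataclass

@dataclass
class ReviewIssue:
    section: str    # e.g., "Methodology", "Clarity"
    severity: str   # "major" or "minor"
    comment: str

def recommendation_consistency(recommendation: str, issues: list[ReviewIssue]) -> list[str]:
    """Return warnings when the recommendation does not match the recorded issues."""
    majors = sum(1 for i in issues if i.severity == "major")
    minors = sum(1 for i in issues if i.severity == "minor")
    warnings = []
    if recommendation == "accept" and majors > 0:
        warnings.append(f"Recommendation is 'accept' but {majors} major issue(s) are listed.")
    if recommendation == "reject" and majors == 0 and minors == 0:
        warnings.append("Recommendation is 'reject' but no weaknesses are documented.")
    if recommendation == "minor revision" and majors > 2:
        warnings.append("Several major issues are listed; consider 'major revision' instead.")
    return warnings
```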
Crucially, AI cannot and should not formulate the overall judgment or recommendation.[38, 46, 54, 78, 80, 81] This decision requires weighing the different strengths and weaknesses, considering the potential for revision, assessing the overall significance, and applying nuanced judgment based on experience and field-specific knowledge – all tasks beyond current AI capabilities. The recommendation (Accept, Revise, Reject) is fundamentally a human decision informed by the detailed analysis.
Furthermore, the tone and constructiveness of the review are paramount.[7, 14, 16, 32] Peer reviews should be critical but also collegial and aimed at helping the author improve their work.[7, 14, 16, 32] AI-generated text can sometimes lack this necessary tone or fail to provide genuinely constructive suggestions. Human reviewers must ensure the final review is appropriately worded, respectful, and provides clear, actionable guidance to the authors.
The output of this phase is the completed peer review report, incorporating both the human reviewer’s expert assessment and the verified findings from the AI assistant, structured logically and leading to a justified recommendation.
D. Reporting and Decision-Making (Editorial Phase)
The final stage involves submitting the review report(s) to the editor, who then weighs the evidence from all reviewers (and potentially their own assessment) to make an editorial decision.[7, 20, 33, 117]
AI can assist the editor in this phase by:
- Aggregating Reviewer Feedback: AI can parse multiple review reports, extracting key points, identifying areas of agreement and disagreement between reviewers, and summarizing the main strengths and weaknesses highlighted across all reviews.[43, 46, 57] This provides the editor with a quick overview; a sketch of this aggregation step appears after this list.
- Checking Review Quality: AI could potentially be trained to flag reviews that are overly brief, lack specific examples, use unprofessional language, or fail to address key manuscript aspects, helping editors ensure review quality.[46]
- Cross-Referencing Reviews with AI Checks: The editor can compare the issues raised by human reviewers with the objective checks performed by the AI (from Stage A and B). Discrepancies might warrant closer examination (e.g., if a human reviewer praised the methodology but the AI flagged non-compliance with reporting guidelines).
- Drafting Decision Letters (Use with Extreme Caution): AI could potentially draft sections of the editorial decision letter, particularly summarizing the required revisions based on reviewer comments and AI checks.[46] However, the final decision and the core message of the letter must be determined and written/approved by the human editor. The tone, rationale, and specific instructions require editorial judgment. Over-automating decision letters risks depersonalizing the process and failing to provide adequate guidance to authors.
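The sketch below illustrates the feedback-aggregation step referenced in the first item above, assuming each review has already been reduced to a recommendation and a list of concern labels; extracting those elements from free-text reviews would itself require NLP and is omitted here. The field names and example data are illustrative.

```python
from collections import Counter

def aggregate_reviews(reviews: list[dict]) -> dict:
    """Summarize multiple review reports for the handling editor.

    Each review is assumed to look like:
        {"reviewer": "R1", "recommendation": "major revision",
         "concerns": ["statistics", "novelty"]}
    """
    recommendations = Counter(r["recommendation"] for r in reviews)
    concern_counts = Counter(c for r in reviews for c in r["concerns"])
    return {
        "recommendation_counts": dict(recommendations),
        "split_decision": len(recommendations) > 1,   # reviewers disagree on outcome
        "shared_concerns": [c for c, n in concern_counts.items() if n > 1],
        "all_concerns": dict(concern_counts),
    }

summary = aggregate_reviews([
    {"reviewer": "R1", "recommendation": "major revision",
     "concerns": ["statistics", "novelty"]},
    {"reviewer": "R2", "recommendation": "reject",
     "concerns": ["statistics", "ethics statement"]},
])
print(summary)  # highlights the split decision and the shared statistics concern
```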
The ultimate editorial decision (Accept, Revise, Reject) remains a fundamentally human responsibility, requiring the integration of diverse inputs, expert judgment, consideration of the journal’s standards and scope, and sometimes difficult choices between conflicting reviewer opinions.[7, 20, 33, 117] AI can provide valuable summaries and cross-checks to support this decision-making process, but it cannot replace the editor’s final judgment and communication role.
III. Ethical Considerations and Limitations: Navigating the Human-AI Interface
Integrating AI into peer review introduces novel ethical challenges and necessitates careful consideration of its inherent limitations. Ensuring fairness, transparency, accountability, and maintaining the integrity of the scholarly process are paramount.[38, 39, 42, 46, 54, 56, 78, 79, 80, 81, 83, 120, 121, 122, 123, 124, 125, 126]
A. Bias in AI Algorithms: Risks and Mitigation Strategies
AI systems, particularly those based on machine learning, learn from data. If the training data reflects existing biases present in the scholarly literature or past review practices (e.g., gender, institutional, geographical, or topic biases), the AI can inherit and potentially amplify these biases.[38, 39, 42, 54, 78, 79, 81, 121, 122, 123, 124, 127] For example, an AI trained predominantly on research from specific institutions or countries might unfairly penalize work using different methodologies or addressing locally relevant topics. An AI language checker trained on native speaker text might excessively flag non-native English writing styles even if the meaning is clear.[59, 61] AI systems used for reviewer suggestions might perpetuate homophily (selecting reviewers similar to past authors/reviewers) if not carefully designed.[57, 77]
Risks:
- Systemic disadvantage for underrepresented groups or regions.[38, 78, 81, 121, 122]
- Reinforcement of dominant paradigms and suppression of novel or unconventional research.[38, 78, 123]
- Unfair assessment based on proxies like author affiliation or writing style rather than scientific merit.[54, 78, 121]
Mitigation Strategies:
- Diverse and Representative Training Data: Actively curate training datasets to ensure broad representation across demographics, geography, institutions, methodologies, and research topics.[38, 42, 78, 81, 121, 122, 124]
- Bias Auditing and Testing: Regularly audit AI models for performance disparities across different subgroups. Employ fairness metrics during development and testing.[38, 42, 78, 81, 121, 122, 124, 127] A sketch of one such audit appears after this list.
- Transparency in AI Functioning: While full algorithmic transparency might be complex, provide clear explanations of what the AI checks, the data it was trained on (where feasible), and its known limitations.[38, 39, 42, 56, 78, 81, 121, 125]
- Focus on Objective Criteria: Prioritize AI tools for tasks based on objective, verifiable criteria (e.g., guideline adherence, reference checks) rather than subjective assessments of quality or significance, where bias is more likely to creep in.[46, 56, 78, 81]
- Human Oversight and Appeal: Ensure that AI outputs are always reviewed by humans and that there are clear processes for authors or reviewers to appeal decisions suspected of being influenced by AI bias.[39, 42, 54, 78, 80, 81, 83, 121, 125] Human judgment remains the final arbiter.
- Continuous Monitoring and Updating: Regularly monitor AI performance in real-world use and update models to address identified biases or performance issues.[38, 81, 121]
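One way to operationalize the bias-auditing strategy above is to compare how often the AI raises flags for manuscripts from different author groups, a demographic-parity-style disparity check sketched below. The grouping variable, record format, and 0.8 threshold (loosely borrowed from the informal “four-fifths” rule) are illustrative assumptions, and any disparity detected would still require human investigation of its cause before corrective action.

```python
from collections import defaultdict

def flag_rate_disparity(records: list[dict], group_key: str = "author_region") -> dict:
    """Compare AI flag rates across subgroups.

    Each record is assumed to look like: {"author_region": "X", "flagged": True}.
    Returns per-group flag rates and the ratio of the lowest to the highest rate.
    """
    totals, flagged = defaultdict(int), defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        flagged[group] += int(record["flagged"])
    rates = {g: flagged[g] / totals[g] for g in totals}
    if not rates:
        return {"rates": {}, "disparity_ratio": None, "audit_flag": False}
    low, high = min(rates.values()), max(rates.values())
    ratio = (low / high) if high > 0 else 1.0
    return {
        "rates": {g: round(r, 3) for g, r in rates.items()},
        "disparity_ratio": round(ratio, 3),
        "audit_flag": ratio < 0.8,   # illustrative threshold; prompts human investigation
    }
```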
B. Confidentiality and Data Security: Protecting Intellectual Property
Peer review operates under strict confidentiality requirements. Manuscripts submitted for review contain unpublished data, novel ideas, and sensitive intellectual property.[14, 120, 125, 126] Introducing AI systems necessitates robust measures to protect this confidentiality.
Risks:
- Unauthorized access to or disclosure of manuscript content by the AI provider or through security breaches.[39, 42, 46, 125, 126]
- Potential use of manuscript data by AI developers to train future models without explicit author consent, potentially compromising novelty or giving competitors an advantage.[39, 42, 46, 125, 126]
- Accidental leakage of reviewer identities if AI systems link reviews back to individuals improperly.
Mitigation Strategies:
- Secure Data Handling Protocols: Implement state-of-the-art security measures, including data encryption (at rest and in transit), access controls, and secure servers for any platform handling manuscripts.[42, 46, 125, 126]
- Clear Data Usage Policies: Publishers and AI tool providers must have transparent policies outlining exactly how manuscript data will be used, stored, and protected. Policies should explicitly state whether manuscript content will be used for AI model training and require opt-in consent from authors/publishers for such use.[39, 42, 46, 54, 125, 126]
- Anonymization: Where feasible, manuscript metadata that could identify authors or institutions might be anonymized before processing by certain AI modules, although this can limit some checks. Reviewer identities must be strictly protected by the platform.[125]
- Non-Disclosure Agreements (NDAs): Ensure robust contractual agreements and NDAs are in place with any third-party AI service providers regarding data confidentiality and usage restrictions.[42, 125]
- On-Premises or Secure Cloud Solutions: Consider AI tools that can be run within the publisher’s secure environment rather than relying solely on external cloud processing, offering greater control over data.[125]
C. Accountability and Transparency: Who is Responsible?
When AI is involved in the review process, lines of accountability can become blurred.[39, 42, 54, 78, 81, 125] If an AI error contributes to an unfair rejection or the acceptance of a flawed paper, who is responsible? The AI developer? The publisher? The editor? The human reviewer who used the AI tool?
Challenges:
- “Black box” nature of some AI algorithms makes it difficult to understand why an AI made a specific flag or assessment.[38, 78, 81, 121, 125]
- Diffusion of responsibility, where human actors might overly rely on AI outputs without critical scrutiny.[54, 81, 83]
Mitigation Strategies:
- Human-in-the-Loop Principle: Reinforce that AI tools are assistants, and final responsibility for judgments and decisions rests with the human editors and reviewers.[39, 42, 54, 78, 80, 81, 83, 125] AI outputs should be treated as suggestions or flags requiring verification.
- Explainability (XAI): Prioritize AI tools that offer some level of explainability, indicating the basis for their findings (e.g., pointing to specific sentences or data points that triggered a flag).[38, 42, 78, 81, 121, 125]
- Clear Disclosure of AI Use: Journals and platforms should transparently disclose when and how AI tools are used in the peer review process to authors and reviewers.[39, 42, 54, 56, 78, 81, 125] This includes specifying which tasks are AI-assisted.
- Defined Roles and Responsibilities: Establish clear guidelines for editors and reviewers on how to appropriately use and interpret AI-generated reports, emphasizing the need for critical oversight.[42, 81, 125]
- Audit Trails: Maintain records of AI tool usage and outputs as part of the review process documentation, allowing for post-hoc analysis if issues arise.[42, 125]
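A minimal sketch of the audit-trail idea in the last item above: every AI-assisted check appends a log entry recording which tool ran, on which version of the manuscript, and what it reported, so the review history can be reconstructed later. The field names and JSON-lines format are illustrative; the manuscript itself is not stored, only a hash for later verification.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_ai_check(log_path: str, manuscript_text: str, tool: str,
                 tool_version: str, output_summary: dict) -> dict:
    """Append one audit record per AI check to a JSON-lines log file."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "tool_version": tool_version,
        "manuscript_sha256": hashlib.sha256(manuscript_text.encode("utf-8")).hexdigest(),
        "output_summary": output_summary,
        "human_reviewed": False,   # set to True once an editor or reviewer inspects the output
    }
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")
    return entry
```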
D. The Irreplaceability of Human Judgment: Nuance, Context, and Significance
While AI excels at specific, often tedious, tasks, it fundamentally lacks the capabilities that define expert human review:
- Deep Subject Matter Expertise: Understanding the nuances, history, and current debates within a specific field.[38, 54, 56, 78, 80]
- Contextual Understanding: Assessing the appropriateness of methods or interpretation within the specific context of the research question and field.[38, 54, 78]
- Evaluation of Significance and Novelty: Judging the importance and originality of the research contribution, which is a value judgment based on extensive knowledge.[38, 41, 54, 56, 78, 80]
- Critical Thinking and Reasoning: Evaluating the logical flow of arguments, identifying subtle flaws in reasoning, and considering alternative interpretations.[38, 54, 78]
- Ethical Discernment: Recognizing potential ethical issues beyond simple checklist compliance (e.g., problematic framing, potential societal harm).[38, 46, 56, 78, 81]
- Creativity and Intuition: Recognizing unconventional but potentially groundbreaking ideas that might not fit existing patterns.[38, 78]
- Constructive Feedback: Providing nuanced, helpful suggestions for improvement that go beyond identifying errors.[7, 14, 16, 32]
Over-reliance on AI, especially for evaluative tasks, risks homogenizing research, overlooking significant contributions that don’t fit expected patterns, and failing to provide the developmental feedback that improves scholarship.[38, 54, 78, 79] It could lead to a focus on superficial compliance rather than substantive intellectual quality.
Therefore, the integration of AI must always preserve the central role of human judgment. AI should handle the ‘verifiable,’ freeing human experts for the ‘evaluative.’ The goal is a synergistic partnership where AI enhances efficiency and consistency in specific areas, allowing human reviewers and editors to focus their expertise on the critical assessment of scholarly merit.
IV. Conclusion: Towards Responsible AI-Augmented Peer Review
The integration of Artificial Intelligence into scholarly peer review presents both significant opportunities and substantial challenges. AI tools can enhance the efficiency, consistency, and speed of the process by automating specific, well-defined tasks, particularly those involving verification against established rules and guidelines.[38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 54, 55, 56, 57] This includes checks for scope, formatting, completeness, language quality, plagiarism, reference validity, data consistency, statistical reporting, and adherence to reporting standards. By handling these elements, AI can alleviate some of the burden on human reviewers and editors, potentially accelerating publication timelines and allowing experts to focus on more substantive assessment.[40, 41, 56, 82]
However, current AI technology has fundamental limitations. It lacks the deep domain expertise, nuanced contextual understanding, critical thinking abilities, and ethical discernment of human experts.[38, 39, 46, 54, 56, 78, 79, 80] AI struggles to evaluate the conceptual novelty, intellectual significance, methodological appropriateness, and validity of conclusions – the core evaluative judgments that determine scholarly merit. Consequently, the guiding principle for AI implementation must be augmentation, not replacement. AI should serve as an assistant, providing data and flagging potential issues, while the crucial tasks of interpretation, evaluation, judgment, and decision-making remain firmly in human hands.[39, 42, 46, 54, 56, 78, 80, 81, 83]
Successfully navigating this human-AI interface requires careful attention to ethical considerations.[38, 39, 42, 46, 54, 56, 78, 79, 80, 81, 83, 120, 121, 122, 123, 124, 125, 126] Mitigation strategies must be employed to address the risks of algorithmic bias, ensuring fairness and equity.[38, 42, 78, 81, 121, 122, 124, 127] Robust data security and clear policies are essential to protect the confidentiality of unpublished work.[39, 42, 46, 125, 126] Transparency about AI use and clear lines of accountability are needed, always emphasizing that human editors and reviewers retain ultimate responsibility.[39, 42, 54, 56, 78, 80, 81, 83, 125]
Furthermore, the diversity of peer review contexts – from journal articles and grant proposals to books and clinical guidelines – necessitates context-aware AI systems.[8, 9, 48, 84-109] A one-size-fits-all approach is insufficient; AI tools must be adaptable to the specific criteria and goals of different review scenarios.
The future of peer review likely involves a synergistic partnership between human experts and AI assistants. By leveraging AI for verification and efficiency gains while preserving human judgment for evaluation and critical assessment, we can strive towards a more robust, efficient, and ultimately trustworthy system of scholarly communication. Responsible development, deployment, and ongoing evaluation of AI tools, guided by ethical principles and a clear understanding of AI’s limitations, will be crucial for realizing this potential without compromising the integrity of the peer review process. The goal is not to automate peer review, but to intelligently augment it, enhancing the capacity of the scholarly community to evaluate and advance knowledge.
References
[1] Spier, R. (2002). The history of the peer-review process. Trends in Biotechnology, 20(8), 357–358.
https://doi.org/10.1016/S0167-7799(02)01985-6
[2] Burnham, J. C. (1990). The evolution of editorial peer review. JAMA, 263(10), 1323–1329.
https://doi.org/10.1001/jama.1990.03440100021003
[3] Tennant, J. P., et al. (2017). A multi-disciplinary perspective on emergent and future innovations in peer review. F1000Research, 6, 1151.
https://doi.org/10.12688/f1000research.12037.3
[4] Lee, C. J., Sugimoto, C. R., Zhang, G., & Cronin, B. (2013). Bias in peer review. Journal of the American Society for Information Science and Technology, 64(1), 2–17.
https://doi.org/10.1002/asi.22784
[5] Kelly, J., Sadeghieh, T., & Adeli, K. (2014). Peer review in scientific publications: benefits, critiques, & a survival guide. EJIFCC, 25(3), 227–243. PMCID: PMC4975196
[6] Jefferson, T., Rudin, M., Brodney Folse, S., & Davidoff, F. (2007). Editorial peer review for improving the quality of reports of biomedical studies. Cochrane Database of Systematic Reviews, (2), MR000016.
https://doi.org/10.1002/14651858.MR000016.pub3
[7] Ware, M. (2011). Peer review: benefits, perceptions and alternatives. Publishing Research Consortium.
https://publishingresearchconsortium.com/index.php/prc-documents/prc-research-projects/archived-projects/4-peer-review-benefits-perceptions-and-alternatives
[8] Horbach, S. P. J. M., & Halffman, W. (2019). The changing forms and expectations of peer review. Research Integrity and Peer Review, 4(1), 8.
https://doi.org/10.1186/s41073-019-0070-5
[9] Naylor, C. D. (1990). The role of the external peer review process. Clinical and Investigative Medicine, 13(5), 258–261. PMID: 2245518
[10] Godlee, F. (2002). Making reviewers visible: openness, accountability, and credit. JAMA, 287(21), 2762–2765.
https://doi.org/10.1001/jama.287.21.2762
[11] Mulligan, A. (2004). Is peer review in crisis? Perspectives in Publishing, 1, 1–5.
[12] Benos, D. J., Bashari, E., Chaves, J. M., Gaggar, A., Kapoor, N., LaFrance, M., … & Zotov, A. (2007). The ups and downs of peer review. Advances in Physiology Education, 31(2), 145–152.
https://doi.org/10.1152/advan.00104.2006
[13] Kumar, M. (2015). Peer review: A flawed process at the heart of science and journals. Journal of the Royal Society of Medicine, 108(5), 194–196.
https://doi.org/10.1177/0141076815580998
[14] Committee on Publication Ethics (COPE). (2017). COPE Ethical Guidelines for Peer Reviewers.
https://publicationethics.org/resources/guidelines/cope-ethical-guidelines-peer-reviewers
[15] Wager, E., & Kleinert, S. (2011). Responsible research publication: international standards for authors. In T. Mayer & N. Steneck (Eds.), Promoting research integrity in a global environment (pp. 309–316). World Scientific.
https://doi.org/10.1142/9789814340987_0033
[16] Hames, I. (2007). Peer review and manuscript management in scientific journals: Guidelines for good practice. Blackwell Publishing.
[17] Provenzale, J. M., & Stanley, R. J. (2005). A systematic guide to reviewing a manuscript. American Journal of Roentgenology, 185(4), 848–854.
https://doi.org/10.2214/AJR.05.0475
[18] Wessely, S. (1998). Peer review of grant applications: what do we know? The Lancet, 352(9124), 301–305.
https://doi.org/10.1016/S0140-6736(97)12218-6
[19] Smith, R. (2006). Peer review: a flawed process at the heart of science and journals. Journal of the Royal Society of Medicine, 99(4), 178–182.
https://doi.org/10.1258/jrsm.99.4.178
[20] Nature Publishing Group. (n.d.). Peer review policy. Retrieved April 29, 2025, from
https://www.nature.com/nature-portfolio/editorial-policies/peer-review
[21] van Rooyen, S., Godlee, F., Evans, S., Smith, R., & Black, N. (1999). Effect of blinding and unmasking on the quality of peer review: a randomized trial. JAMA, 281(15), 1438–1442.
https://doi.org/10.1001/jama.281.15.1438
[22] Elsevier. (n.d.). What is peer review? Retrieved April 29, 2025, from
https://www.elsevier.com/reviewers/what-is-peer-review
[23] Bornmann, L. (2011). Scientific peer review. Annual Review of Information Science and Technology, 45(1), 197–245.
https://doi.org/10.1002/aris.2011.1440450112
[24] Gasparyan, A. Y., Gerasimov, A. N., Voronov, A. A., & Kitas, G. D. (2014). Rewarding peer reviewers: maintaining the integrity of science communication. Journal of Korean Medical Science, 29(10), 1321–1325.
https://doi.org/10.3346/jkms.2014.29.10.1321
[25] Fyfe, A., Coate, K., Curry, S., Lawson, S., Moxham, N., & Røstvik, C. M. (2017). Untangling academic publishing: a history of the relationship between commercial interests, academic prestige and the circulation of research. Zenodo.
https://doi.org/10.5281/zenodo.1000672
[26] Moxham, N., & Fyfe, A. (2018). Paying for prestige? The origins and development of article processing charges. History of Science, 56(4), 462–488.
https://doi.org/10.1177/0073275318779861
[27] Armstrong, J. S. (1997). Peer review for journals: Evidence on quality control, fairness, and innovation. Science and Engineering Ethics, 3(1), 63–84.
https://doi.org/10.1007/s11948-997-0017-3
[28] Mahoney, M. J. (1977). Publication prejudices: An experimental study of confirmatory bias in the peer review system. Cognitive Therapy and Research, 1(2), 161–175.
https://doi.org/10.1007/BF01173636
[29] Lock, S. (1985). A difficult balance: Editorial peer review in medicine. Nuffield Provincial Hospitals Trust.
[30] Goodman, S. N., Altman, D. G., & George, S. L. (1998). Statistical reviewing policies of medical journals: caveat lector? Journal of General Internal Medicine, 13(11), 753–756.
https://doi.org/10.1046/j.1525-1497.1998.00228.x
[31] Schriger, D. L., Altman, D. G., & Vetter, J. A. (2010). Assessing the quality of research reports: the importance of methodological detail. Academic Emergency Medicine, 17(7), 756–761.
https://doi.org/10.1111/j.1553-2712.2010.00790.x
[32] Nobarany, S., & Booth, K. S. (2015). Peer review unbound: Making the implicit explicit in evaluating human-computer interaction research. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15) (pp. 2843–2852).
https://doi.org/10.1145/2702123.2702236
[33] Taylor & Francis Group. (n.d.). Understanding peer review. Retrieved April 29, 2025, from
https://authorservices.taylorandfrancis.com/publishing-your-research/peer-review/understanding-peer-review/
[34] Ziman, J. M. (1968). Public knowledge: An essay concerning the social dimension of science. Cambridge University Press.
[35] Marušić, A., Wager, E., Utrobičić, A., Rothstein, H. R., & Sambunjak, D. (2016). Interventions to prevent misconduct and promote integrity in research and publication. Cochrane Database of Systematic Reviews, (4), MR000038.
https://doi.org/10.1002/14651858.MR000038.pub2
[36] Whitley, R. (1984). The intellectual and social organization of the sciences. Oxford University Press.
[37] Merton, R. K. (1973). The sociology of science: Theoretical and empirical investigations. University of Chicago Press.
[38] Nature Editorial. (2023). AI tools can’t write papers, but they could change peer review. Nature, 621(7980), 668.
https://doi.org/10.1038/d41586-023-02980-y
[39] Brainard, J. (2023). Publishers navigate the tricky ethics of AI in peer review. Science.
https://doi.org/10.1126/science.adj7843
[40] Checco, A., Bracciale, L., Loreti, P., Pinfield, S., & Bianchi, G. (2021). AI-assisted peer review. Humanities and Social Sciences Communications, 8(1), 25.
https://doi.org/10.1057/s41599-020-00679-9
[41] Heaven, D. (2023). AI tools that write scientific papers could automate peer review. Nature.
https://doi.org/10.1038/d41586-023-00349-5
[42] Taylor & Francis Group. (2023). AI policy guidance. Retrieved April 29, 2025, from
https://authorservices.taylorandfrancis.com/editorial-policies/ai-policy-guidance/
[43] STM Association. (2023). STM AI for scholarly communications. Retrieved April 29, 2025, from
https://www.stm-assoc.org/initiatives/ai-for-scholarly-communications/
[44] Research Square Company. (n.d.). Annelid: AI manuscript assessment. Retrieved April 29, 2025, from
https://company.researchsquare.com/solutions/annelid
[45] CACTUS Communications. (n.d.). Penelope.ai: AI-powered manuscript checks. Retrieved April 29, 2025, from
https://penelope.ai/
[46] Flanagin, A., Bibbins-Domingo, K., Berkwits, M., & Christiansen, S. L. (2023). Artificial intelligence and scholarly publishing: navigating the future. JAMA, 329(13), 1053–1054.
https://doi.org/10.1001/jama.2023.4003
[47] Severin, A., Strinzel, M., Egger, M., Barros, T., Sokolov, A., Mouatt, J. Z., & Muller, S. M. (2022). Artificial intelligence for manuscript evaluation: testing the StatReviewer application. BMJ Open Science, 6(1), e100018.
https://doi.org/10.1136/bmjos-2021-100018
[48] Fang, F. C., & Casadevall, A. (2016). Research funding: The case for a modified lottery. mBio, 7(2), e00422-16.
https://doi.org/10.1128/mBio.00422-16
[49] Walker, R., & Rocha da Silva, P. (2015). Emerging trends in peer review—a survey. Frontiers in Neuroscience, 9, 169.
https://doi.org/10.3389/fnins.2015.00169
[50] Ali, P. A., & Watson, R. (2016). Peer review and the publication process. Journal of Advanced Nursing, 72(7), 1476–1483.
https://doi.org/10.1111/jan.12999
[51] Kovanis, M., Porcher, R., Ravaud, P., & Trinquart, L. (2016). The global burden of journal peer review in the biomedical literature: A scoping review. PLoS ONE, 11(11), e0166387.
https://doi.org/10.1371/journal.pone.0166387
[52] Publons. (2018). Global State of Peer Review. Clarivate Analytics.
https://publons.com/static/Publons-Global-State-Of-Peer-Review-2018.pdf
[53] Besançon, L., et al. (2021). Open up criteria for evaluating scientists. Nature, 598(7880), 257–259.
https://doi.org/10.1038/d41586-021-02760-z
[54] Thorp, H. H. (2023). ChatGPT is fun, but not an author. Science, 379(6630), 313.
https://doi.org/10.1126/science.adg7879
[55] Rodríguez-Ruiz, J., Mata-Rivera, M. F., García-Peñalvo, F. J., & Ramírez-Montoya, M. S. (2023). Artificial intelligence in peer review: A mapping study. Applied Sciences, 13(15), 8739.
https://doi.org/10.3390/app13158739
[56] Holm, S. (2023). AI assisted peer review: Potential and pitfalls. Journal of Medical Ethics, 49(12), 821–822.
https://doi.org/10.1136/jme-2023-109052
[57] Frontiers. (n.d.). AIRA: Artificial Intelligence Review Assistant. Retrieved April 29, 2025, from
https://www.frontiersin.org/about/review-process
[58] Grammarly. (n.d.). Grammarly for Institutions. Retrieved April 29, 2025, from
https://www.grammarly.com/institutions
[59] Writefull. (n.d.). Writefull for Publishers. Retrieved April 29, 2025, from
https://writefull.com/publishers
[60] CACTUS Communications. (n.d.). Trinka AI. Retrieved April 29, 2025, from
https://trinka.ai/
[61] Burukutla, S., Ananthanarayanan, G., & Raghu, D. (2021). Evaluating the efficacy of automated grammar checking tools for ESL writers. Journal of Educational Technology & Society, 24(3), 130–142.
[62] Scite.ai. (n.d.). Scite. Retrieved April 29, 2025, from
https://scite.ai/
[63] Cabanac, G., Labbé, C., & Magazinov, A. (2021). Tortured phrases: A systemic issue in scholarly communication. arXiv preprint arXiv:2107.06751.
https://doi.org/10.48550/arXiv.2107.06751
[64] Crossref. (n.d.). Reference Checking. Retrieved April 29, 2025, from
https://www.crossref.org/services/reference-checking/
[65] Recite. (n.d.). ReciteWorks. Retrieved April 29, 2025, from
https://reciteworks.com/
[66] Adnan, K., Davis, C., & Raban, D. R. (2020). Automatic reference checking: A survey of tools and techniques. Journal of Information Science, 46(3), 315–331.
https://doi.org/10.1177/0165551519833117
[67] Statcheck. (n.d.). The Statcheck web app. Retrieved April 29, 2025, from
http://statcheck.io/
[68] Nuijten, M. B., Hartgerink, C. H., van Assen, M. A., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226.
https://doi.org/10.3758/s13428-015-0664-2
[69] Brown, N. J., & Heathers, J. A. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363–369.
https://doi.org/10.1177/1948550616673876
[70] Acuna, D. E., Allesina, S., & Kording, K. P. (2012). Future impact: Predicting scientific success. Nature, 489(7415), 201–202.
https://doi.org/10.1038/489201a
[71] Hardwicke, T. E., Serghiou, S., Janiaud, P., Danchev, V., Crüwell, S., Goodman, S. N., & Ioannidis, J. P. A. (2020). Calibrating the scientific ecosystem through meta-research. Annual Review of Statistics and Its Application, 7, 11–37.
https://doi.org/10.1146/annurev-statistics-031219-041104
[72] CONSORT Statement. (n.d.). CONSORT 2010 Checklist. Retrieved April 29, 2025, from
http://www.consort-statement.org/consort-2010
[73] PRISMA Statement. (n.d.). PRISMA 2020 Checklist. Retrieved April 29, 2025, from
https://www.prisma-statement.org/
[74] Equator Network. (n.d.). Enhancing the QUAlity and Transparency Of health Research. Retrieved April 29, 2025, from
https://www.equator-network.org/
[75] Cobo, E., Cortés, J., Ribera, J. M., Cardellach, F., Selva-O’Callaghan, A., Kostov, B., … & Alonso-Coello, P. (2011). Effect of using reporting guidelines during peer review on the quality of final manuscripts submitted to a biomedical journal: a randomized controlled trial. PLoS Medicine, 8(4), e1000447.
https://doi.org/10.1371/journal.pmed.1000447
[76] Hair, K., Macleod, M. R., Sena, E. S., & IICARus Collaboration. (2019). A randomised controlled trial of an Intervention to Improve Compliance with the ARRIVE guidelines (IICARus). Research Integrity and Peer Review, 4, 12.
https://doi.org/10.1186/s41073-019-0071-4
[77] Hovet, C. J. (2023). Algorithmically matching manuscripts to reviewers: Possibilities, problems, and prospects. Research Ethics, 19(4), 485–503.
https://doi.org/10.1177/17470161231180333
[78] Van Noorden, R., & Perkel, J. M. (2022). AI and the future of scientific publishing. Nature, 611(7937), 660–663.
https://doi.org/10.1038/d41586-022-03780-w
[79] Wachter, S., Mittelstadt, B., & Russell, C. (2021). Why fairness cannot be automated: Bridging the gap between EU non-discrimination law and AI. Computer Law & Security Review, 41, 105567.
https://doi.org/10.1016/j.clsr.2021.105567
[80] Berenbaum, H. (2024). ChatGPT cannot review scientific manuscripts. Journal of Behavior Therapy and Experimental Psychiatry, 82, 101917.
https://doi.org/10.1016/j.jbtep.2023.101917
[81] Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389–399.
https://doi.org/10.1038/s42256-019-0088-8
[82] Vesnic-Alujevic, L., Nascimento, S., & Pólvora, A. (2020). Societal and ethical impacts of artificial intelligence: Critical notes on European policy frameworks. Telecommunications Policy, 44(6), 101962.
https://doi.org/10.1016/j.telpol.2020.101962
[83] Taddeo, M., & Floridi, L. (2018). How AI can be a force for good. Science, 361(6404), 751–752.
https://doi.org/10.1126/science.aat5991
[84] Wiley Author Services. (n.d.). A guide to peer review. Retrieved April 29, 2025, from
https://authorservices.wiley.com/Reviewers/journal-reviewers/how-to-perform-a-peer-review/step-by-step-guide-to-reviewing-a-manuscript.html
[85] National Institutes of Health (NIH). (n.d.). Peer Review Process. Retrieved April 29, 2025, from
https://grants.nih.gov/grants/peer-review.htm
[86] Germano, W. (2016). Getting It Published: A Guide for Scholars and Anyone Else Serious about Serious Books (3rd ed.). University of Chicago Press.
[87] Institute of Medicine (IOM). (2011). Clinical Practice Guidelines We Can Trust. National Academies Press.
https://doi.org/10.17226/13058
[88] O’Meara, K., Terosky, A. L., & Neumann, A. (2008). Faculty careers and work lives: A professional growth perspective. John Wiley & Sons.
[89] Harman, E. (1995). The Thesis and the Book. University of Toronto Press.
[90] GRADE Working Group. (n.d.). Grading of Recommendations Assessment, Development and Evaluation. Retrieved April 29, 2025, from
https://www.gradeworkinggroup.org/
[91] Association for Computing Machinery (ACM). (n.d.). Policy on Peer Review. Retrieved April 29, 2025, from
https://www.acm.org/publications/policies/peer-review
[92] European Research Council (ERC). (n.d.). Peer Review Evaluation. Retrieved April 29, 2025, from
https://erc.europa.eu/document-library/information-material/peer-review-evaluation
[93] Association of University Presses (AUPresses). (n.d.). Acquisitions Editorial. Retrieved April 29, 2025, from
https://aupresses.org/resources/acquisitions-editorial/
[94] National Science Foundation (NSF). (n.d.). Merit Review Process. Retrieved April 29, 2025, from
https://www.nsf.gov/bfa/dias/policy/merit_review/
[95] American Association of University Professors (AAUP). (1940). Statement of Principles on Academic Freedom and Tenure. Retrieved April 29, 2025, from
https://www.aaup.org/report/1940-statement-principles-academic-freedom-and-tenure
[96] Thompson, J. B. (2005). Books in the Digital Age: The Transformation of Academic and Higher Education Publishing in Britain and the United States. Polity Press.
[97] Scottish Intercollegiate Guidelines Network (SIGN). (n.d.). SIGN 50: A Guideline Developer’s Handbook. Retrieved April 29, 2025, from
https://www.sign.ac.uk/our-guidelines/sign-50-a-guideline-developers-handbook/
[98] Guyatt, G. H., Oxman, A. D., Vist, G. E., Kunz, R., Falck-Ytter, Y., Alonso-Coello, P., … & GRADE Working Group. (2008). GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ, 336(7650), 924–926.
https://doi.org/10.1136/bmj.39489.470347.AD
[99] Bland, C. J., Center, B. A., Finstad, D. A., Risbey, K. R., & Staples, J. G. (2006). The Research Productive Department: Strategies from Departments That Excel. Anker Publishing Company.
[100] Braxton, J. M. (Ed.). (2000). Reworking the Student Departure Puzzle. Vanderbilt University Press.
[101] Menges, R. J., & Associates. (1999). Faculty in New Jobs: A Guide to Settling In, Becoming Established, and Building Institutional Support. Jossey-Bass.
[102] Tierney, W. G., & Bensimon, E. M. (1996). Promotion and Tenure: Community and Socialization in Academe. SUNY Press.
[103] Meyer, D. E. (1990). Peer review of conference submissions. The Psychological Record, 40(1), 39–48.
https://doi.org/10.1007/BF03395345
[104] Powell, W. W. (1985). Getting into Print: The Decision-Making Process in Scholarly Publishing. University of Chicago Press.
[105] Brouwers, M. C., Kho, M. E., Browman, G. P., Burgers, J. S., Cluzeau, F., Feder, G., … & AGREE Next Steps Consortium. (2010). AGREE II: advancing guideline development, reporting and evaluation in health care. CMAJ, 182(18), E839–E842.
https://doi.org/10.1503/cmaj.090449
[106] Patel, J. (2016). A guide to peer reviewing conference papers. Annals of Cardiac Anaesthesia, 19(3), 451–453.
https://doi.org/10.4103/0971-9784.185535
[107] Centra, J. A. (1977). How universities evaluate faculty performance: A survey of department heads. GRE Board Research Report No. 75-5bR. Educational Testing Service.
[108] Shekelle, P. G., Woolf, S. H., Eccles, M., & Grimshaw, J. (1999). Developing clinical guidelines. Western Journal of Medicine, 170(6), 348–351. PMCID: PMC1305659
[109] Woolf, S. H., Grol, R., Hutchinson, A., Eccles, M., & Grimshaw, J. (1999). Clinical guidelines: potential benefits, limitations, and harms of clinical guidelines. BMJ, 318(7182), 527–530.
https://doi.org/10.1136/bmj.318.7182.527
[110] Springer Nature. (n.d.). How to peer review. Retrieved April 29, 2025, from
https://www.springernature.com/gp/reviewers/how-to-peer-review
[111] Kaplan, D. M., Lacetera, N., & Kaplan, C. (2008). Sample size and precision in NIH peer review. PLoS ONE, 3(7), e2761.
https://doi.org/10.1371/journal.pone.0002761
[112] Gallo, S. A., Carpenter, A. S., Irwin, D., McPartland, C. D., Travis, J., & Largent, E. A. (2014). The validation of peer review through research impact measures and the implications for funding strategies. PLoS ONE, 9(8), e106474.
https://doi.org/10.1371/journal.pone.0106474
[113] Cicchetti, D. V. (1991). The reliability of peer review for manuscript and grant submissions: A cross-disciplinary investigation. Behavioral and Brain Sciences, 14(1), 119–135.
https://doi.org/10.1017/S0140525X0006892X
[114] Luey, B. (2010). Handbook for Academic Authors (5th ed.). Cambridge University Press.
[115] Rigby, D. (2001). Peer review and the acceptance of conference papers. Journal of the American Society for Information Science and Technology, 52(8), 676–680.
https://doi.org/10.1002/asi.1117
[116] Association for Computing Machinery (ACM). (2017). Reviewing for ACM Journals and Conferences. Retrieved April 29, 2025, from
https://www.acm.org/publications/authors/instructions-for-reviewers
[117] Schultz, D. M. (2010). Rejection rates for journals publishing in the atmospheric sciences. Bulletin of the American Meteorological Society, 91(2), 231–243.
https://doi.org/10.1175/2009BAMS2908.1
[118] Turnitin. (n.d.). iThenticate. Retrieved April 29, 2025, from
https://www.turnitin.com/products/ithenticate
[119] Peters, D. P., & Ceci, S. J. (1982). Peer-review practices of psychological journals: The fate of published articles, submitted again. Behavioral and Brain Sciences, 5(2), 187–195.
https://doi.org/10.1017/S0140525X0001162X
[120] Committee on Publication Ethics (COPE). (n.d.). Core Practices. Retrieved April 29, 2025, from
https://publicationethics.org/core-practices
[121] Veale, M., Van Kleek, M., & Binns, R. (2018). Fairness and accountability design needs for algorithmic support in high-stakes public sector decision-making. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18) (Paper 440, pp. 1–14).
https://doi.org/10.1145/3173574.3174014
[122] Noble, S. U. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press.
[123] Hagendorff, T. (2020). The ethics of AI ethics: An evaluation of guidelines. Minds and Machines, 30(1), 99–120.
https://doi.org/10.1007/s11023-020-09517-8
[124] Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019). Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT ’19) (pp. 59–68).
https://doi.org/10.1145/3287560.3287598
[125] Resnik, D. B., & Master, Z. (2023). Machine learning and scientific peer review: Ethical issues and recommendations. Accountability in Research, 30(7), 449–465.
https://doi.org/10.1080/08989621.2022.2155211
[126] Hvistendahl, M. (2013). China’s publication bazaar. Science, 342(6162), 1035–1039.
https://doi.org/10.1126/science.342.6162.1035
[127] Barocas, S., & Selbst, A. D. (2016). Big data’s disparate impact. California Law Review, 104(3), 671–732.