Webology is an international peer-reviewed journal in English devoted to the field of the World Wide Web. It serves as a forum for discussion, experimentation, and new research on information dissemination and communication processes in general, and in the context of the World Wide Web in particular. This paper presents a scientometric analysis of the Webology journal. The paper analyses the growth of the research output published in the journal, the pattern of authorship, author productivity, and the subjects covered by the papers over the period 2013-2017. It is found that 62 papers were published during the period of study, and that the majority of articles were collaborative in nature. The main subject concentrations of the journal were Social Networking/Web 2.0/Library 2.0 and Scientometrics/Bibliometrics. Iranian researchers contributed the largest share of articles (37.10%). The study applied standard formulae and statistical tools to derive its results.
Multimodal approaches have shown great promise for searching and navigating digital collections held by libraries, archives, and museums. In this paper, we introduce map-RAS: a retrieval-augmented search system for historic maps. In addition to introducing our framework, we detail our publicly-hosted demo for searching 101,233 map images held by the Library of Congress. With our system, users can multimodally query the map collection via ColPali, summarize search results using Llama 3.2, and upload their own collections to perform inter-collection search. We articulate potential use cases for archivists, curators, and end-users, as well as future work with our system in both machine learning and the digital humanities. Our demo can be viewed at: http://www.mapras.com.
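The retrieval step described in this abstract can be illustrated with a simplified sketch. ColPali in fact uses late-interaction scoring over image patch embeddings; the single-vector cosine-similarity version below is only a stand-in for the general idea of embedding-based multimodal retrieval, and all data in it is toy and hypothetical.

```python
import numpy as np

def retrieve(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k most similar documents by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of each document to the query
    return np.argsort(-sims)[:k].tolist()

# Toy demo: four "map image" embeddings and a query close to document 2.
rng = np.random.default_rng(0)
doc_embs = rng.normal(size=(4, 8))
query = doc_embs[2] + 0.01 * rng.normal(size=8)
top = retrieve(query, doc_embs, k=2)  # document 2 ranks first
```

In a real system the embeddings would come from the multimodal encoder and the top-ranked images would then be passed to the summarization model.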
This study employs scientometric methods to assess the research output and performance of the University of Ibadan from 2014 to 2023. By analyzing publication trends, citation patterns, and collaboration networks, the research aims to comprehensively evaluate the university's research productivity, impact, and disciplinary focus. The university's endeavors are characterized by innovation, interdisciplinary collaboration, and a commitment to excellence, making the University of Ibadan a significant hub for cutting-edge research in Nigeria and beyond. The goal of the current study is to ascertain the influence of the university's research output and publication patterns between 2014 and 2023. The study focuses on the departments at the University of Ibadan that contribute the most, the leading journals for publishing, collaborating countries, local and global citation impact, prominent authors and their total output, and the research output broken down by year. According to the university's ten-year publication data, 7,159 papers with an h-index of 75 were published between 2014 and 2023, garnering 218,572 citations. Furthermore, the VOSviewer software mapping approach is used to illustrate the science mapping of the data through graphs. The findings of this study will contribute to understanding the university's research strengths, weaknesses, and potential areas for improvement. Additionally, the results will inform evidence-based decision-making for enhancing research strategies and policies at the University of Ibadan.
The development of synthesis procedures remains a fundamental challenge in materials discovery, with procedural knowledge scattered across decades of scientific literature in unstructured formats that are challenging for systematic analysis. In this paper, we propose a multi-modal toolbox that employs large language models (LLMs) and vision language models (VLMs) to automatically extract and organize synthesis procedures and performance data from materials science publications, covering text and figures. We curated 81k open-access papers, yielding LeMat-Synth (v 1.0): a dataset containing synthesis procedures spanning 35 synthesis methods and 16 material classes, structured according to an ontology specific to materials science. The extraction quality is rigorously evaluated on a subset of 2.5k synthesis procedures through a combination of expert annotations and a scalable LLM-as-a-judge framework. Beyond the dataset, we release a modular, open-source software library designed to support community-driven extension to new corpora and synthesis domains. Altogether, this work provides an extensible infrastructure to transform unstructured literature into machine-readable information. This lays the groundwork for predictive modeling of synthesis procedures as well as modeling synthesis--structure--property relationships.
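The abstract describes structuring extracted procedures "according to an ontology specific to materials science." A minimal sketch of what such schema-constrained extraction output might look like, and how a record could be validated against it, is shown below; the record type, the allowed-method list, and the example values are all hypothetical, not the paper's actual ontology.

```python
from dataclasses import dataclass, field

# Hypothetical subset of a controlled vocabulary of synthesis methods.
ALLOWED_METHODS = {"sol-gel", "hydrothermal", "solid-state"}

@dataclass
class SynthesisRecord:
    material: str
    method: str
    steps: list[str] = field(default_factory=list)

def validate(record: SynthesisRecord) -> list[str]:
    """Return a list of schema violations (empty if the record is valid)."""
    errors = []
    if record.method not in ALLOWED_METHODS:
        errors.append(f"unknown method: {record.method}")
    if not record.steps:
        errors.append("no procedure steps extracted")
    return errors

rec = SynthesisRecord("BaTiO3", "sol-gel",
                      ["dissolve precursors", "calcine at 800 C"])
# validate(rec) -> [] (record conforms to the toy schema)
```

Constraining LLM output to a schema like this is one way to make extraction quality checkable at scale, whether by experts or by an LLM-as-a-judge.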
This study investigates how different approaches to disciplinary classification represent the Social Sciences and Humanities (SSH) in the Flemish VABB-SHW database. We compare organizational classification (based on author affiliation), channel-based cognitive classification (based on publication venues), and text-based publication-level classification (using channel titles, publication titles, and abstracts, depending on availability). The analysis shows that text-based classification generally aligns more closely with channel-based categories, confirming that the channel choice provides relevant information about publication content. At the same time, it is closer to organizational classification than channel-based categories are, suggesting that textual features capture author affiliations more directly than publishing channels do. Comparison across the three systems highlights cases of convergence and divergence, offering insights into how disciplines such as "Sociology" and "History" extend across fields, while "Law" remains more contained. Publication-level classification also clarifies the disciplinary profiles of multidisciplinary journals in the database, which in VABB-SHW show distinctive profiles with stronger emphases on SSH and health sciences. At the journal level, fewer than half of outlets with more than 50 publications have their channel-level classification fully or partially supported by more than 90% of publications. These results demonstrate the added value of text-based methods for validating classifications and for analysing disciplinary dynamics.
Scientific publishing is facing an alarming proliferation of fraudulent practices that threaten the integrity of research communication. The production and dissemination of fake research have become a profitable business, undermining trust in scientific journals and distorting the evaluation processes that depend on them. This brief piece examines the problem of fake journals through a three-level typology. The first level concerns predatory journals, which prioritise financial gain over scholarly quality by charging authors publication fees while providing superficial or fabricated peer review. The second level analyses hijacked journals, in which counterfeit websites impersonate legitimate titles to deceive authors into submitting and paying for publication. The third level addresses hacked journals, where legitimate platforms are compromised through cyberattacks or internal manipulation, enabling the distortion of review and publication processes. Together, these forms of misconduct expose deep vulnerabilities in the scientific communication ecosystem, exacerbated by the pressure to publish and the marketisation of research outputs. The manuscript concludes that combating these practices requires structural reforms in scientific evaluation and governance. Only by reducing the incentives that sustain the business of fraudulent publishing can the scholarly community restore credibility and ensure that scientific communication fulfils the essential purpose of reliable advancement of knowledge.
Screenshots of social media posts are a common approach for information sharing. Unfortunately, before sharing a screenshot, users rarely verify whether the attribution of the post is fake or real. There are numerous legitimate reasons to share screenshots. However, sharing screenshots of social media posts is also a vector for mis-/disinformation spread on social media. We are exploring methods to verify the attribution of a social media post shown in a screenshot, using resources found on the live web and in web archives. We focus on the use of web archives, since the attribution of non-deleted posts can be relatively easily verified using the live web. We show how information from a Twitter screenshot (Twitter handle, timestamp, and tweet text) can be extracted and used for locating potential archived tweets in the Internet Archive's Wayback Machine. We evaluate our method on a dataset of 1,571 single tweet screenshots.
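The lookup step can be sketched with the Internet Archive's CDX API, which accepts a URL pattern and a date range. The sketch below only builds the query URL from the fields extracted from a screenshot (handle and date); the example handle and date are illustrative, and a real pipeline would fetch the URL and match the tweet text against the returned captures.

```python
from urllib.parse import urlencode

CDX = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(handle: str, day: str) -> str:
    """Build a CDX API query for archived copies of a user's tweet pages
    on a given day (YYYYMMDD), as extracted from a screenshot."""
    params = {
        "url": f"twitter.com/{handle}/status/*",  # wildcard over status IDs
        "output": "json",
        "from": day,
        "to": day,
        "limit": "50",
    }
    return f"{CDX}?{urlencode(params)}"

url = cdx_query_url("jack", "20060321")
```

Candidate captures returned by this query would then be filtered by comparing their text against the tweet text recovered from the screenshot.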
The IMU-ICIAM working group's new report on Fraudulent Publishing in the Mathematical Sciences documents how gaming of bibliometrics, predatory outlets and paper-mill activity are eroding trust in research, mathematics included. This short EMS note brings that analysis home to Europe. We urge readers to recognise the warning signs of fraudulent publishing, to report serious irregularities so that they can be investigated and sanctioned, and to reflect critically on their own editorial and reviewing practices. We then sketch why Europe is well placed to lead a structural response: a decade of policy development on open science; mature infrastructures for data, software and scholarly communication; and new capacity for community-led diamond open access. Finally, we outline developments towards non-print contributions across member countries including the growth of formal proofs (e.g. with Lean and Isabelle) and we highlight the role of zbMATH Open as a European quality signal that can help editors, reviewers and authors steer clear of problematic venues.
Purpose: It has become increasingly likely that Large Language Models (LLMs) will be used to score the quality of academic publications to support research assessment goals in the future. This may cause problems for fields with competing paradigms since there is a risk that one may be favoured, causing long-term harm to the reputation of the other. Design/methodology/approach: To test whether this is plausible, this article uses 17 ChatGPTs to evaluate up to 100 journal articles from each of eight pairs of competing sociology paradigms (1,490 altogether). Each article was assessed by prompting ChatGPT to take one of five roles: paradigm follower, opponent, antagonistic follower, antagonistic opponent, or neutral. Findings: Articles were scored highest by ChatGPT when it followed the aligning paradigm, and lowest when it was told to devalue it and to follow the opposing paradigm. Broadly similar patterns occurred for most of the paradigm pairs. Follower ChatGPTs displayed only a small amount of favouritism compared to neutral ChatGPTs, but articles evaluated by an opposing paradigm ChatGPT had a substantial disadvantage. Research limitations: The data covers a single field and LLM. Practical implications: The results confirm that LLM instructions for research evaluation should be carefully designed to ensure that they are paradigm-neutral, to avoid accidentally resolving conflicts between paradigms on a technicality by devaluing one side's contributions. Originality/value: This is the first demonstration that LLMs can be prompted to show a partiality for academic paradigms.
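The five evaluator roles described in this abstract amount to swapping the system framing while holding the scoring instruction fixed. A minimal sketch of that prompt construction follows; the template wording, the 1-4 scale, and the example paradigm names are hypothetical stand-ins, not the study's actual prompts.

```python
# Hypothetical framings approximating the five evaluator stances.
ROLE_TEMPLATES = {
    "follower": "You are a researcher working within the {p} paradigm.",
    "opponent": "You are a researcher working within the {q} paradigm.",
    "antagonistic follower": ("You are a researcher working within the {p} "
                              "paradigm and consider {q} misguided."),
    "antagonistic opponent": ("You are a researcher working within the {q} "
                              "paradigm and consider {p} misguided."),
    "neutral": "You are a neutral research evaluator.",
}

def build_prompt(role: str, p: str, q: str, abstract: str) -> str:
    """Compose a role framing plus a fixed scoring instruction."""
    framing = ROLE_TEMPLATES[role].format(p=p, q=q)
    return f"{framing}\nScore the quality of this article from 1 to 4:\n{abstract}"

prompt = build_prompt("neutral", "conflict theory", "functionalism",
                      "An example abstract...")
```

Holding everything constant except the framing is what lets paradigm favouritism be isolated in the resulting scores.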
Assessing published academic journal articles is a common task in evaluations of departments and individuals. Whilst it is sometimes supported by citation data, Large Language Models (LLMs) may give more useful indications of article quality. Evidence of this capability exists for two of the largest LLM families, ChatGPT and Gemini, and for the medium-sized LLM Gemma3 27b, but it is unclear whether smaller LLMs and reasoning models have similar abilities. This is important because larger models may be slow and impractical in some situations, and reasoning models may perform differently. Four relevant questions are addressed with Gemma3 variants, Llama4 Scout, Qwen3, Magistral Small and DeepSeek R1, on a dataset of 2,780 medical, health and life science papers in 6 fields, with two different gold standards, one of them novel. The results suggest that smaller (open weights) and reasoning LLMs have similar performance to ChatGPT 4o-mini and Gemini 2.0 Flash, but that 1b parameters may often, and 4b sometimes, be too few. Moreover, averaging scores from multiple identical queries seems to be a universally successful strategy, and few-shot prompts (four examples) tended to help, although the evidence was equivocal. Reasoning models did not have a clear advantage. Overall, the results show, for the first time, that smaller LLMs with more than 4b parameters, including reasoning models, have a substantial capability to score journal articles for research quality, especially if score averaging is used.
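The score-averaging strategy this abstract reports is simple to state: query the model several times with an identical prompt and average the returned scores. A minimal sketch, in which the scorer function is a deterministic stand-in for a stochastic LLM call and the score values are illustrative:

```python
from itertools import cycle
from statistics import mean

def averaged_quality_score(article: str, score_once, n: int = 5) -> float:
    """Average n repeated scores for one article. `score_once` stands in
    for a single (stochastic) LLM query returning a quality score."""
    return mean(score_once(article) for _ in range(n))

# Deterministic stand-in mimicking run-to-run noise in LLM scores.
noisy = cycle([2, 3, 3, 2, 3])
score = averaged_quality_score("some abstract", lambda a: next(noisy), n=5)
# mean of [2, 3, 3, 2, 3] = 2.6
```

Averaging reduces the run-to-run variance of the individual scores, which is one plausible reason it helps across model sizes.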
We show that large language models (LLMs) can be used to distinguish the writings of different authors. Specifically, an individual GPT-2 model, trained from scratch on the works of one author, will predict held-out text from that author more accurately than held-out text from other authors. We suggest that, in this way, a model trained on one author's works embodies the unique writing style of that author. We first demonstrate our approach on books written by eight different (known) authors. We also use this approach to confirm R. P. Thompson's authorship of the well-studied 15th book of the Oz series, originally attributed to F. L. Baum.
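The attribution criterion in this abstract, a per-author model assigning higher likelihood to that author's held-out text, can be illustrated with a much smaller stand-in than GPT-2. The sketch below uses a smoothed character-bigram model and entirely made-up "author" texts; it is an analogue of the method's comparison step, not the paper's implementation.

```python
import math
from collections import Counter

def train_bigram(text: str) -> dict:
    """Character-bigram counts as a tiny stand-in for a per-author model."""
    return {"bi": Counter(zip(text, text[1:])), "uni": Counter(text)}

def avg_nll(model: dict, text: str, alpha: float = 1.0) -> float:
    """Average negative log-likelihood of text under the model
    (lower = better fit), with add-alpha smoothing."""
    vocab = len(model["uni"]) + 1
    pairs = list(zip(text, text[1:]))
    nll = 0.0
    for a, b in pairs:
        p = (model["bi"][(a, b)] + alpha) / (model["uni"][a] + alpha * vocab)
        nll -= math.log(p)
    return nll / len(pairs)

author_a = "the quick brown fox jumps over the lazy dog " * 20
model_a = train_bigram(author_a)
held_out_a = "the quick brown fox naps under the lazy tree"
held_out_b = "green ideas sleep furiously under colorless skies"
# The model trained on author A fits A's held-out text better (lower NLL).
fits_a = avg_nll(model_a, held_out_a) < avg_nll(model_a, held_out_b)
```

In the paper's setting the same comparison is made with per-author GPT-2 models, whose held-out prediction accuracy plays the role of the negative log-likelihood here.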
HIKMA Semi-Autonomous Conference is the first experiment in reimagining scholarly communication through an end-to-end integration of artificial intelligence into the academic publishing and presentation pipeline. This paper presents the design, implementation, and evaluation of the HIKMA framework, which includes AI dataset curation, AI-based manuscript generation, AI-assisted peer review, AI-driven revision, AI conference presentation, and AI archival dissemination. By combining language models, structured research workflows, and domain safeguards, HIKMA shows how AI can support, not replace, traditional scholarly practices while maintaining intellectual property protection, transparency, and integrity. The conference functions as a testbed and proof of concept, providing insights into the opportunities and challenges of AI-enabled scholarship. It also examines questions about AI authorship, accountability, and the role of human-AI collaboration in research.
This paper presents a novel task of extracting Latin fragments from mixed-language historical documents with varied layouts. We benchmark and evaluate the performance of large foundation models against a multimodal dataset of 724 annotated pages. The results demonstrate that reliable Latin detection with contemporary models is achievable. Our study provides the first comprehensive analysis of these models' capabilities and limits for this task.
While traditionally not considered part of the scientific method, science communication is increasingly playing a pivotal role in shaping scientific practice. Researchers are now frequently compelled to publicise their findings in response to institutional impact metrics and competitive grant environments. This shift underscores the growing influence of media narratives on both scientific priorities and public perception. In a current trend of personality-driven reporting, we examine patterns in science communication that may indicate biases of different types, towards both topics and researchers. We applied our methodology to a corpus of media coverage from three of the most prominent scientific media outlets -- Wired, Quanta, and The New Scientist -- spanning the past 5 to 10 years. By mapping linguistic patterns, citation flows, and topical convergence, our objective was to quantify the dimensions and degree of bias that influence the credibility of scientific journalism. In doing so, we seek to illuminate the systemic features that shape science communication today and to interrogate their broader implications for epistemic integrity and public accountability in science. We present our results with anonymised journalist names, and conclude that personality-driven media coverage distorts science and the practice of science, flattening rather than expanding the perception of scientific coverage. Keywords: selective sourcing, bias, scientific journalism, Quanta, Wired, New Scientist, fairness, balance, neutrality, standard practices, distortion, personal promotion, communication, media outlets.
This paper presents a comprehensive scientometric analysis of the long-term impact of the 1979 Iranian Revolution on the nation's scientific development. Using Scopus-indexed data from 1960 to 2024, we benchmark Iran's publication trajectory against a carefully selected peer group representing diverse development models: an established scientific leader (the Netherlands), a stable regional power (Israel), and the high-growth Asian Tigers (South Korea, Taiwan, Singapore), alongside Greece and China. The analysis reveals a stark divergence: in the late 1970s, Iran's scientific output surpassed that of South Korea, China and Taiwan. The revolution, however, precipitated a collapse, followed by a lost decade of stagnation, precisely when its Asian peers began an unprecedented, state-driven ascent. We employ counterfactual models based on pre-revolutionary growth trends to quantify the resulting knowledge deficit. The findings suggest that, in an alternate, stable timeline, Iran's scientific output could have rivaled South Korea's today. We further outline a research agenda to analyze normalized impact metrics, such as FWCI, and collaboration patterns, complementing our findings on publication volume. By contextualizing Iran's unique trajectory, this study contributes to a broader understanding of the divergent recovery patterns exhibited by national scientific systems following profound political shocks, offering insights into the enduring consequences of historical disruptions on the global scientific landscape.
Contextual metadata is the unsung hero of research data. When done right, standardized and structured vocabularies make your data findable, shareable, and reusable. When done wrong, they turn a well-intentioned effort into data cleanup and curation nightmares. In this paper we tackle the surprisingly tricky process of vocabulary standardization with a mix of practical advice and grounded examples. Drawing from real-world experience in contextual data harmonization, we highlight common challenges (e.g., semantic noise and concept bombs) and provide actionable strategies to address them. Our rules emphasize alignment with Findability, Accessibility, Interoperability, and Reusability (FAIR) principles while remaining adaptable to evolving user and research needs. Whether you are curating datasets, designing a schema, or contributing to a standards body, these rules aim to help you create metadata that is not only technically sound but also meaningful to users.
Analyzing origin-destination flows is an important problem that has been extensively investigated in several scientific fields, particularly by the visualization community. The problem becomes especially challenging when involving massive data, demanding mechanisms such as data aggregation and interactive filtering to make the exploratory process feasible. However, data aggregation tends to smooth out certain patterns, and deciding which data should be filtered is not straightforward. In this work, we propose ORDENA, a visual analytic tool to explore origin and destination data. ORDENA is built upon a simple and intuitive scatter plot in which the horizontal and vertical axes correspond to origins and destinations. Therefore, each origin-destination flow is represented as a point in the scatter plot. How the points are organized in the plot layout reveals important spatial phenomena present in the data. Moreover, ORDENA provides explainability resources that allow users to better understand the relation between origin-destination flows and associated attributes. We illustrate ORDENA's effectiveness in a set of case studies elaborated in collaboration with domain experts. The proposed tool has also been evaluated by domain experts not involved in its development, who provided quite positive feedback about ORDENA.
This study of literature focusing on 'AI Policy' over the past decade found that citations of preprints (publications on platforms such as arXiv) have increased from five percent to forty percent across three major regions: the U.S., the U.K. and E.U., and South Korea. We compare regional responses in preprint citations across the global disruptions of COVID-19 and the release of ChatGPT. We discuss the driving factors and risks of preprint normalization, which follows the trend in computer science.