Manufacturing News Articles from Medical Abstracts

Using LLMs to automate the authoring of news reports from PubMed content.

Introduction

What if we could mass-manufacture news on an assembly line like frozen meals or fast fashion?

I have pondered this since my PhD days, when I sat through a talk by a science editor at a UK broadsheet about how to get scientific discoveries into the press. He impressed on the audience that news reporting follows a limited, structured format. If we communicated our discoveries in that format, we stood a better chance of having them picked up by overworked journalists and delivered to a wider audience. Almost anything could be rewritten as a news-style article, given a sufficiently skilled journalist, we were told. He further explained that press releases work on this principle: they take information about an event or announcement and reframe it in a standard news format, ready to be picked up for mainstream consumption.

I have since wondered whether we could take those techniques, which journalists take years to refine, and automate them. Over the years, I have heard rumours of news publishers experimenting with automated news production. Now that LLMs have reached the point where experimentation is possible at home, I wanted to try it myself.

Just this month, February 2026, The Washington Post laid off one third of its reporters. I expect that automation played a part in the decision to cull this many staff.

Prototyping

I used an idealised assembly line as a guiding analogy:

Raw materials → Pre-processing → Assembly → QA → Package for distribution
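
The analogy can be sketched in code as a list of stages that each item passes through in order. This is only an illustration of the idea, not the code from the prototype: the stage functions and the dictionary keys below are hypothetical placeholders.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Stage = Callable[[Any], Any]


def run_line(stages: Iterable[Stage], raw: Any) -> Any:
    """Pass `raw` through each stage in order, like stations on an assembly line."""
    return reduce(lambda item, stage: stage(item), stages, raw)


# Hypothetical placeholder stages; a real pipeline would call PubMed and an LLM.
def preprocess(item: dict) -> dict:
    return {**item, "text": item["text"].strip()}


def assemble(item: dict) -> dict:
    return {**item, "story": f"STORY: {item['text']}"}


def quality_check(item: dict) -> dict:
    return {**item, "passed_qa": bool(item["story"])}


def package(item: dict) -> dict:
    return {"story": item["story"], "passed_qa": item["passed_qa"]}


LINE = [preprocess, assemble, quality_check, package]
result = run_line(LINE, {"text": "  raw abstract text  "})
```

Keeping each station as a plain function makes it easy to swap one out, for example a different assembly prompt, without touching the rest of the line.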

Starting at the beginning of the line, I needed a supply of interesting “raw” items of information to feed the manufacturing process.

I already had something in mind from my PhD days: PubMed. This is a free search engine from the U.S. National Library of Medicine for finding research papers on medicine and the life sciences, which indexes article citations and abstracts and can be queried programmatically via an API.

A typical PubMed entry looks like:

PMID- 37716787
OWN - NLM
STAT- MEDLINE
DCOM- 20230918
LR  - 20230921
IS  - 1526-3231 (Electronic)
IS  - 0749-8063 (Linking)
VI  - 39
IP  - 10
DP  - 2023 Oct
TI  - Radiographic and Dynamic Assessment for Resection of Cam Lesions in Patients With Femoroacetabular Impingement.
PG  - 2119-2121
LID - S0749-8063(23)00384-5 [pii]
LID - 10.1016/j.arthro.2023.04.019 [doi]
AB  - Cam-type femoroacetabular impingement is characterized by a pathologic asphericity of the femoral head-neck junction, and arthroscopic femoral osteoplasty is indicated to correct the bony abnormality and restore normal hip mechanics when symptomatic. Residual femoroacetabular impingement deformity after arthroscopy is a leading cause of failure, and it is therefore critical to perform a thorough fluoroscopic and dynamic assessment when addressing cam deformities arthroscopically. The fluoroscopic assessment uses 6 anteroposterior views, including 3 in hip extension (30° internal rotation, neutral rotation, and 30° external rotation) and 3 in 50° flexion (neutral rotation, 40° external rotation, 60° of external rotation), performed before, during, and after the femoral resection. The dynamic assessment includes evaluation of impingement-free range of motion and "end feel" (a subjective description of the tactile feedback during assessment of hip motion), and should be performed before and after the femoral resection in 3 specific positions (extension/abduction, flexion/abduction, and flexion/internal rotation). Although the anterior aspect of the head-neck junction is readily accessed through standard arthroscopic portals with the hip in 30 to 50° of flexion, the posterolateral, posteromedial, and posterior extent of the femoral head-neck junction are challenging to address. The natural external rotation of the proximal femur during flexion and internal rotation during extension can be used to gain posterior lateral and medial access. Antero/posteromedial femoral access can be obtained with >50° of hip flexion with the burr in the anteromedial portal. 
      Posterolateral femoral access is achieved with hip extension with the burr in the anterolateral portal, and further posterolateral access can be achieved with the addition of traction, allowing resection of posterolateral deformities extending beyond the lateral retinacular vessels while remaining proximal to the vessels. This comprehensive intraoperative fluoroscopic and dynamic assessment and surgical technique can lead to a predictable correction of most cam-type deformities.
CI  - Copyright © 2023 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.
FAU - Larson, Christopher M
AU  - Larson CM
AD  - Twin Cities Orthopedics, Edina-Crosstown, Minnesota, U.S.A.. Electronic address: chrislarson@tcomn.com.
FAU - Faucett, Scott C
AU  - Faucett SC
AD  - Centers for Advanced Orthopaedics LLC, Washington, DC, U.S.A.
FAU - Floyd, Edward R
AU  - Floyd ER
AD  - University of North Dakota School of Medicine, Grand Forks, North Dakota, U.S.A.
FAU - Geeslin, Andrew G
AU  - Geeslin AG
AD  - University of Vermont, Larner College of Medicine, Burlington, Vermont, U.S.A.
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
PL  - United States
TA  - Arthroscopy
JT  - Arthroscopy : the journal of arthroscopic & related surgery : official publication of the Arthroscopy Association of North America and the International Arthroscopy Association
JID - 8506498
SB  - IM
MH  - Humans
MH  - *Femoracetabular Impingement/diagnostic imaging/surgery
MH  - *Plastic Surgery Procedures
MH  - Femur
MH  - Femur Head
MH  - Rotation
EDAT- 2023/09/17 00:41
MHDA- 2023/09/18 12:43
CRDT- 2023/09/16 21:02
PHST- 2023/03/04 00:00 [received]
PHST- 2023/04/03 00:00 [accepted]
PHST- 2023/09/18 12:43 [medline]
PHST- 2023/09/17 00:41 [pubmed]
PHST- 2023/09/16 21:02 [entrez]
AID - S0749-8063(23)00384-5 [pii]
AID - 10.1016/j.arthro.2023.04.019 [doi]
PST - ppublish
SO  - Arthroscopy. 2023 Oct;39(10):2119-2121. doi: 10.1016/j.arthro.2023.04.019.

That obviously does not look much like a news article. However, it might contain a newsworthy hook that an LLM with the right instructions can turn into a news article. PubMed offers a continually updated source of high-quality, peer-reviewed scientific research, with novel discoveries waiting to be reported on.
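
Before a record like this can be handed to an LLM, it helps to pull the tagged fields apart. The following is a minimal sketch of a parser for this tagged format, assuming each field line carries a tag in its first four characters followed by `- `, and that any other non-blank line continues the previous field. The function name `parse_medline` is my own and is not taken from the prototype.

```python
from collections import defaultdict


def parse_medline(record: str) -> dict[str, list[str]]:
    """Parse one tagged record into {tag: [values]}.

    Repeated tags (e.g. AU, MH) accumulate as a list; continuation
    lines are appended to the most recent value.
    """
    fields: dict[str, list[str]] = defaultdict(list)
    tag = None
    for line in record.splitlines():
        if not line.strip():
            continue
        # A field line looks like "TI  - ..." or "PMID- ...".
        if len(line) > 5 and line[4:6] == "- " and line[:4].strip():
            tag = line[:4].strip()
            fields[tag].append(line[6:].strip())
        elif tag is not None:
            # Anything else continues the previous field.
            fields[tag][-1] += " " + line.strip()
    return dict(fields)


sample = (
    "PMID- 37716787\n"
    "TI  - Radiographic and Dynamic Assessment\n"
    "      for Resection of Cam Lesions.\n"
    "AU  - Larson CM\n"
    "AU  - Faucett SC\n"
)
parsed = parse_medline(sample)
```

With the fields separated, the title (`TI`) and abstract (`AB`) can be passed on to later stages, while the rest is kept as metadata.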

I coded up a simple proof-of-concept in a Python notebook to explore whether I could turn PubMed entries into something resembling a Reuters Science article, even just roughly. It let me search PubMed by keywords (e.g. “asthma complications”) and return the top 50 ranked matching records, filtered to publication types such as clinical trials and experimental studies.
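
A search of this kind can be sketched against the NCBI E-utilities endpoints (`esearch` to find matching IDs, `efetch` to retrieve the records). This is an illustration rather than the notebook's actual code: the publication-type list in `PUB_TYPES` is an assumed example, and the helper names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Assumed example filter; the exact publication types used may differ.
PUB_TYPES = ["Clinical Trial", "Randomized Controlled Trial"]


def build_term(keywords: str, pub_types=PUB_TYPES) -> str:
    """Combine free-text keywords with a publication-type filter."""
    type_filter = " OR ".join(f'"{pt}"[Publication Type]' for pt in pub_types)
    return f"({keywords}) AND ({type_filter})"


def esearch_url(keywords: str, retmax: int = 50) -> str:
    """URL that returns up to `retmax` PMIDs, ranked by relevance, as JSON."""
    params = {"db": "pubmed", "term": build_term(keywords),
              "retmax": retmax, "sort": "relevance", "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids: list[str]) -> str:
    """URL that returns the given records as tagged MEDLINE text."""
    params = {"db": "pubmed", "id": ",".join(pmids),
              "rettype": "medline", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def search_pubmed(keywords: str, retmax: int = 50) -> str:
    """Run the search over the network and return the MEDLINE text."""
    with urlopen(esearch_url(keywords, retmax), timeout=30) as resp:
        pmids = json.load(resp)["esearchresult"]["idlist"]
    if not pmids:
        return ""
    with urlopen(efetch_url(pmids), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Calling `search_pubmed("asthma complications")` would perform two HTTP requests. NCBI asks heavier users to register an API key and respect its rate limits, which this sketch does not handle.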

I then developed a prompt (see Appendix) to transform the content of a selected abstract into a news report. The prompt took a considerable amount of trial and error to put together, since I have no journalistic training or experience. I used my best judgment and arrived at a prompt that produced consistent and satisfactory results.
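
The prompt in the Appendix has a single `{kernel}` slot. A minimal sketch of filling it from a parsed record follows. The choice of fields (`TI`, `TA`, `DP`, `AB`), the labels, and the shortened `PROMPT_TEMPLATE` are assumptions for illustration; the call to the LLM itself is left out because it depends on the provider.

```python
# Shortened stand-in for the full prompt given in the Appendix.
PROMPT_TEMPLATE = """You are an expert UK news copywriter.

KERNEL:
{kernel}

Now rewrite the KERNEL accordingly."""

# Assumed mapping from record tags to human-readable labels.
KERNEL_FIELDS = {"TI": "Title", "TA": "Journal", "DP": "Published", "AB": "Abstract"}


def build_kernel(record: dict[str, list[str]]) -> str:
    """Flatten selected fields of a parsed record into plain text."""
    lines = []
    for tag, label in KERNEL_FIELDS.items():
        value = " ".join(record.get(tag, []))
        if value:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


def build_prompt(record: dict[str, list[str]], template: str = PROMPT_TEMPLATE) -> str:
    """Insert the kernel into the template's {kernel} slot."""
    # str.replace avoids str.format tripping over any other braces in the text.
    return template.replace("{kernel}", build_kernel(record))


example_prompt = build_prompt({"TI": ["A title."], "AB": ["An abstract."]})
```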

I felt I could improve the results by preferring more “readable” abstracts. I introduced re-ranking by a computed readability score (Dale-Chall) to increase the likelihood of surfacing a text that is easier to transform into news. This method uses a list of about 3,000 words that a schoolchild should be familiar with to compute an overall difficulty score for a text. Easier-to-read texts seemed to improve the generated news reports, so re-ranking became part of the creation pipeline, although I did not do the work to quantify the effect.
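
For reference, the Dale-Chall score combines the percentage of words not on the familiar-word list with the average sentence length. The sketch below shows that formula and the re-ranking step. It is a simplification: `EASY_WORDS` is a tiny stand-in for the real list, and the tokenisation ignores the word-inflection rules of the full method. A library such as `textstat` offers a fuller implementation.

```python
import re

# Tiny stand-in for the roughly 3,000-word familiar-word list (assumption).
EASY_WORDS = {"the", "a", "an", "is", "was", "to", "by", "of", "and",
              "dog", "boy", "ran", "happy"}


def dale_chall_score(text: str, easy_words=EASY_WORDS) -> float:
    """Lower scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        return 0.0
    pct_difficult = 100 * sum(w not in easy_words for w in words) / len(words)
    score = 0.1579 * pct_difficult + 0.0496 * (len(words) / len(sentences))
    if pct_difficult > 5:
        score += 3.6365  # adjustment applied when over 5% of words are difficult
    return score


def rerank_by_readability(abstracts: list[str]) -> list[str]:
    """Put the easiest-to-read abstracts first."""
    return sorted(abstracts, key=dale_chall_score)


easy = "The dog ran to the boy. The boy was happy."
hard = "Cam-type femoroacetabular impingement is characterized by pathologic asphericity."
ranked = rerank_by_readability([hard, easy])
```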

At this point, the output looked like very credible news reporting to me. To illustrate, the table below shows headlines taken from stories generated by this process, accompanied by the original keyword(s) used in the PubMed searches.

| Generated Headline | PubMed Keyword(s) |
| --- | --- |
| Microplastics found in placentas and umbilical cords during Brazilian pregnancies | microplastics human placenta |
| High consumption of ultra-processed food linked to increased risk of depression in older adults, study finds | ultra processed food depression |
| New Study Sheds Light on Factors Behind Hard-to-Treat High Blood Pressure | sleep mortality |
| Long-term Air Pollution Linked to Higher Dementia Hospitalisation Risk, Study Finds | air pollution dementia |
| Study links prenatal exposure to PFAS chemicals with brain changes in young children | PFAS pregnancy |
| Exercise shifts molecular signals in long COVID patients, UK study finds | long covid |
| Semaglutide could benefit most heart attack survivors, study finds | semaglutide cardiovascular |
| High-nicotine e-cigarettes linked to improved blood fat profiles in smokers, study finds | e-cigarette lung function young adults |
| New Tool Links Teens’ Digital Habits to Mental Health—Outperforms Simple Screen Time Measures | adolescent screen time depression |
| Drug-resistant E. coli clone found in Chennai lake sparks health concerns | antibiotic resistance wastewater |

They look like very plausible headlines, all of them based on the original contents of the abstract. However, it became clear to me that visual presentation plays a large part in how credible we perceive news to be.

Visualisation

Assisted by OpenAI’s Codex, I took the existing code and created a simple local web-based version that could give the articles some styling, and a workflow for searching PubMed and generating content. You can get hold of the code in the GitHub repository linked in the Appendix.
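
To give a flavour of the styling step, here is a minimal sketch that wraps a generated headline, standfirst and story in a styled HTML page using only the Python standard library. It is not the code from the repository: the template, CSS and function name are illustrative assumptions.

```python
from html import escape
from string import Template

PAGE = Template("""<!doctype html>
<html lang="en-GB">
<head>
<meta charset="utf-8">
<title>$headline</title>
<style>
  body { font-family: Georgia, serif; max-width: 40rem; margin: 2rem auto; line-height: 1.6; }
  h1 { font-size: 2rem; line-height: 1.2; }
  .standfirst { font-size: 1.2rem; color: #444; }
</style>
</head>
<body>
<article>
<h1>$headline</h1>
<p class="standfirst">$standfirst</p>
$paragraphs
</article>
</body>
</html>""")


def render_article(headline: str, standfirst: str, story: str) -> str:
    """Render a story (paragraphs separated by blank lines) as an HTML page."""
    paragraphs = "\n".join(
        f"<p>{escape(p.strip())}</p>" for p in story.split("\n\n") if p.strip()
    )
    # Escape all generated text so stray markup cannot break the page.
    return PAGE.substitute(headline=escape(headline),
                           standfirst=escape(standfirst),
                           paragraphs=paragraphs)


page = render_article("A & B", "One line.", "First.\n\nSecond.")
```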

The screenshots below show how some styling can improve the credibility of news reporting:

[Four screenshots of generated articles with news-style formatting]

The user can search PubMed, retrieve results and then choose which ones to generate, as in the screenshot below:

[Screenshot: PubMed search results with articles selected for generation]

There is also an admin view to manage and publish to a gallery, which I chose not to develop fully, simply because I didn’t need it:

[Screenshot: admin view for managing and publishing to a gallery]

Summary & Next Steps

All of this is only a proof of concept. I am not a journalist, and I am sure I could refine the prompt further with the help of one. I would also like to evaluate the output properly, since so far I have simply eyeballed it and deemed it good enough.

This kind of application could also benefit from fine-tuning the underlying LLM.

Taking the same assembly-line approach to content generation, I would like to explore producing different styles of news targeted at particular audiences, such as broadsheet or tabloid readers.

I would also like to explore using different inputs, such as the continually updated content from the Parliamentary Archives.

Appendix

Repository

Prompt

The prompt used for the initial prototype is given below (the most recent version in the repository may differ):

You are an expert UK news copywriter. Your job is to take the “KERNEL” (raw notes or draft text) and rewrite it into a professional, high-engagement news story focused on narrative craft and clarity (not legalities, not ethics commentary, not media criticism).

INPUT YOU WILL RECEIVE
KERNEL:
{kernel}

OUTPUT YOU MUST PRODUCE (in this exact order)
0) FRAME (see below in B)
1) HEADLINE (1 line)
2) STANDFIRST (1 sentence)
3) STORY (300–700 words, unless the kernel is too small—then write the strongest complete version possible)
4) OPTIONAL: “What happens next” (1–2 sentences with the next concrete step/date if available)

STYLE TARGET
- UK general-audience readability: plain English, short sentences, minimal-to-no jargon (define any necessary term once).
- High engagement through pacing, specificity, and a clear reader path. No sensationalism, no clickbait.
- Neutral tone: avoid loaded adjectives/adverbs and avoid implying motives. Attribute opinions with “X says…” / “Critics say…”.
- Scannable layout: short paragraphs (often 1–2 sentences), strong rhythm, no dense blocks.

HOW TO TURN THE KERNEL INTO A STORY (DO THIS INTERNALLY)
A) FIND THE ENGINE (write one internal sentence before drafting):
“This matters because ____ will now ____, affecting/costing ____.”
If you can’t fill it, infer the most likely practical “so what” from the kernel without inventing facts.

B) CHOOSE ONE FRAME (pick one and keep it consistent early):
- Ordinary life disrupted
- Promise vs reality
- Winners and losers
- Hidden cost
- Behind the numbers

C) USE THIS 6-BEAT STRUCTURE
1. Hook lede (1–2 short paragraphs): the sharpest true thing, concrete and specific.
2. Nut graf (1 paragraph): what it means and why now (“so what”).
3. Impact (2–4 paragraphs): who is affected and how (money/time/access).
4. How we got here (2–4 paragraphs): key cause/decision/trigger.
5. Response (2–4 paragraphs): main counterview + official reply (use a quote ladder if quotes exist).
6. What happens next + kicker (1–2 paragraphs): next date/decision; land on a concrete detail.

SENTENCE-LEVEL CRAFT RULES
- Put the actor first: “The council announced…” not “It was announced…”.
- Prefer verbs over abstract nouns: “cut spending” not “deliver savings”.
- One idea per paragraph. If you use “and”, check if it should be two sentences/paragraphs.
- Adjectives are salt: delete any that don’t add verified meaning.
- Numbers must mean something: give the figure, then translate it into practical terms (without exaggeration).

MOMENTUM (USE LIGHTLY)
Create forward pull by delaying one key detail (cost/when/who decided) by 1–3 paragraphs at most, but do not withhold it longer.

QUOTES (IF PRESENT)
Use a “quote ladder”:
1) 1 short quote for human impact (if available)
2) 1 short quote for explanation (official reasoning)
3) 1 short quote for accountability/pushback (critics/opposition)
Keep quotes short; paraphrase the rest. Never stack long quotes.

FACT DISCIPLINE
- Do not invent facts, numbers, dates, quotes, or events.
- If something is unclear in the kernel, write it neutrally as uncertain (“The council has not yet said…”, “Details have not been confirmed…”).
- If the kernel contains opinions, keep them as attributed claims, not narrator statements.

HEADLINE/STANDFIRST PATTERNS (choose one)
- “[Place] to [do X] as [reason] — [who it affects]”
- “New [fee/rule/plan] means [practical impact] for [group]”
- “[Decision] set to change [everyday thing] for [people] in [place]”

FINAL CHECK
- First 2 paragraphs tell the reader what happened and why it matters.
- Paragraphs are short and scannable.
- Tone is neutral; impact is concrete.
- Ending gives the next step/date or the most solid “kicker” detail.
- Avoid em-dashes.

Now rewrite the KERNEL accordingly.
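
Outside the prompt itself, some of its rules could also be checked mechanically as a QA station on the assembly line. The sketch below is a hypothetical example rather than part of the prototype: it checks the 300–700 word target, the em-dash rule, and paragraph length, where the limit of three sentences per paragraph is my own assumed threshold.

```python
import re


def qa_issues(story: str, min_words: int = 300, max_words: int = 700,
              max_sentences_per_paragraph: int = 3) -> list[str]:
    """Return a list of rule violations; an empty list means the story passes."""
    issues = []
    n_words = len(story.split())
    if not min_words <= n_words <= max_words:
        issues.append(f"word count {n_words} outside {min_words}-{max_words}")
    if "\u2014" in story:  # U+2014 is the em-dash
        issues.append("contains an em-dash")
    paragraphs = [p for p in story.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs, start=1):
        # Rough sentence count: terminal punctuation followed by space or end.
        n_sentences = len(re.findall(r"[.!?]+(?:\s|$)", para))
        if n_sentences > max_sentences_per_paragraph:
            issues.append(f"paragraph {i} has {n_sentences} sentences")
    return issues


good_story = "\n\n".join(["Short sentence here now."] * 100)
bad_story = "Too short \u2014 and dashed."
```

Checks like these cannot judge accuracy or tone, so they would complement rather than replace a human read of each story. Note that the prompt allows shorter stories when the kernel is small, so the word-count check may need relaxing for brief abstracts.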