Fine-Tuning a Language Model to Write Like Manzoni

Style transfer is one of the more interesting problems in natural language processing because the output has to preserve the meaning of the input while changing its form. Manzoni AI is my attempt to solve a specific instance of it: rewriting contemporary Italian into the prose style of Alessandro Manzoni’s I Promessi Sposi.

This post covers the dataset, the training setup, and what the model actually does — including where it works well and where it breaks down.

Why this is hard

The core difficulty is not the model — it is the data. A style-transfer model trained on generic text will produce generic output. What you need is a parallel corpus: aligned sentence pairs expressing the same content in both registers. For Manzonian Italian, no such corpus existed.

The second difficulty is that Manzoni’s style is not just archaic vocabulary. It is also specific syntactic structures, long subordinate clauses, particular rhetorical devices, and a rhythm that is instantly recognizable to anyone who has read the novel carefully. A model that only picks up on surface-level archaisms will produce something that reads as parody.

Building the dataset

I built the corpus through a reverse distillation process:

1. Take the full text of I Promessi Sposi (38 chapters; the introduction was excluded because it contains a seventeenth-century manuscript excerpt with a different register from the novel itself).
2. Send each chapter to an external LLM (deephermes-3-mistral-24b, available via free API during its preview phase) with a carefully engineered prompt requesting B1-level modern Italian rewrites: plain language, short sentences, no summaries, no commentary, just the rewritten text.
3. Review the outputs to remove hallucinations, language mixing, and prompt artifacts.
4. Store the aligned pairs in CSV format: one column modern, one column Manzonian.
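The storage step can be sketched with Python's standard csv module. This is a minimal illustration, not the pipeline used for the project: the column names modern and manzonian, the helper names, and the modern paraphrase in the sample pair are all assumptions made for the example.

```python
import csv
import io

# Hypothetical column names; the post only says "one column modern, one column Manzonian".
COLUMNS = ["modern", "manzonian"]


def write_pairs(pairs, handle):
    """Write (modern, manzonian) pairs as a two-column CSV with a header row."""
    # QUOTE_ALL keeps commas and semicolons inside the prose from splitting a cell.
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
    writer.writerow(COLUMNS)
    for modern, manzonian in pairs:
        writer.writerow([modern.strip(), manzonian.strip()])


def read_pairs(handle):
    """Read the CSV back into a list of (modern, manzonian) tuples."""
    reader = csv.DictReader(handle)
    return [(row["modern"], row["manzonian"]) for row in reader]


# Illustrative pair: the Manzonian side is the opening of the novel,
# the modern side is a made-up paraphrase for this example only.
sample_pairs = [
    (
        "Quel ramo del lago di Como che va verso sud, tra due catene di monti",
        "Quel ramo del lago di Como, che volge a mezzogiorno, tra due catene non interrotte di monti",
    )
]

buffer = io.StringIO(newline="")
write_pairs(sample_pairs, buffer)
buffer.seek(0)
round_trip = read_pairs(buffer)
print(round_trip == sample_pairs)
```

Quoting every field is a deliberate choice here: Manzonian sentences are dense with commas, so an unquoted CSV would be fragile.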

The prompt took several iterations to get right. The model had a tendency to add explanatory sentences, skip difficult passages, or occasionally switch languages. The final prompt was explicit about every constraint:

Rewrite the following excerpt in modern, simple Italian. Keep every sentence. Use easy words and short phrases, as if speaking to a friend. No comments, only the rewritten text. B1 level vocabulary. No misplaced capital letters.

Even with good prompting, a validation pass was necessary.
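A validation pass of this kind can be partly automated before a human reads the pairs. The sketch below is an assumption about how such a first-pass filter might look, not a description of the review actually performed: the patterns, the stopword list, the thresholds, and the function name flag_pair are all hypothetical, and anything it flags is meant for manual inspection rather than automatic deletion.

```python
import re

# Hypothetical patterns for leftover prompt artifacts in the modern rewrite.
ARTIFACT_PATTERNS = [
    r"^\s*(ecco|here is|here's)\b",                       # chatty preamble
    r"\b(rewritten text|testo riscritto|riscrittura)\s*:",  # label left in the output
]

# A tiny English stopword list as a crude signal of language mixing.
ENGLISH_STOPWORDS = {"the", "and", "with", "that", "this", "was", "were", "of"}


def flag_pair(modern, manzonian, min_ratio=0.4, max_ratio=1.6):
    """Return a list of warning flags for one aligned pair (empty if none)."""
    flags = []

    # A very short rewrite suggests skipped passages; a very long one suggests
    # added explanation. The thresholds are illustrative guesses.
    ratio = len(modern) / max(len(manzonian), 1)
    if ratio < min_ratio:
        flags.append("possibly_truncated")
    if ratio > max_ratio:
        flags.append("possibly_padded")

    lowered = modern.lower()
    if any(re.search(pattern, lowered) for pattern in ARTIFACT_PATTERNS):
        flags.append("prompt_artifact")

    # Share of tokens that are common English function words.
    tokens = re.findall(r"[^\W\d_]+", lowered)
    english_hits = sum(token in ENGLISH_STOPWORDS for token in tokens)
    if tokens and english_hits / len(tokens) > 0.1:
        flags.append("language_mixing")

    return flags


original = "Renzo andò alla casa di Lucia, e si mise a parlare con la madre di lei."
print(flag_pair("Renzo va a casa di Lucia e parla con sua madre.", original))
print(flag_pair("Here is the rewritten text: Renzo va a casa.", original))
print(flag_pair("Renzo va.", original))
```

Heuristics like these cannot detect hallucinated content; they only narrow down which pairs deserve a closer look.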

Training setup

The base model is Gemma 3 4B IT in full precision (non-quantized). I tested quantized variants of LLaMA 3 8B but saw noticeable degradation in stylistic consistency, so I accepted the higher VRAM cost of full precision on the smaller model.

Fine-tuning used LoRA via the Unsloth library, running on a Tesla T4 GPU (Google Colab free tier, ~14.7 GB VRAM). LoRA adds small low-rank adapter matrices alongside the model’s existing weight matrices; only the adapters are updated during training, while the original weights stay frozen. This makes fine-tuning feasible on hardware that would otherwise be insufficient for a 4B-parameter model.
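To make the low-rank idea concrete, here is a toy sketch in plain Python of the standard LoRA formulation, where the effective weight is W + (alpha / r) * B * A with B initialised to zero. The matrix sizes, rank, and alpha below are illustrative assumptions; the post does not state the rank or alpha used, and real layer shapes in Gemma differ.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_out, d_in, rank):
    """Return (frozen weight parameters, trainable adapter parameters) for one layer."""
    full = d_out * d_in
    adapter = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, adapter


def effective_weight(W, A, B, alpha, rank):
    """Compute W + (alpha / rank) * (B @ A) without modifying W."""
    delta = matmul(B, A)
    scale = alpha / rank
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Toy sizes, chosen only so the example runs instantly.
d_out, d_in, rank, alpha = 6, 4, 2, 4
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]     # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable
B = [[0.0] * rank for _ in range(d_out)]                               # trainable, starts at zero

# With B at zero the adapter contributes nothing, so training starts from the base model.
print(effective_weight(W, A, B, alpha, rank) == W)

# Hypothetical 4096 x 4096 projection with rank 16.
full, adapter = lora_param_counts(4096, 4096, 16)
print(full, adapter, adapter / full)
```

For the hypothetical 4096 x 4096 layer at rank 16, the adapter has 131,072 trainable parameters against 16,777,216 frozen ones, under 1% of the layer. That ratio is what keeps optimizer state and gradients small enough for a T4.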

Key configuration: effective batch size of 8 (2 per device, 4 gradient accumulation steps), learning rate 2e-4, 250 training steps, AdamW 8-bit optimizer. The loss showed a consistent downward trend with the expected small-batch fluctuations.
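The arithmetic behind these numbers is easy to check. The sketch below stores the reported values in a plain dataclass; the class and field names are hypothetical and are not the Unsloth or Trainer argument names, and the single-device setting is an assumption based on the Colab free tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values reported in the post.
    per_device_batch_size: int = 2
    gradient_accumulation_steps: int = 4
    learning_rate: float = 2e-4
    max_steps: int = 250
    optimizer: str = "AdamW 8-bit"  # descriptive label only, not a library identifier
    # Assumption: one GPU, as on the Colab free tier.
    num_devices: int = 1

    @property
    def effective_batch_size(self) -> int:
        """Sequences contributing to each optimizer update."""
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )

    @property
    def sequences_processed(self) -> int:
        """Total sequences seen over the run, counting repeats."""
        return self.effective_batch_size * self.max_steps


config = TrainConfig()
print(config.effective_batch_size)
print(config.sequences_processed)
```

So 250 steps at an effective batch of 8 means the model processes 2,000 training sequences in total, repeats included. Whether that amounts to less or more than one pass over the corpus depends on the number of aligned pairs, which is not stated here.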

What the model does

At the standard configuration (temperature 0.85, top-p 0.82), the model reliably produces output in a Manzonian register: complex subordinate structures, appropriate archaic vocabulary, and the syntactic rhythm of the original.

A representative transformation:

Input: Anche se tutti speravano che le cose cambiassero, ogni giorno sembrava uguale al precedente. La gente camminava per le strade con il volto stanco, parlava poco e guardava lontano come se aspettasse qualcosa.

Output: Ognuno, per dir la verità, sognava che le cose potessero cambiare; ma ogni giorno era, pareva a tutti, uguale al giorno che gli precedeva. Passeggiavano per le strade, con il volto stanco, a poca parola, e con lo sguardo lontano, come se aspettassero qualcosa.

The output uses the imperfect subjunctive correctly, restructures the sentence with period-appropriate punctuation (the semicolon as a structural device rather than a pause), and replaces “guardava lontano” with the more literary “con lo sguardo lontano”. These are not random archaisms — they are specific features of Manzoni’s syntax.

Temperature and top-p effects

I ran a systematic sweep across temperature and top-p values on the same input. The results are predictable in direction but interesting in degree:

Low temperature, low top-p (e.g., 0.1 / 0.11): highly conservative and repetitive output, closely mirrors the training distribution
High temperature, moderate top-p (e.g., 0.73 / 0.57): more inventive lexical choices, greater syntactic variation, still stylistically coherent
High temperature, very low top-p (e.g., 0.73 / 0.1): the narrow nucleus restricts sampling to the highest-probability candidates regardless of temperature, so the output stabilizes and is more conservative than the temperature alone would suggest
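The interaction between the two parameters can be illustrated on a toy distribution. The sketch below assumes that temperature is applied to the logits before the nucleus (top-p) cut, which is a common order in sampling implementations; the logits are invented for the example and the function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def nucleus(probs, top_p):
    """Return indices of the smallest top-ranked set whose probability mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    return kept


# Invented logits for five candidate tokens, most likely first.
toy_logits = [2.0, 1.5, 1.0, 0.5, 0.0]

for temperature, top_p in [(0.1, 0.11), (0.73, 0.57), (0.73, 0.1), (0.85, 0.82)]:
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, top_p, nucleus(probs, top_p))
```

On these toy logits, the 0.73 / 0.1 setting keeps only the single most likely token, just like the 0.1 / 0.11 setting, while 0.73 / 0.57 keeps two candidates and 0.85 / 0.82 keeps three. A very small top-p therefore overrides most of what a higher temperature would otherwise add.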

The Manzonian stylistic markers — archaic verb forms, characteristic punctuation, subordinate clause rhythms — appear consistently across all configurations tested.

Where it breaks down

The model is a style-transfer system, not a general-purpose literary agent. When given pragmatic prompts (“help me write an apology letter to my teacher”), it attempts to respond, but only in a limited, stylistically constrained way: it stays in register without fully engaging with the task. This is expected and desirable behavior given the training objective.

Cross-linguistic transfer (English and French inputs) shows partial stylistic bleeding: archaic lexical choices appear in the output (“boughs” instead of “branches”, “every morn” instead of “every morning”), but the effect is weaker and occasionally produces unnatural constructions. The model was trained exclusively on Italian; cross-linguistic generalization is a side effect, not a design goal.

A closing note

What surprised me most was how quickly 250 training steps on a constrained dataset produced recognizable Manzonian output. The base model already has strong Italian language capabilities; LoRA fine-tuning steered it toward a specific stylistic attractor without destabilizing its broader competence.

The model cannot yet generate new Manzonian prose from scratch in an open-ended way — it transforms input rather than composing from a minimal prompt. That distinction maps onto something I find genuinely interesting: the difference between stylistic imitation and stylistic invention. The former is a pattern-matching problem. The latter is still, for now, a human one.

The full methodology, dataset construction process, training pipeline, and evaluation results are documented in the linked academic paper.