While individual AI-powered translations are useful for targeted updates, modern translation management systems often need to handle large-scale translation tasks efficiently. This article explores how we implemented bulk translation capabilities in our Translation Management System, leveraging ChatGPT's structured output capabilities and parallel processing for optimal performance.
The bulk translation system combines several key components:
A simple user interface in the translation dashboard
A robust backend endpoint for handling translation requests
Parallel processing for improved performance
Structured response handling using Zod schemas
Flexible output handling with draft support
The bulk translation feature is integrated into the translation dashboard through a simple, focused interface:
import { useState } from "react";
const AddMissingTranslationsButton = () => {
  const [isLoading, setIsLoading] = useState(false);
  const handleAddMissingTranslations = async () => {
    setIsLoading(true);
    try {
      const response = await fetch("/api/ui-strings/translate/missing", {
        method: "get",
      });
      if (!response.ok) {
        throw new Error("Failed to add missing translations");
      }
      // Reload only after success, so a failure stays visible in the console
      window.location.reload();
    } catch (error) {
      console.error("Error adding missing translations:", error);
    } finally {
      setIsLoading(false);
    }
  };
  return (
    <button onClick={handleAddMissingTranslations} disabled={isLoading}>
      {isLoading
        ? "Working hard at translating everything..."
        : "Add All Missing Translations"}
    </button>
  );
};
This minimalist interface belies the sophisticated processing happening behind the scenes. The single button triggers a comprehensive workflow that:
Identifies missing translations across all content
Processes them in efficient batches
Handles the results appropriately based on system settings
The bulk translation endpoint handles the complex task of coordinating multiple translations:
export const bulkTranslateEndpoint: Endpoint = {
  path: "/translate/missing",
  method: "get",
  handler: async (req, res) => {
    // Only editors may trigger a bulk translation
    if (!rbacHas(ROLE_EDITOR)({ req })) {
      return res.status(401).send("Unauthorized");
    }
    // Fetch all UI strings and system settings
    const result = await req.payload.find({
      collection: "ui-strings",
      locale: "all",
      limit: 0,
    });
    const settings = await req.payload.findGlobal({
      slug: "payload-settings",
    });
    // Process and filter strings needing translation: a string qualifies
    // when the number of locales it has text for differs from the number
    // of configured locales (`locales` is defined elsewhere in the module)
    const cleanedUp = result.docs
      .map(({ id, description, text }) => ({ id, description, text }))
      .filter(({ text }) => Object.keys(text).length !== locales.length);
    // ... translation processing
  }
};
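The filter above treats a string as complete when its `text` object has as many keys as there are locales. A stricter check, sketched below, compares against the locale list directly and also counts empty or null values as missing. The `UiString` type, the `findMissingLocales` and `needsTranslation` helpers, and the sample data are assumptions made for illustration; they are not part of the original code.

```typescript
// Hypothetical shape of a UI string document; field names follow the article.
interface UiString {
  id: string;
  description?: string;
  text: Record<string, string | null | undefined>;
}

// Return the locales that have no usable text for the given string.
function findMissingLocales(doc: UiString, locales: string[]): string[] {
  return locales.filter((locale) => {
    const value = doc.text[locale];
    return value == null || value.trim() === "";
  });
}

// Keep only the documents that still need at least one translation.
function needsTranslation(docs: UiString[], locales: string[]): UiString[] {
  return docs.filter((doc) => findMissingLocales(doc, locales).length > 0);
}

// Sample data (assumed): three locales, one complete and two incomplete strings.
const sampleLocales = ["en", "de", "fr"];
const sampleDocs: UiString[] = [
  { id: "a", text: { en: "Save", de: "Speichern", fr: "Enregistrer" } },
  { id: "b", text: { en: "Cancel", de: "" } },
  // Three keys are present, so a key-count check would consider this complete.
  { id: "c", text: { en: "Delete", de: "Löschen", fr: undefined } },
];

const pendingIds = needsTranslation(sampleDocs, sampleLocales).map((d) => d.id);
console.log(pendingIds); // ["b", "c"]
```

The third sample string shows the difference: it has a key for every locale, yet one of the values is empty, so only a per-locale check flags it.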
A key feature of our implementation is the use of Zod schemas to ensure properly structured responses:
const zodTerms = {};
// Only the first 100 strings are handled per request
const missingTranslations = cleanedUp.slice(0, 100).map(({ id, text, description }) => {
  const missing = locales.filter((locale) => !text[locale]);
  // Require exactly one string per missing locale
  const translations = {};
  missing.forEach((locale) => {
    translations[locale] = z.string();
  });
  zodTerms[id] = z.object({
    id: z.string(),
    translations: z.object(translations),
  });
  // Payload sent to the model: source text plus the locales to fill in
  return {
    id,
    text: text[defaultLocale],
    description,
    missing,
  };
});
// Top-level response schema, keyed by string id
const Translations = z.object({
  translations: z.object(zodTerms),
});
This schema-based approach ensures:
Type safety throughout the translation process
Properly structured responses from the AI
Easy validation of translation results
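To make concrete what such a schema accepts and rejects, the sketch below performs an equivalent check by hand, without Zod, so that it runs with no dependencies. The `ExpectedItem` type and the `validateTranslations` function are hypothetical names; in the system described here, Zod's own parsing does this work.

```typescript
// Hypothetical description of what is expected for one string:
// its id and the locales that must be filled in.
interface ExpectedItem {
  id: string;
  missing: string[];
}

const isPlainObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Check that `response` has the shape
// { translations: { [id]: { id, translations: { [locale]: string } } } }
// and return a list of human-readable problems (empty when valid).
function validateTranslations(response: unknown, expected: ExpectedItem[]): string[] {
  if (!isPlainObject(response)) return ["response must be an object"];
  const byId = response.translations;
  if (!isPlainObject(byId)) return ["response.translations must be an object"];

  const problems: string[] = [];
  for (const { id, missing } of expected) {
    const entry = byId[id];
    if (!isPlainObject(entry)) {
      problems.push(`${id}: entry is missing`);
      continue;
    }
    const perLocale = entry.translations;
    if (entry.id !== id || !isPlainObject(perLocale)) {
      problems.push(`${id}: entry is malformed`);
      continue;
    }
    for (const locale of missing) {
      if (typeof perLocale[locale] !== "string") {
        problems.push(`${id}: no string for locale "${locale}"`);
      }
    }
  }
  return problems;
}

// Usage with assumed sample data: one string that needs German and French.
const expectedItems: ExpectedItem[] = [{ id: "greeting", missing: ["de", "fr"] }];
const completeResponse = {
  translations: { greeting: { id: "greeting", translations: { de: "Hallo", fr: "Bonjour" } } },
};
const incompleteResponse = {
  translations: { greeting: { id: "greeting", translations: { de: "Hallo" } } },
};
console.log(validateTranslations(completeResponse, expectedItems));   // []
console.log(validateTranslations(incompleteResponse, expectedItems)); // one problem, for "fr"
```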
To handle large numbers of translations efficiently, we implemented a parallel processing system:
import crypto from "crypto";
async function parallelTranslate(
  missingTranslations,
  Translations,
  settings,
  batchSize = 5
) {
  // Short random identifier for this run
  const batchId = crypto.randomBytes(4).toString("hex");
  // Split into batches
  const batches = [];
  for (let i = 0; i < missingTranslations.length; i += batchSize) {
    batches.push(missingTranslations.slice(i, i + batchSize));
  }
  // Process batches in parallel; Promise.all rejects if any batch fails
  const batchResults = await Promise.all(
    batches.map((batch) => translate(batch, Translations, settings))
  );
  // Combine results into a single object keyed by string id
  const combinedTranslations = batchResults.reduce((acc, result) => {
    Object.assign(acc, result.translations);
    return acc;
  }, {});
  return {
    batchId,
    result: { translations: combinedTranslations },
    performance: {
      totalItems: missingTranslations.length,
      batchCount: batches.length,
    },
  };
}
This parallel processing approach provides:
Improved throughput for large translation sets
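A smaller unit of failure, since each batch is an independent request (as written, Promise.all still rejects the whole run when any one batch fails, so true isolation needs per-batch error handling)
Run identification through the generated batchId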
Performance metrics for system monitoring
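`Promise.all` rejects as soon as any one batch fails. If failed batches should not discard the results of successful ones, `Promise.allSettled` is one way to get that isolation. The sketch below is an illustration under assumptions, not the system's actual code: `translateBatch` stands in for the real `translate` call, and the `BatchItem` and `BatchOutput` types are invented for the example.

```typescript
interface BatchItem {
  id: string;
  text: string;
}
type BatchOutput = { translations: Record<string, string> };

// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run every batch, keep the results of those that succeed,
// and report the indexes of those that fail.
async function translateIsolated(
  items: BatchItem[],
  translateBatch: (batch: BatchItem[]) => Promise<BatchOutput>,
  batchSize = 5
): Promise<{ translations: Record<string, string>; failedBatches: number[] }> {
  const batches = toBatches(items, batchSize);
  const settled = await Promise.allSettled(batches.map((b) => translateBatch(b)));

  const translations: Record<string, string> = {};
  const failedBatches: number[] = [];
  settled.forEach((outcome, index) => {
    if (outcome.status === "fulfilled") {
      Object.assign(translations, outcome.value.translations);
    } else {
      failedBatches.push(index);
    }
  });
  return { translations, failedBatches };
}

// Stand-in translator (assumed): upper-cases text and fails on item "3".
const fakeTranslateBatch = async (batch: BatchItem[]): Promise<BatchOutput> => {
  if (batch.some((item) => item.id === "3")) throw new Error("simulated failure");
  const translations: Record<string, string> = {};
  for (const item of batch) translations[item.id] = item.text.toUpperCase();
  return { translations };
};
```

The returned `failedBatches` indexes give the caller a list of batches to retry or report, while the successful translations remain available.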
The system supports two modes of operation, controlled through settings:
Direct publication of translations
Creation of translation drafts for review
import { v4 as uuidv4 } from "uuid";
async function addAsDrafts(translations, existingDoc, req) {
  // Build one new draft per locale whose translation differs from the current text
  const newDrafts = Object.entries(translations).reduce((acc, [locale, text]) => {
    if (!acc[locale]) acc[locale] = [];
    // Only add if different from existing translation
    if (existingDoc.text[locale] !== text) {
      acc[locale].push({
        text,
        id: uuidv4(),
        lastModifiedBy: null,
      });
    }
    return acc;
  }, {});
  // Merge with existing drafts
  const updatedDrafts = Object.keys(newDrafts).reduce((acc, locale) => {
    acc[locale] = [
      ...(existingDoc.drafts?.[locale] || []),
      ...newDrafts[locale],
    ];
    return acc;
  }, {});
  // Update each locale, skipping it if any two drafts share the same text
  for (const [locale, drafts] of Object.entries(updatedDrafts)) {
    if (drafts.length === new Set(drafts.map((d) => d.text)).size) {
      await req.payload.update({
        collection: slug, // collection slug, defined elsewhere in the module
        id: existingDoc.id,
        data: { drafts },
        locale,
        user: req.user,
      });
    }
  }
}
This flexibility allows organizations to:
Fast-track translations in development environments
Implement review processes in production
Maintain quality control through draft reviews
Track translation changes over time
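The merging step of the draft workflow can be isolated as a pure function, which makes it easy to test. The sketch below is a variant under assumptions rather than the code shown earlier: it drops an individual new draft when its text already exists among the drafts or equals the published text, instead of skipping the whole update. The `Draft` type and the `mergeDrafts` name are hypothetical, and draft ids are passed in rather than generated so the example has no dependencies.

```typescript
interface Draft {
  id: string;
  text: string;
  lastModifiedBy: string | null;
}

// Append `incoming` drafts to `existing`, skipping any whose text is already
// present as a draft or equals the currently published text.
function mergeDrafts(existing: Draft[], incoming: Draft[], published?: string): Draft[] {
  const seen = new Set<string>(existing.map((d) => d.text));
  if (published !== undefined) seen.add(published);

  const merged = existing.slice();
  for (const draft of incoming) {
    if (!seen.has(draft.text)) {
      seen.add(draft.text);
      merged.push(draft);
    }
  }
  return merged;
}

// Usage with assumed sample data for a single locale.
const existingDrafts: Draft[] = [{ id: "d1", text: "Hallo", lastModifiedBy: null }];
const incomingDrafts: Draft[] = [
  { id: "d2", text: "Hallo", lastModifiedBy: null },     // duplicate of d1
  { id: "d3", text: "Guten Tag", lastModifiedBy: null }, // genuinely new
  { id: "d4", text: "Servus", lastModifiedBy: null },    // equals the published text
];
const mergedDrafts = mergeDrafts(existingDrafts, incomingDrafts, "Servus");
console.log(mergedDrafts.map((d) => d.id)); // ["d1", "d3"]
```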
Several areas have been identified for future improvement:
Enhanced Error Recovery: retry logic for failed batches, partial success handling, detailed error reporting
Performance Optimization: dynamic batch sizing, priority queue implementation, caching of common translations
Quality Assurance: automated quality metrics, consistency checking, context-aware validation
User Interface Improvements: progress indicators, batch-level control, result preview
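To illustrate retry logic for failed batches, the first improvement listed above, a small wrapper with exponential backoff is one possible shape. This is a sketch of one approach, not a planned design: `withRetry`, its parameters, and the delay values are all assumptions.

```typescript
// Resolve after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call `task` until it succeeds or `maxAttempts` is reached, doubling the
// wait between attempts. The last error is rethrown if every attempt fails.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Waits of baseDelayMs, 2x, 4x, ... between successive attempts
        await sleep(baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  throw lastError;
}

// Usage with a stand-in task (assumed) that fails twice before succeeding.
let flakyCalls = 0;
const flakyTask = async (): Promise<string> => {
  flakyCalls += 1;
  if (flakyCalls < 3) throw new Error("temporary failure");
  return "ok";
};
withRetry(flakyTask, 3, 1).then((value) => console.log(value, flakyCalls));
```

Wrapping each batch's translation call in such a helper would let transient failures recover without rerunning the batches that already succeeded.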
The bulk translation system demonstrates how modern AI capabilities can be effectively scaled for production use. By combining parallel processing, structured responses, and flexible output handling, we've created a system that can efficiently handle large-scale translation tasks while maintaining quality control and system stability.
Key benefits include:
Efficient handling of large translation volumes
Schema-based validation of AI responses
Flexible deployment options
Performance monitoring and optimization
Integration with existing workflow tools
This implementation provides a foundation for automated translation management that can evolve with changing requirements and technological capabilities.
© 2024 by Moritz Thomas