{"id":8026,"date":"2026-04-08T13:48:26","date_gmt":"2026-04-08T11:48:26","guid":{"rendered":"https:\/\/www.rawk.at\/?p=6612"},"modified":"2026-05-31T08:21:30","modified_gmt":"2026-05-31T08:21:30","slug":"captioner-image-captioning-for-ai-training","status":"publish","type":"post","link":"https:\/\/new.rawk.at\/?p=8026","title":{"rendered":"Captioner &#8211; Image Captioning for AI Training"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"8026\" class=\"elementor elementor-8026\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section data-particle_enable=\"false\" data-particle-mobile-disabled=\"false\" class=\"elementor-section elementor-top-section elementor-element elementor-element-b6c6347 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"b6c6347\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-cf25290\" data-id=\"cf25290\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-82f4f0e elementor-widget elementor-widget-text-editor\" data-id=\"82f4f0e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t\t\t\t\t\t<p><strong>Type:<\/strong> Software <br \/><strong>Status:<\/strong> Beta<\/p><p><strong>Tech Stack: <\/strong>Python, FastAPI, Uvicorn, Transformers (Florence-2, JoyCaption), PyTorch, React, Vite, TypeScript<\/p><p><strong>Problem Statement<\/strong><\/p><p>Training AI image models (LoRA, DreamBooth, fine-tuning) requires image datasets with high-quality text descriptions. 
Manual captioning is extremely time-consuming, and automatic tools rarely offer a visual gallery view, multi-model support, or the ability to switch between caption formats (full text, tags, short description). An integrated workflow that combines batch processing with subsequent manual correction is also missing.<\/p><p><strong>Description<\/strong><\/p><p>A lean tool for creating image descriptions for AI training datasets. Images are displayed in a React gallery, and each caption is saved as a .txt file with the same base name as its image. It supports auto-captioning via Florence-2 and JoyCaption (NSFW-capable) with a selectable caption format (full text, short text, tags). All captions can be batch-saved with one click.<\/p><p><strong>Use Case<\/strong><\/p><p><em>Tag images with descriptions so AI models can learn from them.<\/em><\/p><p><strong>Link: <\/strong><a href=\"https:\/\/github.com\/rawk7000\/captioner\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/rawk7000\/captioner<\/a> (private repo)<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-856d7fd elementor-widget elementor-widget-gallery\" data-id=\"856d7fd\" data-element_type=\"widget\" data-e-type=\"widget\" data-settings=\"{&quot;columns&quot;:3,&quot;gap&quot;:{&quot;unit&quot;:&quot;px&quot;,&quot;size&quot;:25,&quot;sizes&quot;:[]},&quot;lazyload&quot;:&quot;yes&quot;,&quot;gallery_layout&quot;:&quot;grid&quot;,&quot;columns_tablet&quot;:2,&quot;columns_mobile&quot;:1,&quot;gap_tablet&quot;:{&quot;unit&quot;:&quot;px&quot;,&quot;size&quot;:10,&quot;sizes&quot;:[]},&quot;gap_mobile&quot;:{&quot;unit&quot;:&quot;px&quot;,&quot;size&quot;:10,&quot;sizes&quot;:[]},&quot;link_to&quot;:&quot;file&quot;,&quot;aspect_ratio&quot;:&quot;3:2&quot;,&quot;overlay_background&quot;:&quot;yes&quot;,&quot;content_hover_animation&quot;:&quot;fade-in&quot;}\" data-widget_type=\"gallery.default\">\n\t\t\t\t\t\t\t<div class=\"elementor-gallery__container\">\n\t\t\t\t\t\t\t<a class=\"e-gallery-item elementor-gallery-item 
elementor-animated-content\" href=\"https:\/\/new.rawk.at\/wp-content\/uploads\/2026\/04\/01-8.jpg\" data-elementor-open-lightbox=\"yes\" data-elementor-lightbox-slideshow=\"856d7fd\" data-elementor-lightbox-title=\"01\" data-e-action-hash=\"#elementor-action%3Aaction%3Dlightbox%26settings%3DeyJpZCI6ODYyMSwidXJsIjoiaHR0cHM6XC9cL25ldy5yYXdrLmF0XC93cC1jb250ZW50XC91cGxvYWRzXC8yMDI2XC8wNFwvMDEtOC5qcGciLCJzbGlkZXNob3ciOiI4NTZkN2ZkIn0%3D\">\n\t\t\t\t\t<div class=\"e-gallery-image elementor-gallery-item__image\" data-thumbnail=\"https:\/\/new.rawk.at\/wp-content\/uploads\/2026\/04\/01-8-768x378.jpg\" data-width=\"768\" data-height=\"378\" aria-label=\"\" role=\"img\" ><\/div>\n\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-gallery-item__overlay\"><\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-25e4391 elementor-widget elementor-widget-gallery\" data-id=\"25e4391\" data-element_type=\"widget\" data-e-type=\"widget\" data-settings=\"{&quot;columns&quot;:2,&quot;lazyload&quot;:&quot;yes&quot;,&quot;gallery_layout&quot;:&quot;grid&quot;,&quot;columns_tablet&quot;:2,&quot;columns_mobile&quot;:1,&quot;gap&quot;:{&quot;unit&quot;:&quot;px&quot;,&quot;size&quot;:10,&quot;sizes&quot;:[]},&quot;gap_tablet&quot;:{&quot;unit&quot;:&quot;px&quot;,&quot;size&quot;:10,&quot;sizes&quot;:[]},&quot;gap_mobile&quot;:{&quot;unit&quot;:&quot;px&quot;,&quot;size&quot;:10,&quot;sizes&quot;:[]},&quot;link_to&quot;:&quot;file&quot;,&quot;aspect_ratio&quot;:&quot;3:2&quot;,&quot;overlay_background&quot;:&quot;yes&quot;,&quot;content_hover_animation&quot;:&quot;fade-in&quot;}\" data-widget_type=\"gallery.default\">\n\t\t\t\t\t\t\t<div class=\"elementor-gallery__container\">\n\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Type: Software Status: Beta Tech Stack: Python, FastAPI, Uvicorn, 
Transformers (Florence-2, JoyCaption), PyTorch, React, Vite, TypeScript Problem Statement Training AI image models (LoRA, DreamBooth, fine-tuning) requires image datasets with high-quality text descriptions. Manual captioning is extremely time-consuming, and automatic tools rarely offer a visual gallery view, multi-model support, or the ability to switch between caption [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":8621,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-8026","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-projects"],"_links":{"self":[{"href":"https:\/\/new.rawk.at\/index.php?rest_route=\/wp\/v2\/posts\/8026","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/new.rawk.at\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/new.rawk.at\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/new.rawk.at\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/new.rawk.at\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8026"}],"version-history":[{"count":4,"href":"https:\/\/new.rawk.at\/index.php?rest_route=\/wp\/v2\/posts\/8026\/revisions"}],"predecessor-version":[{"id":8690,"href":"https:\/\/new.rawk.at\/index.php?rest_route=\/wp\/v2\/posts\/8026\/revisions\/8690"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/new.rawk.at\/index.php?rest_route=\/wp\/v2\/media\/8621"}],"wp:attachment":[{"href":"https:\/\/new.rawk.at\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8026"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/new.rawk.at\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8026"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/new.rawk.at\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8026"}],"curies":[{"name":"wp","href"
:"https:\/\/api.w.org\/{rel}","templated":true}]}}