{"id":146,"date":"2020-08-15T08:51:54","date_gmt":"2020-08-15T08:51:54","guid":{"rendered":"https:\/\/habett.fr\/blog\/?p=146"},"modified":"2020-08-16T07:41:24","modified_gmt":"2020-08-16T07:41:24","slug":"noutfmb4","status":"publish","type":"post","link":"https:\/\/habett.fr\/blog\/2020\/08\/noutfmb4\/","title":{"rendered":"NoUTF8MB4"},"content":{"rendered":"\n<p>In a job that indexes external data, mostly RSS but also tweets (see the previous post), I ran into real trouble with encoding, collation, and storage when the content going into the database contained emojis. I had switched the table from utf8 to utf8mb4, but that did not work (perhaps a bug in DBI, though I doubt it, since I have already done this successfully elsewhere). After a fair amount of research, I came up with this patch, which keeps emojis as high numeric entities while restoring the lower text (the named entities) to direct Unicode.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">use HTML::Entities;\nuse Encode;\n# Encode everything unsafe or non-ASCII as entities\n$s = encode_entities($s);\nmy @entits = $s =~ \/(&amp;.*?;)\/g;\nforeach my $entit (@entits) {\n  # Leave hex numeric entities (e.g. emojis) untouched\n  next if $entit =~ \/^&amp;#x\/;\n  my $deco = encode('utf-8', decode_entities($entit));\n  $s =~ s\/\\Q$entit\\E\/$deco\/g;\n}\n<\/pre>\n\n\n\n<p>All that is left is to decode the entities before injecting into Solr, or to keep the string as is when including it in HTML. A small hack that avoids a lot of trouble.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a job that indexes external data, mostly RSS but also tweets (see the previous post), I ran into real trouble with encoding, collation, and storage when the content going into the database contained emojis. I had switched the table from utf8 to utf8mb4, but that did not work (perhaps a bug in DBI, though I &hellip; <a href=\"https:\/\/habett.fr\/blog\/2020\/08\/noutfmb4\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">NoUTF8MB4<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25,31],"tags":[],"class_list":["post-146","post","type-post","status-publish","format-standard","hentry","category-code","category-encode"],"_links":{"self":[{"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/posts\/146","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/comments?post=146"}],"version-history":[{"count":5,"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/posts\/146\/revisions"}],"predecessor-version":[{"id":151,"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/posts\/146\/revisions\/151"}],"wp:attachment":[{"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/media?parent=146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/categories?post=146"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/habett.fr\/blog\/wp-json\/wp\/v2\/tags?post=146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}