Jekyll2024-03-12T14:01:38+00:00https://www.allo-media.net/feed.xmlAllo-Media{"fr"=>"Allo-Media valorise le contenu des appels pour le marketing avec son Cookie Vocal et améliore l’expérience client avec sa plateforme de Speech Analytics.", "en"=>"Allo-Media provides an AI platform based on Call Tracking, Automatic Natural Language Recognition, and Speech Analytics that helps you convert your calls into the right actions."}Allo-MediaAn event driven architecture — part 32022-09-26T00:00:00+00:002022-09-26T00:00:00+00:00https://www.allo-media.net/en/tech/architecture/2022/09/26/eda-architecture3<p>In this new post, let’s talk about the actual implementation of the <a href="/en/tech/architecture/2022/02/02/eda-architecture2.html">core principles</a> of the event driven architecture.</p>
<h2 id="topology">Topology</h2>
<p>The topology describes how the bus is implemented on the message broker.</p>
<p>As message broker, we chose <a href="https://www.rabbitmq.com/">RabbitMQ</a> for its reliability record and ease of use.</p>
<p>In RabbitMQ, the topology is set up by instantiating <em>exchanges</em> and <em>queues</em>.</p>
<p>Exchanges act as routers. Client applications bind queues to them by subscribing to particular <em>routing keys</em>, and the queues store the messages that are waiting for processing. It is good practice to treat a queue as private to a logical service (though it is shared by all the workers of that service). We use the message type name as the routing key.</p>
<p>The topology is made of three exchanges, plus dead-letter routing:</p>
<ul>
<li>a reliable <em>events</em> exchange with:
<ul>
<li>persistent consumer queues (survive broker and service restarts)</li>
<li>persistent messages (survive broker and service restarts)</li>
<li>message processing acknowledgment by clients</li>
<li>message confirmation by broker (for early detection of transient network failures)</li>
<li><em>topic</em> routing (to allow wildcard monitoring)</li>
</ul>
</li>
<li>a reliable <em>commands</em> exchange with:
<ul>
<li>persistent consumer queues (survive broker and service restarts)</li>
<li>persistent messages (survive broker and service restarts)</li>
<li>message processing acknowledgment by clients</li>
<li>message confirmation by broker (for early detection of transient network failures)</li>
<li><em>topic</em> routing</li>
</ul>
</li>
<li>a <em>logs</em> exchange with:
<ul>
<li>no persistence (to avoid over-flooding the broker memory in case nobody is consuming the logs)</li>
<li>no message acknowledgment or confirmation (for speed)</li>
</ul>
</li>
<li>dead letter routing: when a message is permanently refused, it is automatically routed to the dead-letter exchange.</li>
</ul>
<p>The Result of a Command is sent to the <code class="language-plaintext highlighter-rouge">commands</code> exchange too.</p>
<p>Instances of the same logical service are called <em>workers</em> of that service, and they share the same queue. The broker delivers each message to one and only one worker of the pool attached to the queue at a time.</p>
<p>Message processing acknowledgment by clients is a very useful mechanism: it ensures no data is lost (a message is guaranteed to be processed at least once) and it allows efficient load balancing. The message broker won’t remove a message from the queue until the worker that took it tells the broker that it has finished its job. During this time, the message is reserved. If that worker goes down before acknowledging, the message becomes available to any other worker, or to the same worker when it comes back. Moreover, the broker knows at any given time which workers are busy and which ones are idle, so it can better share the load between them.</p>
<p>Note that this is the base topology. As the topology is <strong>created by the services themselves</strong> instead of a central configuration, they can extend it locally (i.e. on their side) for their own needs.
That also means they must all agree on the base topology exposed above to join the bus.</p>
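<p>To make the wildcard monitoring allowed by <em>topic</em> routing more concrete, here is a minimal pure-Python sketch of the AMQP topic matching rules (<code class="language-plaintext highlighter-rouge">*</code> stands for exactly one dot-separated word, <code class="language-plaintext highlighter-rouge">#</code> for zero or more). It only illustrates the rules: the real matching is done by the broker, and the function name <code class="language-plaintext highlighter-rouge">topic_matches</code> is ours.</p>

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style topic binding pattern matches a routing key."""

    def match(p, k):
        if not p:
            # Pattern exhausted: it matches only if the key is exhausted too.
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may swallow zero or more words of the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        # '*' matches any single word; otherwise the words must be equal.
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))
```

<p>With these rules, a monitoring tool bound with <code class="language-plaintext highlighter-rouge">#</code> sees every message on an exchange, while a binding such as <code class="language-plaintext highlighter-rouge">asr.*</code> would only see messages whose routing key starts with the <code class="language-plaintext highlighter-rouge">asr</code> word followed by exactly one more word.</p>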
<h2 id="behaviors">Behaviors</h2>
<p>The topology describes how the broker routes the messages and what guarantees it must provide.</p>
<p>The EDA also requires that the services implement some basic rules to ensure reliability and performance:</p>
<ul>
<li>a message must be acknowledged once it is completely processed and never earlier;</li>
<li>the service must keep sent messages in cache until they are acknowledged by the broker, and resend them otherwise;</li>
<li>the service should automatically reconnect to the broker if it is disconnected, without losing its cache;</li>
<li>in case of unexpected failure when processing a message, the service should requeue it once before permanently rejecting it;</li>
<li>the service should catch unrecoverable errors (e.g. illegal messages, inconsistent data…), log them and acknowledge the faulty message
so that it is not requeued and not routed to the dead-letters.</li>
</ul>
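<p>The acknowledgment rules above can be sketched as a single decision function. This is a broker-agnostic illustration, not the code of our frameworks: <code class="language-plaintext highlighter-rouge">Outcome</code>, <code class="language-plaintext highlighter-rouge">UnrecoverableError</code> and <code class="language-plaintext highlighter-rouge">handle</code> are hypothetical names, and we assume that the <code class="language-plaintext highlighter-rouge">redelivered</code> flag set by the broker on a requeued message is enough to tell the first attempt from the second.</p>

```python
from enum import Enum


class Outcome(Enum):
    ACK = "ack"          # remove the message from the queue
    REQUEUE = "requeue"  # put it back for one more attempt
    REJECT = "reject"    # refuse it for good; the broker dead-letters it


class UnrecoverableError(Exception):
    """Hypothetical marker for illegal messages or inconsistent data."""


def handle(process, payload, redelivered, log=print):
    """Run `process` on `payload` and decide what to tell the broker."""
    try:
        process(payload)
    except UnrecoverableError as exc:
        # Retrying cannot help: log, then acknowledge so that the message
        # is neither requeued nor dead-lettered.
        log(f"unrecoverable message dropped: {exc}")
        return Outcome.ACK
    except Exception as exc:
        # Unexpected failure: retry once, then reject permanently.
        log(f"unexpected failure: {exc!r}")
        return Outcome.REJECT if redelivered else Outcome.REQUEUE
    # Acknowledge only once processing has fully completed.
    return Outcome.ACK
```

<p>Keeping the decision separate from the broker calls makes the policy easy to unit test without a running broker.</p>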
<h2 id="framework-and-tools">Framework and tools</h2>
<p>To avoid code duplication, we developed mini-frameworks (in Python, Elixir and Rust) that implement this base topology and all the required behaviors of the services. We’ll talk about them in the next post: <em>Framework and tools</em>!</p>Allo-MediaIn this new post, let's talk about the actual implementation of the event driven architecture.An event driven architecture — part 22022-02-02T00:00:00+00:002022-02-02T00:00:00+00:00https://www.allo-media.net/en/tech/architecture/2022/02/02/eda-architecture2<p>In the <a href="/en/tech/architecture/2020/02/17/eda-architecture.html">previous post</a> of this series, we explained why we ditched our old architecture based on synchronous REST services for a completely asynchronous event-driven architecture.</p>
<p>Today, we address the core design principles that were crucial in the success of this enterprise.</p>
<h2 id="business-services-and-data-processing-services">Business Services and Data Processing Services</h2>
<p>We make a distinction between <em>Business Services</em> and <em>Data Processing Services</em> (aka utility services) to cleanly separate business logic from data processing complexity.</p>
<h3 id="data-processing-services">Data Processing Services</h3>
<p>Data Processing Services are expected to be <strong><a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>, <a href="https://en.wikipedia.org/wiki/Service_statelessness_principle">stateless</a> services</strong> that provide some kind of algorithmic data processing (computations, transformations…). Moreover, they are also context free: they should not depend on business rules, assumptions or external data sources. Everything they need for their processing must be in the message they receive. They should not have to query a third party to get more data. Data Processing Services are a kind of <em>universal</em> library and can even be provided by third parties.</p>
<p>Examples of Data-Processing services:</p>
<ul>
<li>A speech to text service has many different applications. All it needs as inputs are audio and a language reference. It doesn’t need to persist any data.</li>
<li>An image thumbnailing service. It only takes an image and target dimensions as inputs. It has no side effects and may be used in many different businesses.</li>
</ul>
<h3 id="business-services">Business Services</h3>
<p>Business Services implement the customers’ workflows. They focus only on business rules and requirements, orchestrating and implementing the value we add to our customers’ audio and data. They use the Data Processing Services as a library for that. They are very specific to us: you would never want to externalize your business services.</p>
<p>Business Services persist the data they produce and are the single trusted source of truth for it.</p>
<p>Business Services build and maintain their own customer configuration from events on the bus.</p>
<p>Examples of Business Services:</p>
<ul>
<li>At Allo-Media, we have a business service to tag incoming calls. It listens for call transcripts and publishes qualification tags. It knows about our customers’ needs and tags the calls accordingly. It persists the tags and is the single source of truth for them.</li>
<li>A shopping cart service for an online shop. For each online user, it maintains the state of their shopping cart by listening to UI events like <code class="language-plaintext highlighter-rouge">ItemSelected</code>, <code class="language-plaintext highlighter-rouge">ItemRemoved</code> or stock events like <code class="language-plaintext highlighter-rouge">ItemStockLeft</code>…</li>
</ul>
<p>All services must be <a href="https://en.wikipedia.org/wiki/Idempotence">idempotent</a>, that is, if they receive exactly the same message twice, they must behave identically and produce the same outputs.</p>
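<p>As an illustration of idempotence, here is a minimal sketch of a tagging service that stores its output with an upsert keyed by call, so that a redelivered message leaves the state unchanged and yields the same output. The keyword rules, the field names (<code class="language-plaintext highlighter-rouge">call_id</code>, <code class="language-plaintext highlighter-rouge">transcript</code>) and the <code class="language-plaintext highlighter-rouge">CallTagged</code> event are hypothetical, chosen only for the example.</p>

```python
KEYWORDS = {"refund", "invoice", "delivery"}  # hypothetical tagging rules


class TaggingService:
    def __init__(self):
        self.tags_by_call = {}

    def on_transcript(self, message):
        words = set(message["transcript"].lower().split())
        tags = sorted(words & KEYWORDS)
        # Upsert keyed by call: a redelivered message overwrites the same
        # entry instead of appending a duplicate.
        self.tags_by_call[message["call_id"]] = tags
        return {"type": "CallTagged", "call_id": message["call_id"], "tags": tags}
```

<p>The output depends only on the input message, and the stored state is the same whether the message is processed once or several times.</p>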
<h2 id="events-commands-and-results">Events, Commands and Results</h2>
<p>In the same way we have two different kinds of services, we have two different kinds of messages: <em>Commands (and their results)</em> and <em>Events</em>.</p>
<h3 id="events">Events</h3>
<p>Events are business messages that business services publish on the bus to tell the world what happened.</p>
<p>A business service owns the type of Events it emits. It knows nothing about the services that will process them. It subscribes to the types of Events it needs but knows nothing about their origins.</p>
<p>An Event type defines the meaning of the events of that type and their data schema. They must be documented.</p>
<p>The type of an actual event message is given by its name (aka. <em>routing key</em> because it is used by the subscription routing). The event type name must be in the form <code class="language-plaintext highlighter-rouge">SubjectPastParticiple</code>. For example, <code class="language-plaintext highlighter-rouge">ConversationStarted</code>, <code class="language-plaintext highlighter-rouge">CustomerCreated</code>, <code class="language-plaintext highlighter-rouge">ShoppingCartValidated</code>… If you’re not able to immediately give a name to your event type, it means it is not well defined, or that it is not an event. Maybe, you need to refine your service or split it, as you may not have analyzed your value chain deeply enough?</p>
<h3 id="commands-and-results">Commands and results</h3>
<p>Commands are utility messages consumed by Data Processing Services. Imagine an order you place with a supplier. You don’t know who will fulfil it, nor how or when, but you’ll get what you want in your mailbox sometime later.</p>
<p>A Data Processing Service owns the types of the commands it consumes and their results. A command is always addressed to the logical service that owns it.</p>
<p>A Command type defines the meaning of the command, its data schema and its result data schema. It must be documented.</p>
<p>The type of an actual command message is given by its name. It is in the form <code class="language-plaintext highlighter-rouge">VerbObject</code>. For example: <code class="language-plaintext highlighter-rouge">AnnotateText</code>, <code class="language-plaintext highlighter-rouge">TranscribeAudio</code>…
As commands are addressed to a particular logical service, the <em>routing key</em> of a command is in the form <code class="language-plaintext highlighter-rouge">logical_service_name.commandname</code>. For example: <code class="language-plaintext highlighter-rouge">asr.TranscribeAudio</code>.</p>
<p>The command contains the return “address” to which the result is to be sent and a reference set by the sender that is returned as-is, along with the command outcome. That reference is called the <em>correlation identifier</em> and it is very important for the sender: as all communications are asynchronous, the service requesting the command needs a way to reconcile the received result with the initial request it made.</p>
<p>A Result is a message specific to each command. It contains the outcome of the processing, which can be either a success or an error, and the correlation identifier. Result messages can’t exist without a previous command.</p>
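<p>The role of the correlation identifier can be sketched as follows. This is an illustration under assumptions, not our framework code: <code class="language-plaintext highlighter-rouge">CommandClient</code>, the <code class="language-plaintext highlighter-rouge">publish</code> callable and the message field names (<code class="language-plaintext highlighter-rouge">reply_to</code>, <code class="language-plaintext highlighter-rouge">correlation_id</code>, <code class="language-plaintext highlighter-rouge">data</code>) are all hypothetical.</p>

```python
import uuid


class CommandClient:
    """Sends commands and reconciles asynchronous results with their requests."""

    def __init__(self, publish, reply_to):
        self._publish = publish    # callable(routing_key, message)
        self._reply_to = reply_to  # return "address" for the results
        self._pending = {}         # correlation id -> callback

    def send(self, service, command, data, on_result):
        correlation_id = uuid.uuid4().hex
        self._pending[correlation_id] = on_result
        # Commands are addressed to a logical service: "service.CommandName".
        self._publish(
            f"{service}.{command}",
            {"reply_to": self._reply_to, "correlation_id": correlation_id, "data": data},
        )
        return correlation_id

    def on_result(self, result):
        # The correlation id comes back as-is, so we can find the request.
        callback = self._pending.pop(result["correlation_id"], None)
        if callback is None:
            return False  # unknown or already handled result
        callback(result)
        return True
```

<p>A real service would persist the pending requests rather than keep them in memory, so that a restart between the command and its result does not lose the context.</p>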
<p>Error results are expected and documented: they are “normal” errors, not bug reports. Exceptions caused by bugs <em>must not produce an error result</em>. In case of an unexpected error, the service requeues the input command to retry it once. If the second attempt raises an unexpected error again, the message is refused and goes into the dead letter queue for investigation. The exceptions are always logged.</p>
<h2 id="logs">Logs</h2>
<p>We can also have logging messages to easily collect application logs.</p>
<p>All the messages that “cascade” from the same source event share a common identifier, called the <em>conversation identifier</em>, which has the following properties:</p>
<ul>
<li>it is unique in time;</li>
<li>it is created by an <em>Event</em> (never by <em>Commands</em>) that is published for reasons external to the bus and not as a reaction to other <em>Event</em>s; we call that <em>Event</em> the <em>initial event</em>;</li>
<li>any message (Event, Command or Result) created as a reaction to another message <em>M</em> takes and repeats the conversation ID of <em>M</em> as is.</li>
</ul>
<p>All the messages that result from a given initial event share the same conversation ID, and no other message does. That way, we can easily trace and debug the actual pipeline of each incoming call, for example.</p>
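<p>The propagation rule can be sketched in a few lines. The message layout (<code class="language-plaintext highlighter-rouge">type</code>, <code class="language-plaintext highlighter-rouge">conversation_id</code>, <code class="language-plaintext highlighter-rouge">data</code>) and the function names are hypothetical, and a random UUID is only one possible way to get an identifier that is unique in time.</p>

```python
import uuid


def initial_event(type_name, data):
    """Build an event published for reasons external to the bus."""
    # Only an initial event creates a new conversation identifier.
    return {"type": type_name, "conversation_id": uuid.uuid4().hex, "data": data}


def reaction(to, type_name, data):
    """Build any message (Event, Command or Result) reacting to message `to`."""
    # The conversation identifier is copied as is, never regenerated.
    return {"type": type_name, "conversation_id": to["conversation_id"], "data": data}
```

<p>Filtering the logs on one conversation identifier then gives the whole cascade triggered by a single incoming call.</p>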
<p>Finally, the message schemas must be forward compatible:</p>
<ul>
<li>a new version of a Message schema for an application can add fields but must not remove or redefine existing ones;</li>
<li>the implementation of a Message decoder must ignore unknown fields without crashing.</li>
</ul>
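<p>A decoder that tolerates unknown fields can be as simple as the sketch below. The <code class="language-plaintext highlighter-rouge">ConversationStarted</code> fields and the JSON encoding are assumptions made for the example, not our actual schemas.</p>

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConversationStarted:
    # Hypothetical fields, as known to this version of the service.
    conversation_id: str
    caller: str


def decode(cls, raw):
    """Decode a JSON message into `cls`, ignoring fields it does not know."""
    payload = json.loads(raw)
    known = {f.name for f in fields(cls)}
    # Fields added by a newer schema version are silently dropped.
    return cls(**{name: value for name, value in payload.items() if name in known})
```

<p>A service built against the old schema keeps working when a producer starts sending an extra field, which is what lets services be upgraded independently.</p>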
<p>The detailed documentation of the actual messages must be kept up to date, in a place that developers can easily reach.</p>
<p>In the <a href="/en/tech/architecture/2022/09/26/eda-architecture3.html">next post</a> in this series, we’ll see how we implemented those principles and behaviors in the actual architecture.</p>Allo-MediaToday, we address the core design principles that we use for our event driven architecture.An event driven architecture2020-02-17T11:53:00+00:002020-02-17T11:53:00+00:00https://www.allo-media.net/en/tech/architecture/2020/02/17/eda-architecture<p>At <a href="https://www.allo-media.net/en/">Allo-Media</a>, like many other businesses, our value chain looks like a pipeline: we collect conversations (mainly through phone) sent to us by our customers, we transcribe them, we tag the transcripts with named entities, we anonymize both the transcript and the audio, then we qualify the content with semantic tags, and finally we index them and provide a UI and API to consult, search, analyze the conversations. All those steps are completed automatically by NLP and AI algorithms.</p>
<p>Such pipelines are well suited for service based architectures. If you need to add a new feature, you introduce a new service into the pipeline.</p>
<p>Our first take at it was based on REST services.</p>
<p><img src="/assets/img/blog/eventail/old_pipeline.png" alt="Old pipeline with REST services" /></p>
<p>Unfortunately, as you can see in the diagram above, that approach has many drawbacks:</p>
<ul>
<li>it introduces strong coupling between components, as almost each service has to know about the other related services, their addresses, their purposes, their APIs…;</li>
<li>load balancing requires ad hoc solutions (like for the <em>Transcription Pool Manager</em> with <em>Celery</em>);</li>
<li>high availability is tricky because of the synchronous communication: if the requested service is down, the caller has to implement complex “retry later” strategies or give up! And so on for each service.</li>
<li>upgrading or adding new services is a lot of work as it impacts other services and requires careful coordinated releases. Plus, you have to provide them with IP and DNS addresses.</li>
</ul>
<p>So one year later, as our activity grew and development accelerated, we quickly realized that we needed:</p>
<ul>
<li>maximum service decoupling;</li>
<li>easy distribution;</li>
<li>no-brainer load balancing;</li>
<li>one to one, one to many, many to one, many to many asynchronous communications;</li>
<li>high availability: hot restart of services, transparent addition or removal of service instances (workers), resilience to (reasonable) downtime of some services;</li>
<li>support for heavy payloads (megabytes of mp3 audio);</li>
<li>no data loss, whatever happens.</li>
</ul>
<h2 id="enter-the-event-driven-architecture">Enter the event driven architecture</h2>
<p>The best way to achieve those goals is to free your mind from the classical pipeline point of view and instead see the value chain as an ecosystem of business services, each focused on providing a specific value, reacting to events (inputs) and producing new events (outputs). This new metaphor has not only technical benefits, but also business and organizational ones. By reasoning in terms of business units of your value chain, it’s easier to identify the people involved, the business experts who are the references for the job, the exact value added by the service, etc.</p>
<p>Here is the schema of our new architecture:</p>
<p><img src="/assets/img/blog/eventail/new_pipeline.png" alt="New event based pipeline" /></p>
<p>In this new architecture, events are precisely defined messages that stream on a message bus. Each logical service (implementing one such business service as explained above) subscribes to the events that are relevant to it, without needing any knowledge about what produced them and how. Services also push their own events on the bus, without caring about what consumes them.</p>
<p>In that way, we completely decouple the services from each other, and the message broker running the message bus provides us with load balancing, distribution and high availability for free.</p>
<p>Now, the messages <em>are</em> the API, the only business and technical reference.</p>
<p>After much thinking and experimenting, we came up with core design principles that are very important for the success of such an event driven architecture, and after 4 months of production use, we are very glad we complied with them from the start. But that’s the subject of another blog post coming soon. Stay tuned!</p>Allo-MediaWhy and How we ditched our old architecture based on synchronous REST services for a completely asynchronous event driven architecture.ElasticSearch Percolator Use Case for Document Classification2019-12-05T11:00:00+00:002019-12-05T11:00:00+00:00https://www.allo-media.net/en/tech/python/2019/12/05/elasticsearch-percolator-qualification<p>Currently at Allo-Media, we use Elasticsearch in its usual workflow: we create an index, store documents holding the metadata of our phone call audio transcripts, and then search through these documents using business criteria such as: “Give me all phone calls from client Acme, where the customer speaks about the French strike”.</p>
<p>The percolator feature of Elasticsearch lets us run a reverse search. We store search queries as documents in their own index, then we percolate new call documents and retrieve the search queries that match. One use case for the percolator is document classification.</p>
<p>For example, say that we want to tag with <code class="language-plaintext highlighter-rouge">Check sent</code> all documents mentioning that the user has already sent a bank check. We would have the following search query:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>("I've sent" | "I've already sent") ("check")
</code></pre></div></div>
<p>So first, we need to create an index to store the search queries with the following mapping:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PUT /search-perco
{
"mappings": {
"_doc": {
"properties": {
"tag_uuid": {
"type": "text"
},
"tag_name": {
"type": "text"
},
"content": {
"type": "text"
},
"query": {
"type": "percolator"
}
}
}
}
}
</code></pre></div></div>
<ul>
<li>The <code class="language-plaintext highlighter-rouge">tag_*</code> fields are used for document classification</li>
<li>The <code class="language-plaintext highlighter-rouge">query</code> field of type <code class="language-plaintext highlighter-rouge">percolator</code> is used to index the search query documents, storing a <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html">query DSL</a> in JSON</li>
<li>The <code class="language-plaintext highlighter-rouge">content</code> field is used to preprocess the percolating documents.</li>
</ul>
<p>Once the index is created, we can now store our search query documents, like the following one:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PUT /search-perco/_doc/1?refresh
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "(\"I've sent\" | \"I've already sent\") (\"check\")",
"fields": [
"content"
],
"default_operator": "and"
}
}
]
}
},
"tag_uuid": "2f86ad85-4c09-4ef3-bb6e-100d129018e9",
"tag_name": "Check sent"
}
</code></pre></div></div>
<p>And if we search through this index, we will retrieve our newly added search query document:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GET search-perco/_search
{
"query": {"match_all": {}}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "search-perco",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"query": ...
"tag_uuid": "2f86ad85-4c09-4ef3-bb6e-100d129018e9",
"tag_name": "Check sent",
}
}
]
}
}
</code></pre></div></div>
<p>Now it’s time to percolate call documents via the percolate query:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GET /search-perco/_search
{
"_source": [
"tag_uuid",
"tag_name"
],
"query": {
"percolate": {
"field": "query",
"documents": [{
"unique_id": "2f86ad85-4c09-4ef3-bb6e-100d129018e7",
"timestamp": "2018-01-02T18:13:30+00:00",
"duration": 322,
"transcribed": true,
"client_name": "Acme",
"content": "I've already sent to you a bank check last week..."
}]
}
},
"highlight": {
"fields": {
"content": {}
}
}
}
</code></pre></div></div>
<p>Elasticsearch returns the following response:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
"took": 37,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.8630463,
"hits": [
{
"_index": "search-perco",
"_type": "_doc",
"_id": "1",
"_score": 0.8630463,
"_source": {
"tag_name": "Check sent",
"tag_uuid": "2f86ad85-4c09-4ef3-bb6e-100d129018e9"
},
"fields": {
"_percolator_document_slot": [
0
]
},
"highlight": {
"content": [
"<em>I've already sent</em> to you a bank <em>check</em> last week..."
]
}
}
]
}
}
</code></pre></div></div>
<p>Here we see that our call document matched the search query tagged <code class="language-plaintext highlighter-rouge">Check sent</code>. The highlighter shows which terms of the document matched the search query. The <code class="language-plaintext highlighter-rouge">_percolator_document_slot</code> field is useful when we send several documents in the <code class="language-plaintext highlighter-rouge">documents</code> field of the percolate query: it tells which of them matched. <code class="language-plaintext highlighter-rouge">max_score</code> and <code class="language-plaintext highlighter-rouge">_score</code> give the relevance score of the matches. You can disable score computation by running the percolate query in a <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html#filter-context">filter context</a>.</p>
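<p>When several documents are percolated at once, the response can be turned into a classification per document with a few lines of client code. This sketch only relies on the response shape shown above; the function name <code class="language-plaintext highlighter-rouge">tags_by_document</code> is hypothetical, and it assumes the request kept <code class="language-plaintext highlighter-rouge">tag_name</code> in <code class="language-plaintext highlighter-rouge">_source</code>.</p>

```python
from collections import defaultdict


def tags_by_document(response):
    """Map each percolated document slot to the names of the tags it matched."""
    tags = defaultdict(list)
    for hit in response["hits"]["hits"]:
        # Each hit is a stored search query; the slots are the positions,
        # in the request's "documents" array, of the documents it matched.
        for slot in hit["fields"]["_percolator_document_slot"]:
            tags[slot].append(hit["_source"]["tag_name"])
    return dict(tags)
```

<p>Applied to the response above, this gives a single entry: slot <code class="language-plaintext highlighter-rouge">0</code> with the <code class="language-plaintext highlighter-rouge">Check sent</code> tag.</p>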
<p>We can also percolate existing documents by providing the index where they are stored, and their ids:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GET /search-perco/_search
{
"query" : {
"percolate" : {
"field": "query",
"index" : "call-index",
"id" : "2"
}
}
}
</code></pre></div></div>
<p>Consider optimizing text analysis at percolate time, as suggested in the docs: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/percolator.html#_optimizing_query_time_text_analysis">Percolator optimization</a>.</p>
<p>Elasticsearch documentation:</p>
<ul>
<li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-percolate-query.html">Percolate query</a></li>
<li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/percolator.html">ElasticSearch Percolator</a></li>
</ul>Allo-MediaLet's try a use case of document classification with ElasticSearch Percolator.Stateful components in Elm2019-07-16T09:00:00+00:002019-07-16T09:00:00+00:00https://www.allo-media.net/en/tech/elm/2019/07/16/stateful-components-in-elm<p>It’s often claimed that <a href="https://elm-lang.org/">Elm</a> developers should avoid thinking of their views as stateful components. While this is indeed a good general design practice, sometimes you may want to make your views reusable (e.g. across pages or projects), and if they come with a state… you end up copying and pasting a lot of things.</p>
<p>We recently published <a href="https://package.elm-lang.org/packages/allo-media/elm-daterange-picker/latest/">elm-daterange-picker</a>, a date range picker written in <a href="https://elm-lang.org/">Elm</a>. It was the perfect occasion to investigate what a reasonable API for a reusable stateful view component would look like.</p>
<p><img src="/assets/img/blog/2019-07-16-stateful-components-in-elm/demo.gif" alt="app demo" /></p>
<p>Many component/widget-oriented Elm packages feature a rather raw <a href="https://guide.elm-lang.org/architecture/">Elm Architecture (TEA)</a> API, directly exposing <code class="language-plaintext highlighter-rouge">Model</code>, <code class="language-plaintext highlighter-rouge">Msg(..)</code>, <code class="language-plaintext highlighter-rouge">init</code>, <code class="language-plaintext highlighter-rouge">update</code> and <code class="language-plaintext highlighter-rouge">view</code>, so you can basically import what defines an actual application and embed it within your own application.</p>
<p><img src="/assets/img/blog/2019-07-16-stateful-components-in-elm/meme.jpg" alt="funny meme" /></p>
<p>With these, you usually end up writing things like this:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">import</span> <span class="nn">Counter</span>
<span class="kr">type</span> <span class="n">alias</span> <span class="kt">Model</span> <span class="o">=</span>
<span class="p">{</span> <span class="n">counter</span> <span class="o">:</span> <span class="kt">Counter</span><span class="o">.</span><span class="kt">Model</span>
<span class="p">,</span> <span class="n">value</span> <span class="o">:</span> <span class="kt">Maybe</span> <span class="kt">Int</span>
<span class="p">}</span>
<span class="kr">type</span> <span class="kt">Msg</span>
<span class="o">=</span> <span class="kt">CounterMsg</span> <span class="kt">Counter</span><span class="o">.</span><span class="kt">Msg</span>
<span class="n">init</span> <span class="o">:</span> <span class="nb">()</span> <span class="o">-></span> <span class="p">(</span> <span class="kt">Model</span><span class="p">,</span> <span class="kt">Cmd</span> <span class="kt">Msg</span> <span class="p">)</span>
<span class="n">init</span> <span class="kr">_</span> <span class="o">=</span>
<span class="p">(</span> <span class="p">{</span> <span class="n">counter</span> <span class="o">=</span> <span class="kt">Counter</span><span class="o">.</span><span class="n">init</span><span class="p">,</span> <span class="n">value</span> <span class="o">=</span> <span class="kt">Nothing</span> <span class="p">}</span>
<span class="p">,</span> <span class="kt">Cmd</span><span class="o">.</span><span class="n">none</span>
<span class="p">)</span>
<span class="n">update</span> <span class="o">:</span> <span class="kt">Msg</span> <span class="o">-></span> <span class="kt">Model</span> <span class="o">-></span> <span class="p">(</span> <span class="kt">Model</span><span class="p">,</span> <span class="kt">Cmd</span> <span class="kt">Msg</span> <span class="p">)</span>
<span class="n">update</span> <span class="n">msg</span> <span class="n">model</span> <span class="o">=</span>
<span class="kr">case</span> <span class="n">msg</span> <span class="kr">of</span>
<span class="kt">CounterMsg</span> <span class="n">counterMsg</span> <span class="o">-></span>
<span class="kr">let</span>
<span class="p">(</span> <span class="n">newCounterModel</span><span class="p">,</span> <span class="n">newCounterCommands</span> <span class="p">)</span> <span class="o">=</span>
<span class="kt">Counter</span><span class="o">.</span><span class="n">update</span> <span class="n">counterMsg</span> <span class="n">model</span><span class="o">.</span><span class="n">counter</span>
<span class="kr">in</span>
<span class="p">(</span> <span class="p">{</span> <span class="n">model</span>
<span class="o">|</span> <span class="n">counter</span> <span class="o">=</span> <span class="n">newCounterModel</span>
<span class="p">,</span> <span class="n">value</span> <span class="o">=</span>
<span class="kr">case</span> <span class="n">counterMsg</span> <span class="kr">of</span>
<span class="kt">Counter</span><span class="o">.</span><span class="kt">Apply</span> <span class="n">value</span> <span class="o">-></span>
<span class="kt">Just</span> <span class="n">value</span>
<span class="kr">_</span> <span class="o">-></span>
<span class="kt">Nothing</span>
<span class="p">}</span>
<span class="p">,</span> <span class="n">newCounterCommands</span> <span class="o">|></span> <span class="kt">Cmd</span><span class="o">.</span><span class="n">map</span> <span class="kt">CounterMsg</span>
<span class="p">)</span>
<span class="n">view</span> <span class="o">:</span> <span class="kt">Model</span> <span class="o">-></span> <span class="kt">Html</span> <span class="kt">Msg</span>
<span class="n">view</span> <span class="n">model</span> <span class="o">=</span>
<span class="n">div</span> <span class="kt">[]</span>
<span class="p">[</span> <span class="kt">Counter</span><span class="o">.</span><span class="n">view</span> <span class="n">model</span><span class="o">.</span><span class="n">counter</span>
<span class="o">|></span> <span class="kt">Html</span><span class="o">.</span><span class="n">map</span> <span class="kt">CounterMsg</span>
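<span class="p">,</span> <span class="n">text</span> <span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">value</span> <span class="o">|></span> <span class="kt">Maybe</span><span class="o">.</span><span class="n">map</span> <span class="kt">String</span><span class="o">.</span><span class="n">fromInt</span> <span class="o">|></span> <span class="kt">Maybe</span><span class="o">.</span><span class="n">withDefault</span> <span class="s">""</span><span class="p">)</span>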
<span class="p">]</span>
</code></pre></div></div>
<p>This certainly works, but let’s be frank for a minute and admit this is super verbose and not very developer friendly:</p>
<ul>
<li>You need to <code class="language-plaintext highlighter-rouge">Cmd.map</code> and <code class="language-plaintext highlighter-rouge">Html.map</code> here and there</li>
<li>You need to pattern match <code class="language-plaintext highlighter-rouge">Counter.Msg</code> to intercept whatever event interests you…</li>
<li>… meaning <code class="language-plaintext highlighter-rouge">Counter</code> exposes all <code class="language-plaintext highlighter-rouge">Msg</code>s, which are <strong>implementation details</strong> you now rely on.</li>
</ul>
<p>There’s another way, which <a href="https://github.com/evancz/">Evan</a> explained in his now deprecated <a href="https://github.com/evancz/elm-sortable-table#about-api-design">elm-sortable-table</a> package. Among the many good points he makes, one idea struck me as brilliantly simple yet effective for simplifying the API design of such stateful view components:</p>
<blockquote>
<p><strong>State updates can be managed right from event handlers!</strong></p>
</blockquote>
<p>Let’s imagine a simple counter; what if when clicking the <em>increment</em> button, instead of calling <code class="language-plaintext highlighter-rouge">onClick</code> with some <code class="language-plaintext highlighter-rouge">Increment</code> message, we would call <strong>a user-provided one</strong> with the new counter state updated accordingly?</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Counter.elm</span>
<span class="n">view</span> <span class="o">:</span> <span class="p">(</span><span class="kt">Int</span> <span class="o">-></span> <span class="n">msg</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Int</span> <span class="o">-></span> <span class="kt">Html</span> <span class="n">msg</span>
<span class="n">view</span> <span class="n">toMsg</span> <span class="n">counter</span> <span class="o">=</span>
<span class="n">button</span> <span class="p">[</span> <span class="n">onClick</span> <span class="p">(</span><span class="n">toMsg</span> <span class="p">(</span><span class="n">counter</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="p">]</span>
<span class="p">[</span> <span class="n">text</span> <span class="s">"increment"</span> <span class="p">]</span>
</code></pre></div></div>
<p>Or if you want to use an <a href="https://medium.com/@ckoster22/advanced-types-in-elm-opaque-types-ec5ec3b84ed2">opaque type</a>, which is an excellent idea for maintaining the smallest API surface area:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Counter.elm</span>
<span class="kr">type</span> <span class="kt">State</span>
<span class="o">=</span> <span class="kt">State</span> <span class="kt">Int</span>
<span class="n">view</span> <span class="o">:</span> <span class="p">(</span><span class="kt">State</span> <span class="o">-></span> <span class="n">msg</span><span class="p">)</span> <span class="o">-></span> <span class="kt">State</span> <span class="o">-></span> <span class="kt">Html</span> <span class="n">msg</span>
<span class="n">view</span> <span class="n">toMsg</span> <span class="p">(</span><span class="kt">State</span> <span class="n">value</span><span class="p">)</span> <span class="o">=</span>
<span class="n">button</span> <span class="p">[</span> <span class="n">onClick</span> <span class="p">(</span><span class="n">toMsg</span> <span class="p">(</span><span class="kt">State</span> <span class="p">(</span><span class="n">value</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)))</span> <span class="p">]</span>
<span class="p">[</span> <span class="n">text</span> <span class="s">"increment"</span> <span class="p">]</span>
</code></pre></div></div>
<p>Note that as we’re dealing with a counter state, we didn’t bother using anything other than a simple <code class="language-plaintext highlighter-rouge">Int</code> to represent it. But you could of course use a record or anything you want.</p>
<p>Handling internal state updates can be as simple as creating an internal, unexposed <code class="language-plaintext highlighter-rouge">Msg</code> type and <code class="language-plaintext highlighter-rouge">update</code> function:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Counter.elm</span>
<span class="kr">type</span> <span class="kt">State</span>
<span class="o">=</span> <span class="kt">State</span> <span class="kt">Int</span>
<span class="kr">type</span> <span class="kt">Msg</span>
<span class="o">=</span> <span class="kt">Dec</span>
<span class="o">|</span> <span class="kt">Inc</span>
<span class="n">update</span> <span class="o">:</span> <span class="kt">Msg</span> <span class="o">-></span> <span class="kt">Int</span> <span class="o">-></span> <span class="kt">Int</span>
<span class="n">update</span> <span class="n">msg</span> <span class="n">value</span> <span class="o">=</span>
<span class="kr">case</span> <span class="n">msg</span> <span class="kr">of</span>
<span class="kt">Dec</span> <span class="o">-></span>
<span class="n">value</span> <span class="o">-</span> <span class="mi">1</span>
<span class="kt">Inc</span> <span class="o">-></span>
<span class="n">value</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">view</span> <span class="o">:</span> <span class="p">(</span><span class="kt">State</span> <span class="o">-></span> <span class="n">msg</span><span class="p">)</span> <span class="o">-></span> <span class="kt">State</span> <span class="o">-></span> <span class="kt">Html</span> <span class="n">msg</span>
<span class="n">view</span> <span class="n">toMsg</span> <span class="p">(</span><span class="kt">State</span> <span class="n">value</span><span class="p">)</span> <span class="o">=</span>
<span class="n">div</span> <span class="kt">[]</span>
<span class="p">[</span> <span class="n">button</span> <span class="p">[</span> <span class="n">onClick</span> <span class="p">(</span><span class="n">toMsg</span> <span class="p">(</span><span class="kt">State</span> <span class="p">(</span><span class="n">update</span> <span class="kt">Dec</span> <span class="n">value</span><span class="p">)))</span> <span class="p">]</span>
<span class="p">[</span> <span class="n">text</span> <span class="s">"decrement"</span> <span class="p">]</span>
<span class="p">,</span> <span class="n">button</span> <span class="p">[</span> <span class="n">onClick</span> <span class="p">(</span><span class="n">toMsg</span> <span class="p">(</span><span class="kt">State</span> <span class="p">(</span><span class="n">update</span> <span class="kt">Inc</span> <span class="n">value</span><span class="p">)))</span> <span class="p">]</span>
<span class="p">[</span> <span class="n">text</span> <span class="s">"increment"</span> <span class="p">]</span>
<span class="p">]</span>
</code></pre></div></div>
<p>We should also expose helpers to retrieve (or set) values from the opaque <code class="language-plaintext highlighter-rouge">State</code> type:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Counter.elm</span>
<span class="n">getValue</span> <span class="o">:</span> <span class="kt">State</span> <span class="o">-></span> <span class="kt">Int</span>
<span class="n">getValue</span> <span class="p">(</span><span class="kt">State</span> <span class="n">value</span><span class="p">)</span> <span class="o">=</span>
<span class="n">value</span>
</code></pre></div></div>
<p>So for instance, to use this <code class="language-plaintext highlighter-rouge">Counter</code> component in your own application, you just have to write this:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">import</span> <span class="nn">Counter</span>
<span class="kr">type</span> <span class="n">alias</span> <span class="kt">Model</span> <span class="o">=</span>
<span class="p">{</span> <span class="n">counter</span> <span class="o">:</span> <span class="kt">Counter</span><span class="o">.</span><span class="kt">State</span>
<span class="p">,</span> <span class="n">value</span> <span class="o">:</span> <span class="kt">Maybe</span> <span class="kt">Int</span>
<span class="p">}</span>
<span class="kr">type</span> <span class="kt">Msg</span>
<span class="o">=</span> <span class="kt">CounterChanged</span> <span class="kt">Counter</span><span class="o">.</span><span class="kt">State</span>
<span class="n">init</span> <span class="o">:</span> <span class="nb">()</span> <span class="o">-></span> <span class="p">(</span> <span class="kt">Model</span><span class="p">,</span> <span class="kt">Cmd</span> <span class="kt">Msg</span> <span class="p">)</span>
<span class="n">init</span> <span class="kr">_</span> <span class="o">=</span>
<span class="p">(</span> <span class="p">{</span> <span class="n">counter</span> <span class="o">=</span> <span class="kt">Counter</span><span class="o">.</span><span class="n">init</span><span class="p">,</span> <span class="n">value</span> <span class="o">=</span> <span class="kt">Nothing</span> <span class="p">}</span>
<span class="p">,</span> <span class="kt">Cmd</span><span class="o">.</span><span class="n">none</span>
<span class="p">)</span>
<span class="n">update</span> <span class="o">:</span> <span class="kt">Msg</span> <span class="o">-></span> <span class="kt">Model</span> <span class="o">-></span> <span class="p">(</span> <span class="kt">Model</span><span class="p">,</span> <span class="kt">Cmd</span> <span class="kt">Msg</span> <span class="p">)</span>
<span class="n">update</span> <span class="n">msg</span> <span class="n">model</span> <span class="o">=</span>
<span class="kr">case</span> <span class="n">msg</span> <span class="kr">of</span>
<span class="kt">CounterChanged</span> <span class="n">state</span> <span class="o">-></span>
<span class="p">(</span> <span class="p">{</span> <span class="n">model</span> <span class="o">|</span> <span class="n">counter</span> <span class="o">=</span> <span class="n">state</span><span class="p">,</span> <span class="n">value</span> <span class="o">=</span> <span class="kt">Just</span> <span class="p">(</span><span class="kt">Counter</span><span class="o">.</span><span class="n">getValue</span> <span class="n">state</span><span class="p">)</span> <span class="p">}</span>
<span class="p">,</span> <span class="kt">Cmd</span><span class="o">.</span><span class="n">none</span>
<span class="p">)</span>
<span class="n">view</span> <span class="o">:</span> <span class="kt">Model</span> <span class="o">-></span> <span class="kt">Html</span> <span class="kt">Msg</span>
<span class="n">view</span> <span class="n">model</span> <span class="o">=</span>
<span class="n">div</span> <span class="kt">[]</span>
<span class="p">[</span> <span class="kt">Counter</span><span class="o">.</span><span class="n">view</span> <span class="kt">CounterChanged</span> <span class="n">model</span><span class="o">.</span><span class="n">counter</span>
<span class="p">,</span> <span class="n">text</span> <span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">value</span> <span class="o">|></span> <span class="kt">Maybe</span><span class="o">.</span><span class="n">map</span> <span class="kt">String</span><span class="o">.</span><span class="n">fromInt</span> <span class="o">|></span> <span class="kt">Maybe</span><span class="o">.</span><span class="n">withDefault</span> <span class="s">""</span><span class="p">)</span>
<span class="p">]</span>
</code></pre></div></div>
<p>Notice how our <code class="language-plaintext highlighter-rouge">update</code> function is dramatically simpler to write and to understand. Also, there is no need to import (and rely on) much from the package module, which makes it <strong>both easier to consume and to maintain</strong> thanks to the opaque <code class="language-plaintext highlighter-rouge">State</code> type encapsulating implementation details.</p>
<p>Of course, a counter alone wouldn’t justify creating a package, but it illustrates the concept well. Don’t hesitate to read <em>elm-daterange-picker</em>’s <a href="https://github.com/allo-media/elm-daterange-picker/blob/master/src/DateRangePicker.elm">source code</a> and <a href="https://github.com/allo-media/elm-daterange-picker/blob/master/demo/Main.elm">demo code</a> to look at a real world application of this design principle.</p>Allo-MediaIt's often claimed that Elm developers should avoid thinking of their views as stateful components. While this is indeed a general best design practice, sometimes you may want to make your views reusable, and if they come with a state... you end up copying and pasting a lot of things.Text2num version 1.0.0 released!2018-10-02T11:53:00+00:002018-10-02T11:53:00+00:00https://www.allo-media.net/en/tech/python/2018/10/02/release-of-text2num-1<p>The output of speech-to-text systems is entirely made of words, without punctuation or capitalization. This makes visual scanning for numbers quite cumbersome,
especially in transcriptions of real life dialogues as they also contain a lot of gibberish words — like « heu, ben, bah… » — and the syntax or
grammar is not always correct. Besides that, text mining tools and techniques often, if not always, expect numbers to be in decimal digit representation.</p>
<p>That’s why we decided to add a transformation pass to our speech-to-text engine in order to convert all spoken numbers into their digit spelling.</p>
<p>Here at Allo-Media, we are fond of Open Source Software. So we first looked at the state of the art of <a href="http://www.python.org">Python</a> libraries for parsing words into numbers. There was at least <a href="https://pypi.org/project/word2number/">one</a> for the English language, but we didn’t find any for French. Therefore, we decided to build our own library and contribute it back to the community.</p>
<p>We could have ported the <a href="https://pypi.org/project/word2number/">Word2number</a> library, but it has some flaws:</p>
<ul>
<li>It is unable to detect by itself the bounds of a number expression;</li>
<li>its algorithm is weak (ex: <code class="language-plaintext highlighter-rouge">w2n.word_to_num('hundred five fifty') == 550</code>);</li>
<li>French has some peculiarities like « <em>quatre-vingt-dix-neuf</em> » vs « <em>nonante-neuf</em> ».</li>
</ul>
<p>So we started a linguistic parser from scratch that is able to identify numbers and correctly isolate contiguous ones in a sequence. Moreover, we wanted it to be able
to parse different flavors of French (e.g. <em>soixante-dix</em> and <em>septante</em> for 70, etc…).</p>
<p>If you are interested in linguistics, <em>septante</em> for 70 and <em>nonante</em> for 90 are used in Belgium, Switzerland, Luxembourg, Aosta Valley, Jersey French and to a lesser extent in the French regions of Savoie, Franche-Comté and even sometimes in Lorraine and Provence (source <a href="https://fr.wikipedia.org/wiki/70_(nombre)#Linguistique">Wikipedia</a>). The rest of the French speaking world uses respectively <em>soixante-dix</em> and <em>quatre-vingt-dix</em>. The usage area of <em>huitante</em> and <em>octante</em> instead of <em>quatre-vingts</em> is <a href="https://fr.wikipedia.org/wiki/80_(nombre)#Huitante">even more restricted</a>.</p>
<p>As French spelling is a touchy topic, the parser is tolerant and accepts both the <a href="https://fr.wikipedia.org/wiki/Rectifications_orthographiques_du_fran%C3%A7ais_en_1990#Les_modifications_apport%C3%A9es">1990 spelling reform</a> and prior rules. It even has an optional relaxed mode that parses <em>quatre vingt</em> as <em>quatre-vingt</em> for cases where you prefer to use some punctuation or timing information to help disambiguate and compensate for a wobbly transcription.</p>
<p>Here are two samples of what you can expect from it:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">text_to_num</span> <span class="kn">import</span> <span class="n">text2num</span>
<span class="o">>>></span> <span class="n">text2num</span><span class="p">(</span><span class="s">'quatre-vingt-quinze'</span><span class="p">)</span>
<span class="mi">95</span>
<span class="o">>>></span> <span class="n">text2num</span><span class="p">(</span><span class="s">'nonante-cinq'</span><span class="p">)</span>
<span class="mi">95</span>
<span class="o">>>></span> <span class="n">text2num</span><span class="p">(</span><span class="s">'mille neuf cent quatre-vingt dix-neuf'</span><span class="p">)</span>
<span class="mi">1999</span>
<span class="o">>>></span> <span class="n">text2num</span><span class="p">(</span><span class="s">"cinquante et un million cinq cent soixante dix-huit mille trois cent deux"</span><span class="p">)</span>
<span class="mi">51578302</span>
<span class="o">>>></span> <span class="n">text2num</span><span class="p">(</span><span class="s">'cent cinq cinquante'</span><span class="p">)</span>
<span class="nb">AssertionError</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">text_to_num</span> <span class="kn">import</span> <span class="n">alpha2digit</span>
<span class="o">>>></span> <span class="n">alpha2digit</span><span class="p">(</span><span class="s">'cent cinq cinquante'</span><span class="p">)</span>
<span class="s">'105 50'</span>
<span class="o">>>></span> <span class="n">sentence</span> <span class="o">=</span> <span class="p">(</span>
<span class="p">...</span> <span class="s">"Huit cent quarante-deux pommes, vingt-cinq chiens, mille trois chevaux, "</span>
<span class="p">...</span> <span class="s">"douze mille six cent quatre-vingt-dix-huit clous.</span><span class="se">\n</span><span class="s">"</span>
<span class="p">...</span> <span class="s">"Quatre-vingt-quinze vaut nonante-cinq. On tolère l'absence de tirets avant les unités : "</span>
<span class="p">...</span> <span class="s">"soixante seize vaut septante six.</span><span class="se">\n</span><span class="s">"</span>
<span class="p">...</span> <span class="s">"Nombres en série : douze quinze zéro zéro quatre vingt cinquante-deux cent trois cinquante deux "</span>
<span class="p">...</span> <span class="s">"trente et un.</span><span class="se">\n</span><span class="s">"</span>
<span class="p">...</span> <span class="s">"Ordinaux: cinquième troisième vingt et unième centième mille deux cent trentième.</span><span class="se">\n</span><span class="s">"</span>
<span class="p">...</span> <span class="s">"Décimaux: douze virgule quatre-vingt dix-neuf, cent vingt virgule zéro cinq ; "</span>
<span class="p">...</span> <span class="s">"mais soixante zéro deux."</span>
<span class="p">...</span> <span class="p">)</span>
<span class="o">>>></span> <span class="k">print</span><span class="p">(</span><span class="n">alpha2digit</span><span class="p">(</span><span class="n">sentence</span><span class="p">))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>842 pommes, 25 chiens, 1003 chevaux, 12698 clous.
95 vaut 95. On tolère l'absence de tirets avant les unités : 76 vaut 76.
Nombres en série : 12 15 004 20 52 103 52 31.
Ordinaux: 5ème 3ème 21ème 100ème 1230ème.
Décimaux: 12,99, 120,05 ; mais 60 02.
</code></pre></div></div>
<p>As you see, we support decimal numbers as well as ordinal numbers.</p>
<p>The algorithm is quite robust and is based on the observation that big numbers are structured like a sum of <strong>decreasing</strong> powers of thousand in the language, each power of thousand being multiplied by a number from 1 (maybe omitted) to 999. The problem is thus “reduced” to recognizing powers of thousands (<em>mille, million, milliard</em>) and being able to parse numbers from 1 to 999.</p>
<p>Example: <em>trois millions cinq cent vingt-trois mille deux cent quarante</em> -> 3 × 1 000 000 + 523 × 1000 + 240.</p>
<p>Parsing numbers between 1 and 999 is more difficult. The basic idea is that we expect between 0 and 9 hundreds, followed by a ten expression (<em>vingt, trente, …</em>) or none and some optional units (from 1 to 9) or extended units (from 1 to 19). The “hard” part is to detect illegal combinations and the end of the number.</p>
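<p>To make the idea above more concrete, here is a minimal sketch of the “decreasing powers of thousand” approach. It is <strong>not</strong> the text2num implementation: the names <code class="language-plaintext highlighter-rouge">parse_number</code> and <code class="language-plaintext highlighter-rouge">parse_below_1000</code> and the tiny vocabulary are assumptions made for illustration, and it deliberately ignores <em>et</em>, extended units (<em>onze</em> to <em>dix-neuf</em>), <em>quatre-vingt</em>, ordinals and decimals.</p>

```python
# Hypothetical, simplified sketch: NOT the actual text2num code.
UNITS = {"un": 1, "deux": 2, "trois": 3, "quatre": 4, "cinq": 5,
         "six": 6, "sept": 7, "huit": 8, "neuf": 9}
TENS = {"dix": 10, "vingt": 20, "trente": 30, "quarante": 40,
        "cinquante": 50, "soixante": 60}
POWERS = {"milliard": 10**9, "million": 10**6, "mille": 10**3}
SINGULAR = {"cents": "cent", "millions": "million", "milliards": "milliard"}


def parse_below_1000(words):
    """Parse [unit] 'cent' [ten] [unit]; reject anything else."""
    words = list(words)
    value = 0
    if "cent" in words:
        idx = words.index("cent")
        if idx > 1:
            raise ValueError("too many words before 'cent'")
        hundreds = 1  # a bare 'cent' means one hundred
        if idx == 1:
            hundreds = UNITS.get(words[0])
            if hundreds is None:
                raise ValueError("bad multiplier for 'cent'")
        value += 100 * hundreds
        words = words[idx + 1:]
    if words and words[0] in TENS:
        value += TENS[words.pop(0)]
    if words and words[0] in UNITS:
        value += UNITS[words.pop(0)]
    if words:  # leftovers signal an illegal combination
        raise ValueError("illegal combination: " + " ".join(words))
    return value


def parse_number(text):
    """Sum the groups, each multiplied by a strictly decreasing power of 1000."""
    tokens = [SINGULAR.get(t, t) for t in text.lower().replace("-", " ").split()]
    total, group, last_power = 0, [], None
    for token in tokens:
        if token in POWERS:
            power = POWERS[token]
            if last_power is not None and power >= last_power:
                raise ValueError("powers of thousand must decrease")
            # An omitted multiplier counts as 1 ('mille' alone is 1000).
            total += (parse_below_1000(group) if group else 1) * power
            group, last_power = [], power
        else:
            group.append(token)
    return total + parse_below_1000(group)


print(parse_number("trois millions cinq cent vingt-trois mille deux cent quarante"))
# → 3523240
```

<p>With this sketch, <code class="language-plaintext highlighter-rouge">parse_number("cent cinq cinquante")</code> raises a <code class="language-plaintext highlighter-rouge">ValueError</code>, because <em>cinquante</em> is left over once <em>cent cinq</em> has been consumed. Isolating the two contiguous numbers instead of failing is the extra work that a function like <code class="language-plaintext highlighter-rouge">alpha2digit</code> has to do.</p>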
<p>As the needs arise, we may develop parsers for other languages on this base, including a robust English one with all the desired features.</p>
<p>The library is distributed under the <a href="https://en.wikipedia.org/wiki/MIT_License">MIT license</a>.</p>
<p>If you are interested in more details or want to contribute, you can check the sources on <a href="https://github.com/allo-media/text2num">GitHub</a> and the <a href="https://text2num.readthedocs.io/en/stable/contribute.html">contribution guide</a>.</p>
<p>If you just want to use it, it’s just a <code class="language-plaintext highlighter-rouge">pip install text2num</code> away and the documentation is on <a href="https://text2num.readthedocs.io/">ReadTheDocs</a>.</p>
<p>Enjoy!</p>Allo-MediaThere already existed some Python packages to convert numbers written in English into Python numbers or their decimal digit representation, but there was nothing available for the French language. That's why we developed this library and shared it with the community.From python to Go to Rust: an opinionated journey2018-03-22T08:00:00+00:002018-03-22T08:00:00+00:00https://www.allo-media.net/en/tech/point/of/view/2018/03/22/from-python-to-go-to-rust<p>When looking for a new backend language, I naturally went from <a href="http://python.org/">Python</a> to the new cool kid: <a href="https://golang.org/">Go</a>. But after only one week of Go, I realised that Go was only half a step forward. It was better suited to my needs than Python, but still too far from the <strong>developer experience</strong> I was enjoying when doing <a href="http://elm-lang.org/">Elm</a> in the frontend. So I gave <a href="https://www.rust-lang.org/">Rust</a> a try.</p>
<h2 id="away-from-python">Away from Python,</h2>
<p>For backend development, I’ve mainly been using Python 3 for the past three years. From admin scripts to machine learning to <a href="http://flask.pocoo.org/">Flask</a>/<a href="https://www.djangoproject.com/">Django</a> applications, I’ve done a lot of Python lately, but at some point, <strong>it didn’t feel right anymore</strong>. Well, to be honest, it’s not really at “some totally random point” that it started not to feel right anymore, it was when I started to enjoy programming with a strongly typed language: <a href="http://elm-lang.org/">Elm</a>.</p>
<p>I had the famous feeling “when it compiles it works”, and once you’ve experienced that, <strong>there is no way back</strong>. You try stuff, you follow the friendly compiler error messages, you fix things, and then <em>tada</em>, it works!</p>
<p>Ok so, at this point I knew what I wanted from the “perfect” backend language:</p>
<ol>
<li><strong>Static</strong> and <strong>strong</strong> typing</li>
<li>Most of the stuff checked at <strong>compile time</strong> (please, no exceptions!)</li>
<li><strong>No <code class="language-plaintext highlighter-rouge">null</code></strong></li>
<li><strong>No mutability</strong></li>
<li>Handle <strong>concurrency</strong> nicely</li>
</ol>
<p>I see you coming: “hey, this is <a href="https://www.haskell.org/">Haskell</a>”! Yeah indeed, but for whatever reason, I’ve never managed to get anything done with <a href="https://www.haskell.org/">Haskell</a> (and I’ve tried a lot). This is maybe only me, but to an outsider, the Haskell mindset seems elitist, the documentation and practical examples are lacking and it’s hardly accessible to a beginner. <a href="http://learnyouahaskell.com/">Learn you a Haskell for great good</a> is awesome but very long to read and too abstract for me (you don’t build anything <em>for real</em> during the book).</p>
<p>“Hey, and what about <a href="https://www.scala-lang.org/">Scala</a>?!”. What do you mean by Scala? The better Java? The functional programming language with <a href="https://github.com/scalaz/scalaz">Scalaz</a>? The object-oriented functional programming language that may or may not fail at runtime with a <code class="language-plaintext highlighter-rouge">java.lang.NullPointerException</code> and needs a 4GB JVM running? I tried it some years ago and definitely, this is a no-go for me.</p>
<p>After discussing with a few people, I decided to give <strong>Go</strong> a try. It has a <strong>compiler</strong>, <strong>no exceptions</strong>, no <code class="language-plaintext highlighter-rouge">null</code> (but <strong>null values</strong>) and can handle <strong>concurrency</strong> nicely.</p>
<h2 id="into-go">Into Go,</h2>
<p>I decided to rewrite an internal project that was already done in Python using Go. Just to get a feeling of the differences between the two.</p>
<p>First feeling: learning <strong>Go</strong> was so easy. In <strong>one evening</strong>, I was able to compile a Proof Of Concept of the project with basic features developed and some tests written. This was a very pleasant feeling, I was adding features very fast. The compiler messages were helpful, everything was fine.</p>
<p>And at some point, the tragedy started. I needed to add a field to some struct, so I just modified the struct and was ready to analyze the compiler messages to know where this struct was used in order to add the field where it was needed.</p>
<p>I compiled the code and … no error message. Everything went fine. But?! I just added a field to a struct, the compiler should say that my code is not good anymore because I’m not initializing the value where it should be!</p>
<p>The problem is that not providing a value for a struct field is not an error in <strong>Go</strong>. The field will default to its <a href="https://tour.golang.org/basics/12">zero value</a> and everything will compile. This was the <strong>show stopper</strong> for me. I realized that I <strong>couldn’t rely on the compiler</strong> to have my back when I was making mistakes. At this point, I was wondering: why should I bother learning <strong>Go</strong> if the compiler can’t do much better than <a href="http://mypy-lang.org/">Python and mypy</a>? Of course concurrency is much better with <strong>Go</strong>, but the downside of not being able to rely on the compiler was too much for me.</p>
<p>Don’t get me wrong, I still think that <strong>Go</strong> is a <strong>progress compared to Python</strong> and I would definitely recommend that people learn Go instead of Python if they had to pick one of the two. But for my personal case, as someone who already knew Python and wanted something a lot safer, Go didn’t bring enough to the table in that specific domain.</p>
<h2 id="into-rust">Into Rust.</h2>
<p>So <strong>Go</strong> was not an option anymore as I realized that what I really needed was a <strong>useful compiler</strong>: a compiler that <strong>should not rely on the fact that I know how to code</strong> (as it has been proven to be false a lot of times). That’s why I took a look at <strong>Rust</strong>.</p>
<p>Rust was not my first choice because it advertises itself as a “system language”, and I’m more of a web developer than a system one. But it had some very compelling selling points:</p>
<ul>
<li>No <code class="language-plaintext highlighter-rouge">null</code> values but an <code class="language-plaintext highlighter-rouge">Option</code> type (checked at compile time)</li>
<li>No <code class="language-plaintext highlighter-rouge">exceptions</code> but a <code class="language-plaintext highlighter-rouge">Result</code> type (checked at compile time)</li>
<li>Variables are <strong>immutable</strong> by default</li>
<li>Designed with concurrency in mind</li>
<li>Memory safe by design, no garbage collector</li>
</ul>
<p>I decided to rewrite the <strong>same program</strong> as the one I wrote in Python and Go. The <strong>onboarding was a lot harder</strong> than with Go. As I did with Go, I tried to go head first, but it was too hard: I needed some new concepts specific to Rust like <strong>ownership</strong> or <strong>lifetimes</strong> to understand the code I was seeing on StackOverflow. So I had no choice but to read the <a href="https://doc.rust-lang.org/book/second-edition/">Rust Book</a>, and it took me two weeks before I could start writing some code (remember that with Go it took me one evening).</p>
<p>But after this steep initial learning curve, I was enjoying writing Rust code, and I’m still enjoying it. With Rust, I don’t have to trust myself, I just have to <strong>follow the compiler</strong> and if I do so, it will most likely work if it compiles. In the end, this is the main feeling I was looking for when searching for a new backend language.</p>
<p>Of course, Rust has a lot of <strong>downsides</strong>:</p>
<ul>
<li>It’s pretty <strong>new and things are moving very fast</strong>. I’m using <a href="https://docs.rs/futures/">futures-rs</a> and <a href="https://hyper.rs/">hyper.rs</a> in my project, and finding good documentation was really hard (kudos to the people on <a href="https://chat.mibbit.com/?server=irc.mozilla.org&channel=%23rust-beginners">irc.mozilla.org#rust-beginners</a> for the help).</li>
<li>It forces you to think of things you’re not used to when coming from more <em>high-level</em> languages: <strong>how is the memory managed</strong> (with lifetimes and ownership).</li>
<li>Compiler messages are not always straightforward to understand, especially when you’re combining futures and their strange long types.</li>
<li>Mutability is allowed, so you can get smashed with side effects</li>
</ul>
<p>But, it also has a lot of <strong>upsides</strong>:</p>
<ul>
<li>It’s <strong>amazingly fast</strong></li>
<li><strong>Tooling is good</strong> (cargo, rustfmt)</li>
<li>Most of the things are <strong>checked at compile time</strong></li>
<li>You can potentially <strong>do whatever you want with it</strong>, from a browser, to a web app, to some game.</li>
<li>Community is welcoming</li>
<li>It’s backed by Mozilla</li>
</ul>
<h2 id="wrapping-up">Wrapping up</h2>
<p><strong>Go</strong> is cool but doesn’t provide enough type safety <strong>for me</strong>. I would rather stick with <strong>Python</strong> and its ecosystem than risk rewriting stuff in <strong>Go</strong> if I don’t need concurrency. If I need concurrency I would still not use <strong>Go</strong> as its lack of type safety will surely come back to bite me at some point.</p>
<p><strong>Rust</strong> is the perfect candidate for concurrency and safety, even if the <a href="https://crates.io/crates/futures">futures-rs</a> crate (this is what libraries are called in Rust) is still at an early stage. I suspect that <strong>Rust</strong> could become the de facto standard for a lot of backend needs in the future.</p>
<p>For a more in depth blog post discussing the differences between Go and Rust, be sure to check this amazing post by <a href="https://twitter.com/deckarep">Ralph Caraveo (@deckarep)</a> : <a href="https://medium.com/@deckarep/paradigms-of-rust-for-the-go-developer-210f67cd6a29">Paradigms of Rust for the Go developer</a>.</p>
<p>At the very least, I think that I’ve found in Rust <strong>my</strong> new favorite language for the backend.</p>Allo-MediaWhen looking for a new backend language, I naturally went from Python to the new cool kid: Go. But after only one week of Go, I realised that Go was only half a step forward. It was better suited to my needs than Python, but still too far from the developer experience I was enjoying when doing Elm in the frontend. So I gave Rust a try.Brace yourself data selection, industrialization is coming!2018-03-22T08:00:00+00:002018-03-22T08:00:00+00:00https://www.allo-media.net/en/tech/r%2526d/2018/03/22/industrialization-versus-data-selection<p>Industrialization is one of the most challenging problems for a start-up like ours. The research world doesn’t have the same priorities concerning time and cost optimization, whereas industry is constrained by both. This struck us when we started thinking about building language models (referred to as LMs in the following) at scale, especially regarding data selection.</p>
<p>Historically, our LMs were crafted one by one with love, with a nice cup of human intervention in between. Meaning that we had to experiment to find the best system empirically. And this is so not compatible with automation.</p>
<p>What is data selection, you may ask? First things first.</p>
<h2 id="asr-automatic-speech-recognition-prelude">ASR: Automatic Speech Recognition Prelude</h2>
<p>In ASR, we usually build two independent modules which are combined later on. Each module is highly dependent on the language we want to recognize.</p>
<ul>
<li>
<p>The first one is called the acoustic model. It represents the way people speak: which sounds can be put together to form a word or a sentence. In fact, we represent the language as a series of phonemes. If we look in <a href="https://en.wiktionary.org/wiki/phoneme">Wiktionary</a>, it’s clearer, isn’t it? Don’t confuse phonemes with syllables, though. Let’s take an example: the word ‘through’ (one syllable) consists of three sounds, three phonemes: ‘th’, ‘r’ and ‘oo’. So the goal is to model the sounds as a sequence of phonemes.</p>
</li>
<li>
<p>The second one is the language model, which we want to build automatically. It models the distribution of words for a given language. Thanks to those probabilities, the LM helps in picking the best correspondence between a sequence of phonemes and the words/sentences.</p>
</li>
</ul>
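<p>To make the language model’s role a bit more concrete, here is a minimal sketch in JavaScript. Everything in it is an assumption made for illustration: the word probabilities are invented toy values, and <code class="language-plaintext highlighter-rouge">pickBestTranscription</code> is a hypothetical helper, not part of any real ASR toolkit. It scores two word sequences that sound alike and keeps the most probable one.</p>

```javascript
// Toy unigram probabilities, invented for illustration only.
const wordProbabilities = {
  recognize: 0.002,
  speech: 0.003,
  wreck: 0.0002,
  a: 0.02,
  nice: 0.001,
  beach: 0.0005,
};

// Log-probability of a word sequence under the toy model.
// Unknown words get a small floor probability instead of zero.
function sequenceLogProbability(words) {
  return words.reduce(
    (sum, word) => sum + Math.log(wordProbabilities[word] ?? 1e-6),
    0
  );
}

// Among candidates matching the same sounds, keep the most probable one.
function pickBestTranscription(candidates) {
  return candidates.reduce((best, candidate) =>
    sequenceLogProbability(candidate) > sequenceLogProbability(best)
      ? candidate
      : best
  );
}

const best = pickBestTranscription([
  ["wreck", "a", "nice", "beach"],
  ["recognize", "speech"],
]);
console.log(best.join(" "));
```

<p>A real language model conditions each word on its context (n-grams or neural models) rather than treating words independently, but the principle of ranking candidate sentences by probability is the same.</p>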
<p>In order to build these modules, we need data, and the more we have and the more relevant it is, the better! That’s why we need data selection: we need a means to retrieve relevant data adapted to the context of the recognition. After all, a lawyer and a baker don’t speak the same language: they don’t use the same lexicon. Data selection means picking, from millions of examples, the data that matches the domain, using various automatic algorithms.</p>
<h2 id="how-we-used-to-do">How we used to do it</h2>
<p>As we discussed above, data selection is a very important step when building a system. Like many, we used the <a href="http://www.aclweb.org/anthology/P10-2041">Moore-Lewis</a> method, which was also adapted for bilingual use (such as translation) by Axelrod et al. in <a href="https://aclanthology.info/pdf/D/D11/D11-1033.pdf">Domain Adaptation via Pseudo In-Domain Data Selection</a>. These are very effective ways to select data using two corpora (in-domain and out-of-domain) by comparing cross-entropies. In-domain means that the corpus contains specific data that is relevant to the context, the domain of recognition, as explained before with the lawyer/baker example. Out-of-domain, on the other hand, is just a pool of random data, meaning there is both relevant and irrelevant data in it! As for cross-entropy, it’s a measure that helps choose data well matched to the desired output. Starting from a few relevant segments, we score each segment in the pool of data and retrieve the ones closest to the initial data.</p>
<p><img src="/assets/img/blog/DataSelection.jpg" alt="example of data selection" /></p>
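<p>The cross-entropy comparison can be sketched in a few lines of JavaScript. This is a deliberately simplified illustration, not our production code: it assumes add-one smoothed unigram models (real systems use proper n-gram language models), trains the out-of-domain model on the whole pool, and all function names are hypothetical.</p>

```javascript
// Split a sentence into lowercase word tokens.
const tokenize = sentence =>
  sentence.toLowerCase().split(/\s+/).filter(Boolean);

// Train an add-one smoothed unigram model.
// Returns a function mapping a word to its probability.
function trainUnigram(sentences, vocabularySize) {
  const counts = new Map();
  let total = 0;
  for (const sentence of sentences) {
    for (const word of tokenize(sentence)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      total += 1;
    }
  }
  return word => ((counts.get(word) || 0) + 1) / (total + vocabularySize);
}

// Per-word cross-entropy (in bits) of a sentence under a model.
function crossEntropy(sentence, model) {
  const words = tokenize(sentence);
  const logProb = words.reduce((sum, word) => sum + Math.log2(model(word)), 0);
  return -logProb / words.length;
}

// Score each pool sentence by H_in(s) - H_out(s) and sort ascending:
// a lower score means the sentence looks more like the in-domain corpus.
function rankByCrossEntropyDifference(inDomain, pool) {
  const vocabulary = new Set([...inDomain, ...pool].flatMap(tokenize));
  const inModel = trainUnigram(inDomain, vocabulary.size);
  const outModel = trainUnigram(pool, vocabulary.size);
  return pool
    .map(sentence => ({
      sentence,
      score: crossEntropy(sentence, inModel) - crossEntropy(sentence, outModel),
    }))
    .sort((a, b) => a.score - b.score);
}

const ranked = rankByCrossEntropyDifference(
  ["the court heard the appeal", "the judge dismissed the appeal"],
  ["bake the bread in the oven", "the judge heard the case"]
);
console.log(ranked.map(entry => entry.sentence));
```

<p>With this toy data the legal-sounding sentence is ranked ahead of the baking one. Note that the ranking alone does not say where to cut: a threshold still has to be picked by hand.</p>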
<p>However, selecting with cross-entropy is not really scalable: the algorithm can’t decide on its own when to stop, and it has an annoying tendency to favor short and very short sentences, meaning the selected corpus isn’t really representative of conversations. Moreover, something hit us hard. <a href="http://www.aclweb.org/anthology/P10-2041">This paper</a> turned eight this year and we had never looked for another method. So we asked ourselves: has any new work been done in data selection since this paper? And is there any relevant work ready for a more industrialized turn?</p>
<h2 id="searching-finding">Searching…. Finding!</h2>
<p>After browsing 178 papers citing the Moore-Lewis one, a title caught our eye: <a href="https://arxiv.org/pdf/1709.02279.pdf">Cynical Selection of Language Model Training Data</a>. The name was so catchy, we had to explore it. The paper was written by Amittai Axelrod (remember, we mentioned him above), and we decided to give it a shot <a href="https://github.com/allo-media/cynical-selection">here</a> because it was full of promise… and seemed compatible with industrialization! Unlike the previous methods, the algorithm stops by itself when it has reached the (supposedly) optimal selection, letting us continue down the road toward automation.</p>
<h2 id="how-does-it-work-how-did-we-make-it-work">How does it work? How did we make it work?</h2>
<p>The goal is to select data from our out-of-domain corpora that can extend our in-domain data. Suppose you have a small in-domain corpus that you are a hundred percent positive is representative. The algorithm takes this corpus and a more generic one, in which you don’t know what’s relevant and what isn’t. It then selects the sentences that match the specific corpus, using an implementation of Axelrod’s paper cited above. The script accepts optional arguments, which are detailed in its header, but it only requires the two corpora to work:</p>
<p><code class="language-plaintext highlighter-rouge">./cynical-selection.py --task inDomainFile.txt --unadapted outDomainFile.txt</code></p>
<p>and returns a list of sentences along with their scores in a ‘.jaded’ file structured as follows:</p>
<p><code class="language-plaintext highlighter-rouge">model score sentence score (penalty + gain) length penalty sentence gain sentence id (in the selection) sentence id (in the unadapted corpora) best word word gain sentence</code>
for example:</p>
<p><code class="language-plaintext highlighter-rouge">2.659289425334946 2.659289425334946 5.71042701737487 -3.0511375920399235 1 1 vous -0.12597986190092164 merci à vous tous</code></p>
<p><code class="language-plaintext highlighter-rouge">5.318578850669892 2.659289425334946 5.71042701737487 -3.0511375920399235 2 26978 vous -0.12597986190092164 et vous avez maintenant</code></p>
<p><code class="language-plaintext highlighter-rouge">7.9778682760048385 2.659289425334946 5.71042701737487 -3.0511375920399235 3 26979 vous -0.12597986190092164 puisque vous avez des</code></p>
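<p>To consume such a file from another tool, each line can be split back into its fields. The sketch below is an assumption based only on the layout shown above: it supposes the eight leading fields are separated by whitespace and that everything after them is the sentence. <code class="language-plaintext highlighter-rouge">parseJadedLine</code> is a hypothetical helper, not something shipped with the script.</p>

```javascript
// Parse one line of a '.jaded' file into a plain object.
// Assumed field order: model score, sentence score, length penalty,
// sentence gain, id in the selection, id in the unadapted corpus,
// best word, word gain, then the sentence itself.
function parseJadedLine(line) {
  const fields = line.trim().split(/\s+/);
  if (fields.length < 9) {
    throw new Error("Expected at least 9 whitespace-separated fields");
  }
  return {
    modelScore: Number(fields[0]),
    sentenceScore: Number(fields[1]),
    lengthPenalty: Number(fields[2]),
    sentenceGain: Number(fields[3]),
    selectionId: Number(fields[4]),
    unadaptedId: Number(fields[5]),
    bestWord: fields[6],
    wordGain: Number(fields[7]),
    sentence: fields.slice(8).join(" "),
  };
}

const entry = parseJadedLine(
  "2.659289425334946 2.659289425334946 5.71042701737487 -3.0511375920399235 1 1 vous -0.12597986190092164 merci à vous tous"
);
console.log(entry.sentence, entry.lengthPenalty + entry.sentenceGain);
```

<p>On the example lines above, the sentence score is the sum of the length penalty and the sentence gain, and the model score accumulates from one selected sentence to the next.</p>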
<p>In the end, we didn’t lose any performance using this method; we even gained accuracy most of the time. But the important part is that it allowed us to automate this process, taking us one step closer to industrialization.</p>
<h2 id="to-conclude">To conclude</h2>
<p>This method allows us to focus on other parts of our systems, making us more productive and more serene about building language models. So it’s a success, captain!</p>Allo-MediaIndustrialization is one of the most challenging problems for a start-up. The research world doesn't have the same priorities concerning time and cost optimization, whereas industry is limited by these factors. How can we be more productive and more serene towards the building of language model?Chaining HTTP requests in Elm2018-02-05T08:00:00+00:002018-02-05T08:00:00+00:00https://www.allo-media.net/en/tech/learning/elm/2018/02/05/chaining-http-requests-in-elm<p><em>Preliminary note: in this article we’ll use Elm <a href="https://guide.elm-lang.org/interop/json.html">decoders</a>, <a href="http://ohanhi.com/tasks-in-modern-elm.html">tasks</a>, <a href="https://guide.elm-lang.org/error_handling/result.html">results</a> and leverage the <a href="https://guide.elm-lang.org/architecture/">Elm Architecture</a>. If you’re not comfortable with these concepts, you may want to check their respective documentation.</em></p>
<p>Sometimes in Elm you struggle with the most basic things.</p>
<p>Especially when you come from a JavaScript background, where chaining HTTP requests is relatively easy thanks to Promises. Here’s a real-world example leveraging the GitHub public API, where we fetch a list of GitHub events, pick the first one and query some user information from its unique identifier.</p>
<p>The first request uses the <code class="language-plaintext highlighter-rouge">https://api.github.com/events</code> endpoint, and the retrieved JSON looks like this:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"987654321"</span><span class="p">,</span><span class="w">
</span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ForkEvent"</span><span class="p">,</span><span class="w">
</span><span class="nl">"actor"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">1234567</span><span class="p">,</span><span class="w">
</span><span class="nl">"login"</span><span class="p">:</span><span class="w"> </span><span class="s2">"foobar"</span><span class="p">,</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>
<p>I’m purposely omitting a lot of other properties from the records here, for brevity.</p>
<p>The second request we need to do is on the <code class="language-plaintext highlighter-rouge">https://api.github.com/users/{login}</code> endpoint, and its body looks like this:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">1234567</span><span class="p">,</span><span class="w">
</span><span class="nl">"login"</span><span class="p">:</span><span class="w"> </span><span class="s2">"foobar"</span><span class="p">,</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Foo Bar"</span><span class="p">,</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Again, I’m just displaying a few fields from the actual JSON body here.</p>
<p>So we basically want:</p>
<ul>
<li>from a list of events, to pick the first one if any,</li>
<li>then pick its <code class="language-plaintext highlighter-rouge">actor.login</code> property,</li>
<li>query the user details endpoint using this value,</li>
<li>extract the user real name for that account.</li>
</ul>
<p>Using JavaScript, that would look like this:</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">fetch</span><span class="p">(</span><span class="dl">"</span><span class="s2">https://api.github.com/events</span><span class="dl">"</span><span class="p">)</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">responseA</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">responseA</span><span class="p">.</span><span class="nx">json</span><span class="p">()</span>
<span class="p">})</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">events</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">events</span><span class="p">.</span><span class="nx">length</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">throw</span> <span class="dl">"</span><span class="s2">No events.</span><span class="dl">"</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="p">{</span> <span class="na">actor</span> <span class="p">:</span> <span class="p">{</span> <span class="nx">login</span> <span class="p">}</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">events</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">return</span> <span class="nx">fetch</span><span class="p">(</span><span class="s2">`https://api.github.com/users/</span><span class="p">${</span><span class="nx">login</span><span class="p">}</span><span class="s2">`</span><span class="p">)</span>
<span class="p">})</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">responseB</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">responseB</span><span class="p">.</span><span class="nx">json</span><span class="p">()</span>
<span class="p">})</span>
<span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">user</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">user</span><span class="p">.</span><span class="nx">name</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">unspecified</span><span class="dl">"</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">user</span><span class="p">.</span><span class="nx">name</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">})</span>
<span class="p">.</span><span class="k">catch</span><span class="p">(</span><span class="nx">err</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="p">})</span>
</code></pre></div></div>
<p>It would get a little fancier using <code class="language-plaintext highlighter-rouge">async/await</code>:</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">responseA</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">fetch</span><span class="p">(</span><span class="dl">"</span><span class="s2">https://api.github.com/events</span><span class="dl">"</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">events</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">responseA</span><span class="p">.</span><span class="nx">json</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">events</span><span class="p">.</span><span class="nx">length</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">throw</span> <span class="dl">"</span><span class="s2">No events.</span><span class="dl">"</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="p">{</span> <span class="na">actor</span><span class="p">:</span> <span class="p">{</span> <span class="nx">login</span> <span class="p">}</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">events</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="kd">const</span> <span class="nx">responseB</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">fetch</span><span class="p">(</span><span class="s2">`https://api.github.com/users/</span><span class="p">${</span><span class="nx">login</span><span class="p">}</span><span class="s2">`</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">user</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">responseB</span><span class="p">.</span><span class="nx">json</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">user</span><span class="p">.</span><span class="nx">name</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">unspecified</span><span class="dl">"</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">user</span><span class="p">.</span><span class="nx">name</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This is already complicated code to read and understand, and it’s tricky to do using Elm as well. Let’s see how to achieve the same, understanding exactly what we’re doing (we’ve all blindly copied and pasted code in the past, don’t deny it).</p>
<p>First, let’s write the two requests we need; one for fetching the list of events, the second to obtain a given user’s details from her <code class="language-plaintext highlighter-rouge">login</code>:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">import</span> <span class="nn">Http</span>
<span class="kr">import</span> <span class="nn">Json.Decode</span> <span class="k">as</span> <span class="n">Decode</span>
<span class="n">eventsRequest</span> <span class="o">:</span> <span class="kt">Http</span><span class="o">.</span><span class="kt">Request</span> <span class="p">(</span><span class="kt">List</span> <span class="kt">String</span><span class="p">)</span>
<span class="n">eventsRequest</span> <span class="o">=</span>
<span class="kt">Http</span><span class="o">.</span><span class="n">get</span> <span class="s">"https://api.github.com/events"</span>
<span class="p">(</span><span class="kt">Decode</span><span class="o">.</span><span class="n">list</span> <span class="p">(</span><span class="kt">Decode</span><span class="o">.</span><span class="n">at</span> <span class="p">[</span> <span class="s">"actor"</span><span class="p">,</span> <span class="s">"login"</span> <span class="p">]</span> <span class="kt">Decode</span><span class="o">.</span><span class="n">string</span><span class="p">))</span>
<span class="n">nameRequest</span> <span class="o">:</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Http</span><span class="o">.</span><span class="kt">Request</span> <span class="kt">String</span>
<span class="n">nameRequest</span> <span class="n">login</span> <span class="o">=</span>
<span class="kt">Http</span><span class="o">.</span><span class="n">get</span> <span class="p">(</span><span class="s">"https://api.github.com/users/"</span> <span class="o">++</span> <span class="n">login</span><span class="p">)</span>
<span class="p">(</span><span class="kt">Decode</span><span class="o">.</span><span class="n">at</span> <span class="p">[</span> <span class="s">"name"</span> <span class="p">]</span>
<span class="p">(</span><span class="kt">Decode</span><span class="o">.</span><span class="n">oneOf</span>
<span class="p">[</span> <span class="kt">Decode</span><span class="o">.</span><span class="n">string</span>
<span class="p">,</span> <span class="kt">Decode</span><span class="o">.</span><span class="n">null</span> <span class="s">"unspecified"</span>
<span class="p">]</span>
<span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>
<p>These two functions return an <code class="language-plaintext highlighter-rouge">Http.Request</code> parameterized with the type of data they’ll retrieve and decode from the JSON body of their respective responses. <code class="language-plaintext highlighter-rouge">nameRequest</code> handles the case where GitHub users haven’t entered their full name yet, so the <code class="language-plaintext highlighter-rouge">name</code> field might be <code class="language-plaintext highlighter-rouge">null</code>; as with the JavaScript version, we then default to <code class="language-plaintext highlighter-rouge">"unspecified"</code>.</p>
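<p>For readers more at home in JavaScript, the decoding rule used by <code class="language-plaintext highlighter-rouge">nameRequest</code> could be approximated as follows. This is only a rough analogy, and <code class="language-plaintext highlighter-rouge">decodeName</code> is a hypothetical helper: a string is kept as is, <code class="language-plaintext highlighter-rouge">null</code> becomes <code class="language-plaintext highlighter-rouge">"unspecified"</code>, and anything else is treated as a decoding error.</p>

```javascript
// Hypothetical JavaScript counterpart of the Elm "name" decoder.
function decodeName(body) {
  const user = JSON.parse(body);
  if (user === null || typeof user !== "object" || !("name" in user)) {
    throw new Error('Expecting an object with a field named "name"');
  }
  if (typeof user.name === "string") {
    return user.name;
  }
  if (user.name === null) {
    return "unspecified";
  }
  throw new Error('Expecting a string or null for the "name" field');
}

console.log(decodeName('{"login": "foobar", "name": "Foo Bar"}'));
console.log(decodeName('{"login": "foobar", "name": null}'));
```

<p>Unlike the <code class="language-plaintext highlighter-rouge">!user.name</code> check in the JavaScript snippets above, this version (like the Elm decoder) keeps an empty string as a valid name and rejects a body where the field is missing altogether.</p>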
<p>That’s good but now we need to execute and chain these two requests, the second one depending on the result of the first one, where we retrieve the <code class="language-plaintext highlighter-rouge">actor.login</code> value of the event object.</p>
<p>Elm is a pure language, meaning you can’t have side effects in your functions (a side effect is when functions alter things outside of their scope and use these things: an HTTP request is a <em>huge</em> side effect). So your functions must return <em>something</em> that represents a given side effect, instead of executing it within the function scope itself. The Elm runtime will be in charge of actually performing the side effect, using a <a href="https://www.elm-tutorial.org/en/03-subs-cmds/02-commands.html">Command</a>.</p>
<p>In Elm, you’re usually going to use a <a href="http://package.elm-lang.org/packages/elm-lang/core/latest/Task">Task</a> to describe side effects. Tasks may succeed or fail (like Promises do in JavaScript), but they need to be turned into an <a href="https://www.elm-tutorial.org/en/03-subs-cmds/02-commands.html">Elm command</a> to be actually executed.</p>
<p>To quote this <a href="http://ohanhi.com/tasks-in-modern-elm.html">excellent post on Tasks</a>:</p>
<blockquote>
<p>I find it helpful to think of tasks as if they were shopping lists. A shopping list contains detailed instructions of what should be fetched from the grocery store, but that doesn’t mean the shopping is done. I need to use the list while at the grocery store in order to get an end result</p>
</blockquote>
<p>But why do we need to convert a <code class="language-plaintext highlighter-rouge">Task</code> into a command, you may ask? Because a command can only execute a single thing at a time, so if you need to execute multiple side effects at once, you’ll need a single task that represents all these side effects.</p>
<p>So basically:</p>
<ol>
<li>We first craft <code class="language-plaintext highlighter-rouge">Http.Request</code>s,</li>
<li>We turn them into <code class="language-plaintext highlighter-rouge">Task</code>s we can chain,</li>
<li>We turn the resulting <code class="language-plaintext highlighter-rouge">Task</code> into a command,</li>
<li>This command is executed by the runtime, and we get a result</li>
</ol>
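<p>If the distinction between describing and executing feels abstract, here is a loose JavaScript analogy of the four steps above. It is not how the Elm runtime works internally: a “task” is modelled as a function that returns a Promise only when called, and <code class="language-plaintext highlighter-rouge">succeed</code>, <code class="language-plaintext highlighter-rouge">logTask</code>, <code class="language-plaintext highlighter-rouge">andThen</code> and <code class="language-plaintext highlighter-rouge">attempt</code> are hypothetical helpers written for this sketch.</p>

```javascript
// Records side effects so we can see when they actually happen.
const effects = [];

// A task that succeeds immediately with a value, once it is run.
const succeed = value => () => Promise.resolve(value);

// A task with a visible side effect, performed only when the task runs.
const logTask = message => () => {
  effects.push(message);
  return Promise.resolve(message);
};

// Chain two tasks without running them (rough analogue of Task.andThen).
const andThen = (toNextTask, task) => () =>
  task().then(value => toNextTask(value)());

// Run a task and wrap its outcome (rough analogue of Task.attempt).
const attempt = (toMessage, task) =>
  task().then(
    value => toMessage({ ok: true, value }),
    error => toMessage({ ok: false, error })
  );

// Steps 1 and 2: build and chain descriptions.
const chained = andThen(name => logTask("hello " + name), succeed("foobar"));
const effectsBeforeRun = effects.length; // still 0: nothing has run yet

// Steps 3 and 4: hand the description over for execution and get a result.
const done = attempt(result => result, chained);
done.then(result => console.log(result, effects));
```

<p>The shopping list is written when <code class="language-plaintext highlighter-rouge">chained</code> is built; the shopping only happens when <code class="language-plaintext highlighter-rouge">attempt</code> runs it.</p>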
<p>The <a href="http://package.elm-lang.org/packages/elm-lang/http/latest/Http">Http</a> package provides <code class="language-plaintext highlighter-rouge">Http.toTask</code> to map an <code class="language-plaintext highlighter-rouge">Http.Request</code> into a <code class="language-plaintext highlighter-rouge">Task</code>. Let’s use that here:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fetchEvents</span> <span class="o">:</span> <span class="kt">Task</span> <span class="kt">Http</span><span class="o">.</span><span class="kt">Error</span> <span class="p">(</span><span class="kt">List</span> <span class="kt">String</span><span class="p">)</span>
<span class="n">fetchEvents</span> <span class="o">=</span>
<span class="n">eventsRequest</span> <span class="o">|></span> <span class="kt">Http</span><span class="o">.</span><span class="n">toTask</span>
<span class="n">fetchName</span> <span class="o">:</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Task</span> <span class="kt">Http</span><span class="o">.</span><span class="kt">Error</span> <span class="kt">String</span>
<span class="n">fetchName</span> <span class="n">login</span> <span class="o">=</span>
<span class="n">nameRequest</span> <span class="n">login</span> <span class="o">|></span> <span class="kt">Http</span><span class="o">.</span><span class="n">toTask</span>
</code></pre></div></div>
<p>I created these two simple functions mostly to focus on their return types; a <code class="language-plaintext highlighter-rouge">Task</code> must define an error type and a result type. For example, <code class="language-plaintext highlighter-rouge">fetchEvents</code> being an HTTP task, it will receive an <code class="language-plaintext highlighter-rouge">Http.Error</code> when the task fails, and a list of strings when the task succeeds.</p>
<p>But dealing with HTTP errors in a granular way being out of scope of this blog post, and in order to keep things as simple and concise as possible, I’m gonna use <code class="language-plaintext highlighter-rouge">Task.mapError</code> to turn complex HTTP errors into their string representations:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">toHttpTask</span> <span class="o">:</span> <span class="kt">Http</span><span class="o">.</span><span class="kt">Request</span> <span class="n">a</span> <span class="o">-></span> <span class="kt">Task</span> <span class="kt">String</span> <span class="n">a</span>
<span class="n">toHttpTask</span> <span class="n">request</span> <span class="o">=</span>
<span class="n">request</span>
<span class="o">|></span> <span class="kt">Http</span><span class="o">.</span><span class="n">toTask</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">mapError</span> <span class="n">toString</span>
<span class="n">fetchEvents</span> <span class="o">:</span> <span class="kt">Task</span> <span class="kt">String</span> <span class="p">(</span><span class="kt">List</span> <span class="kt">String</span><span class="p">)</span>
<span class="n">fetchEvents</span> <span class="o">=</span>
<span class="n">toHttpTask</span> <span class="n">eventsRequest</span>
<span class="n">fetchName</span> <span class="o">:</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Task</span> <span class="kt">String</span> <span class="kt">String</span>
<span class="n">fetchName</span> <span class="n">login</span> <span class="o">=</span>
<span class="n">toHttpTask</span> <span class="p">(</span><span class="n">nameRequest</span> <span class="n">login</span><span class="p">)</span>
</code></pre></div></div>
<p>Here, <code class="language-plaintext highlighter-rouge">toHttpTask</code> is a helper turning an <code class="language-plaintext highlighter-rouge">Http.Request</code> into a <code class="language-plaintext highlighter-rouge">Task</code>, transforming the <code class="language-plaintext highlighter-rouge">Http.Error</code> complex type into a serialized, purely textual version of it: a <code class="language-plaintext highlighter-rouge">String</code>.</p>
<p>We’ll also need a function that extracts the very first element of a list, if any, as we did in JavaScript using <code class="language-plaintext highlighter-rouge">events[0]</code>. Such a function is built into the <code class="language-plaintext highlighter-rouge">List</code> core module as <code class="language-plaintext highlighter-rouge">List.head</code>. Let’s wrap it in a function that returns a <code class="language-plaintext highlighter-rouge">Task</code> too, as that will ease chaining everything together and allow us to expose an error message when the list is empty:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pickFirst</span> <span class="o">:</span> <span class="kt">List</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Task</span> <span class="kt">String</span> <span class="kt">String</span>
<span class="n">pickFirst</span> <span class="n">logins</span> <span class="o">=</span>
<span class="kr">case</span> <span class="kt">List</span><span class="o">.</span><span class="n">head</span> <span class="n">logins</span> <span class="kr">of</span>
<span class="kt">Just</span> <span class="n">login</span> <span class="o">-></span>
<span class="kt">Task</span><span class="o">.</span><span class="n">succeed</span> <span class="n">login</span>
<span class="kt">Nothing</span> <span class="o">-></span>
<span class="kt">Task</span><span class="o">.</span><span class="n">fail</span> <span class="s">"No events."</span>
</code></pre></div></div>
<p>Note the use of <code class="language-plaintext highlighter-rouge">Task.succeed</code> and <code class="language-plaintext highlighter-rouge">Task.fail</code>, which are approximately the Elm equivalents of <code class="language-plaintext highlighter-rouge">Promise.resolve</code> and <code class="language-plaintext highlighter-rouge">Promise.reject</code>: this is how you create tasks that succeed or fail immediately.</p>
<p>So in order to chain all the pieces we have so far, we obviously need <em>glue</em>. And this glue is the <code class="language-plaintext highlighter-rouge">Task.andThen</code> function, which can chain our tasks this fancy way:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fetchEvents</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">andThen</span> <span class="n">pickFirst</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">andThen</span> <span class="n">fetchName</span>
</code></pre></div></div>
<p>Neat. But wait. As we mentioned previously, Tasks are <em>descriptions</em> of side effects, not their actual execution. The <code class="language-plaintext highlighter-rouge">Task.attempt</code> function will help us execute them, by turning a <code class="language-plaintext highlighter-rouge">Task</code> into a <a href="https://www.elm-tutorial.org/en/03-subs-cmds/02-commands.html">Command</a>, provided we define a <code class="language-plaintext highlighter-rouge">Msg</code> that will be responsible for dealing with the received result:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">type</span> <span class="kt">Msg</span>
<span class="o">=</span> <span class="kt">Name</span> <span class="p">(</span><span class="kt">Result</span> <span class="kt">String</span> <span class="kt">String</span><span class="p">)</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">Result String String</code> reflects the result of the HTTP request and shares the same type definitions for both the error (a <code class="language-plaintext highlighter-rouge">String</code>) and the value (the user full name, a <code class="language-plaintext highlighter-rouge">String</code> too). Let’s use this <code class="language-plaintext highlighter-rouge">Msg</code> with <code class="language-plaintext highlighter-rouge">Task.attempt</code>:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fetchEvents</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">andThen</span> <span class="n">pickFirst</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">andThen</span> <span class="n">fetchName</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">attempt</span> <span class="kt">Name</span>
</code></pre></div></div>
<p>Here:</p>
<ul>
<li>We start by fetching all the events,</li>
<li>Then if the Task succeeds, we pick the first event,</li>
<li>Then if we have one, we fetch the event’s user full name,</li>
<li>And we map the future result of this task to the <code class="language-plaintext highlighter-rouge">Name</code> message.</li>
</ul>
<p>The cool thing here is that if anything fails along the chain, the chain stops and the error will be propagated down to the <code class="language-plaintext highlighter-rouge">Name</code> handler. No need to check errors for each operation! Yes, that looks a lot like how JavaScript Promises’ <code class="language-plaintext highlighter-rouge">.catch</code> works.</p>
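<p>The same short-circuiting can be observed with Promises. In the sketch below, the three functions are stubs standing in for the real HTTP calls (no network is involved, and the empty event list is an assumption chosen to trigger the failure); the <code class="language-plaintext highlighter-rouge">steps</code> array records which of them actually ran.</p>

```javascript
// Records which steps of the chain were executed.
const steps = [];

// Stub: pretends the events endpoint returned no event at all.
const fetchEvents = () => {
  steps.push("fetchEvents");
  return Promise.resolve([]);
};

// Rejects when the list is empty, like the Elm pickFirst task.
const pickFirst = logins => {
  steps.push("pickFirst");
  return logins.length > 0
    ? Promise.resolve(logins[0])
    : Promise.reject("No events.");
};

// Stub: would fetch the user's name; never reached in this scenario.
const fetchName = login => {
  steps.push("fetchName");
  return Promise.resolve("Foo Bar");
};

const outcome = fetchEvents()
  .then(pickFirst)
  .then(fetchName) // skipped, because the previous step rejected
  .then(
    name => ({ ok: true, value: name }),
    error => ({ ok: false, error })
  );

outcome.then(result => console.log(result, steps));
```

<p>Only the first two steps run; the rejection travels straight to the final handler, which plays the role of the <code class="language-plaintext highlighter-rouge">Name</code> message receiving an error result.</p>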
<p>Now, how are we going to execute the resulting command and process the result? We need to set up the <a href="https://guide.elm-lang.org/architecture/">Elm Architecture</a> and its good old <code class="language-plaintext highlighter-rouge">update</code> function:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">module</span> <span class="nn">Main</span> <span class="n">exposing</span> <span class="p">(</span><span class="n">main</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Html</span> <span class="n">exposing</span> <span class="p">(</span><span class="o">..</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Http</span>
<span class="kr">import</span> <span class="nn">Json.Decode</span> <span class="k">as</span> <span class="n">Decode</span>
<span class="kr">import</span> <span class="nn">Task</span> <span class="n">exposing</span> <span class="p">(</span><span class="kt">Task</span><span class="p">)</span>
<span class="kr">type</span> <span class="n">alias</span> <span class="kt">Model</span> <span class="o">=</span>
<span class="p">{</span> <span class="n">name</span> <span class="o">:</span> <span class="kt">Maybe</span> <span class="kt">String</span>
<span class="p">,</span> <span class="n">error</span> <span class="o">:</span> <span class="kt">String</span>
<span class="p">}</span>
<span class="kr">type</span> <span class="kt">Msg</span>
<span class="o">=</span> <span class="kt">Name</span> <span class="p">(</span><span class="kt">Result</span> <span class="kt">String</span> <span class="kt">String</span><span class="p">)</span>
<span class="n">eventsRequest</span> <span class="o">:</span> <span class="kt">Http</span><span class="o">.</span><span class="kt">Request</span> <span class="p">(</span><span class="kt">List</span> <span class="kt">String</span><span class="p">)</span>
<span class="n">eventsRequest</span> <span class="o">=</span>
<span class="kt">Http</span><span class="o">.</span><span class="n">get</span> <span class="s">"https://api.github.com/events"</span>
<span class="p">(</span><span class="kt">Decode</span><span class="o">.</span><span class="n">list</span> <span class="p">(</span><span class="kt">Decode</span><span class="o">.</span><span class="n">at</span> <span class="p">[</span> <span class="s">"actor"</span><span class="p">,</span> <span class="s">"login"</span> <span class="p">]</span> <span class="kt">Decode</span><span class="o">.</span><span class="n">string</span><span class="p">))</span>
<span class="n">nameRequest</span> <span class="o">:</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Http</span><span class="o">.</span><span class="kt">Request</span> <span class="kt">String</span>
<span class="n">nameRequest</span> <span class="n">login</span> <span class="o">=</span>
<span class="kt">Http</span><span class="o">.</span><span class="n">get</span> <span class="p">(</span><span class="s">"https://api.github.com/users/"</span> <span class="o">++</span> <span class="n">login</span><span class="p">)</span>
<span class="p">(</span><span class="kt">Decode</span><span class="o">.</span><span class="n">at</span> <span class="p">[</span> <span class="s">"name"</span> <span class="p">]</span>
<span class="p">(</span><span class="kt">Decode</span><span class="o">.</span><span class="n">oneOf</span>
<span class="p">[</span> <span class="kt">Decode</span><span class="o">.</span><span class="n">string</span>
<span class="p">,</span> <span class="kt">Decode</span><span class="o">.</span><span class="n">null</span> <span class="s">"unspecified"</span>
<span class="p">]</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="n">toHttpTask</span> <span class="o">:</span> <span class="kt">Http</span><span class="o">.</span><span class="kt">Request</span> <span class="n">a</span> <span class="o">-></span> <span class="kt">Task</span> <span class="kt">String</span> <span class="n">a</span>
<span class="n">toHttpTask</span> <span class="n">request</span> <span class="o">=</span>
<span class="n">request</span>
<span class="o">|></span> <span class="kt">Http</span><span class="o">.</span><span class="n">toTask</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">mapError</span> <span class="n">toString</span>
<span class="n">fetchEvents</span> <span class="o">:</span> <span class="kt">Task</span> <span class="kt">String</span> <span class="p">(</span><span class="kt">List</span> <span class="kt">String</span><span class="p">)</span>
<span class="n">fetchEvents</span> <span class="o">=</span>
<span class="n">toHttpTask</span> <span class="n">eventsRequest</span>
<span class="n">fetchName</span> <span class="o">:</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Task</span> <span class="kt">String</span> <span class="kt">String</span>
<span class="n">fetchName</span> <span class="n">login</span> <span class="o">=</span>
<span class="n">toHttpTask</span> <span class="p">(</span><span class="n">nameRequest</span> <span class="n">login</span><span class="p">)</span>
<span class="n">pickFirst</span> <span class="o">:</span> <span class="kt">List</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Task</span> <span class="kt">String</span> <span class="kt">String</span>
<span class="n">pickFirst</span> <span class="n">events</span> <span class="o">=</span>
<span class="kr">case</span> <span class="kt">List</span><span class="o">.</span><span class="n">head</span> <span class="n">events</span> <span class="kr">of</span>
<span class="kt">Just</span> <span class="n">event</span> <span class="o">-></span>
<span class="kt">Task</span><span class="o">.</span><span class="n">succeed</span> <span class="n">event</span>
<span class="kt">Nothing</span> <span class="o">-></span>
<span class="kt">Task</span><span class="o">.</span><span class="n">fail</span> <span class="s">"No events."</span>
<span class="n">init</span> <span class="o">:</span> <span class="p">(</span> <span class="kt">Model</span><span class="p">,</span> <span class="kt">Cmd</span> <span class="kt">Msg</span> <span class="p">)</span>
<span class="n">init</span> <span class="o">=</span>
<span class="p">{</span> <span class="n">name</span> <span class="o">=</span> <span class="kt">Nothing</span><span class="p">,</span> <span class="n">error</span> <span class="o">=</span> <span class="s">""</span> <span class="p">}</span>
<span class="o">!</span> <span class="p">[</span> <span class="n">fetchEvents</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">andThen</span> <span class="n">pickFirst</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">andThen</span> <span class="n">fetchName</span>
<span class="o">|></span> <span class="kt">Task</span><span class="o">.</span><span class="n">attempt</span> <span class="kt">Name</span>
<span class="p">]</span>
<span class="n">update</span> <span class="o">:</span> <span class="kt">Msg</span> <span class="o">-></span> <span class="kt">Model</span> <span class="o">-></span> <span class="p">(</span> <span class="kt">Model</span><span class="p">,</span> <span class="kt">Cmd</span> <span class="kt">Msg</span> <span class="p">)</span>
<span class="n">update</span> <span class="n">msg</span> <span class="n">model</span> <span class="o">=</span>
<span class="kr">case</span> <span class="n">msg</span> <span class="kr">of</span>
<span class="kt">Name</span> <span class="p">(</span><span class="kt">Ok</span> <span class="n">name</span><span class="p">)</span> <span class="o">-></span>
<span class="p">{</span> <span class="n">model</span> <span class="o">|</span> <span class="n">name</span> <span class="o">=</span> <span class="kt">Just</span> <span class="n">name</span> <span class="p">}</span> <span class="o">!</span> <span class="kt">[]</span>
<span class="kt">Name</span> <span class="p">(</span><span class="kt">Err</span> <span class="n">error</span><span class="p">)</span> <span class="o">-></span>
<span class="p">{</span> <span class="n">model</span> <span class="o">|</span> <span class="n">error</span> <span class="o">=</span> <span class="n">error</span> <span class="p">}</span> <span class="o">!</span> <span class="kt">[]</span>
<span class="n">view</span> <span class="o">:</span> <span class="kt">Model</span> <span class="o">-></span> <span class="kt">Html</span> <span class="kt">Msg</span>
<span class="n">view</span> <span class="n">model</span> <span class="o">=</span>
<span class="n">div</span> <span class="kt">[]</span>
<span class="p">[</span> <span class="kr">if</span> <span class="n">model</span><span class="o">.</span><span class="n">error</span> <span class="o">/=</span> <span class="s">""</span> <span class="kr">then</span>
<span class="n">div</span> <span class="kt">[]</span>
<span class="p">[</span> <span class="n">h4</span> <span class="kt">[]</span> <span class="p">[</span> <span class="n">text</span> <span class="s">"Error encountered"</span> <span class="p">]</span>
<span class="p">,</span> <span class="n">pre</span> <span class="kt">[]</span> <span class="p">[</span> <span class="n">text</span> <span class="n">model</span><span class="o">.</span><span class="n">error</span> <span class="p">]</span>
<span class="p">]</span>
<span class="kr">else</span>
<span class="n">text</span> <span class="s">""</span>
<span class="p">,</span> <span class="n">p</span> <span class="kt">[]</span> <span class="p">[</span> <span class="n">text</span> <span class="o"><|</span> <span class="kt">Maybe</span><span class="o">.</span><span class="n">withDefault</span> <span class="s">"Fetching..."</span> <span class="n">model</span><span class="o">.</span><span class="n">name</span> <span class="p">]</span>
<span class="p">]</span>
<span class="n">main</span> <span class="o">=</span>
<span class="kt">Html</span><span class="o">.</span><span class="n">program</span>
<span class="p">{</span> <span class="n">init</span> <span class="o">=</span> <span class="n">init</span>
<span class="p">,</span> <span class="n">update</span> <span class="o">=</span> <span class="n">update</span>
<span class="p">,</span> <span class="n">subscriptions</span> <span class="o">=</span> <span class="n">always</span> <span class="kt">Sub</span><span class="o">.</span><span class="n">none</span>
<span class="p">,</span> <span class="n">view</span> <span class="o">=</span> <span class="n">view</span>
<span class="p">}</span>
</code></pre></div></div>
<p>That’s certainly more code than the JavaScript example, but don’t forget that the Elm version renders HTML instead of just logging to the console, and that the JavaScript code could be refactored to look a lot like the Elm version. The Elm version is also fully typed and <em>safeguarded</em> against unforeseen problems, which makes a huge difference when your application grows.</p>
<p>As always, an <a href="https://ellie-app.com/7Q9svdqRGa1/3">Ellie</a> is publicly available so you can play around with the code.</p>Allo-MediaSometimes in Elm you struggle with the most basic things. Especially when you come from a JavaScript background, where chaining HTTP requests are relatively easy thanks to Promises or async/await.Simple disk encryption tutorial with archlinux2018-02-01T07:00:00+00:002018-02-01T07:00:00+00:00https://www.allo-media.net/en/tech/archlinux/2018/02/01/simple-disk-encryption-tutorial-with-archlinux<p>We all love <a href="https://www.archlinux.org/">archlinux</a>, or if we don’t, we’re using Fedora or Debian, and trolling is (almost) out of the scope of this article.</p>
<p>But let’s be honest, even if the <a href="http://wiki.archlinux.org/">wiki</a> is great, it can be intimidating sometimes. That’s what happened to me yesterday. Here at <a href="http://www.allo-media.net">AlloMedia</a>, for security reasons, we’re encrypting every laptop disk by default. As I’m using archlinux, I went to the wiki to follow how to “just” encrypt my disk. And well, <a href="https://wiki.archlinux.org/index.php/Disk_encryption">the page</a> is a little bit overcrowded, at the very least.</p>
<p>You first have to read about 10 pages of documentation, only to learn that you now have to choose between 6 methods (<em>Loop-AES, dm-crypt +/- LUKS, Truecrypt, eCryptfs, EncFS</em>) and read every *#! page to understand which one you may want. I’ve chosen for you.</p>
<h2 id="lvm-on-luks"><a href="https://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)">Lvm</a> on <a href="https://en.wikipedia.org/wiki/Linux_Unified_Key_Setup">Luks</a></h2>
<p>This is shipped with the kernel and seems to be the “default” on other distributions. It totally fits my needs: encrypt the whole system, swap included, and decrypt the system on boot using a passphrase.</p>
<p>If that’s what you want to do too, follow the white rabbit, Neo.</p>
<h2 id="following-the-rabbit">Following the rabbit</h2>
<p>We will assume that you can erase your disk and start with a fresh install. If that’s not the case, this article may not be for you. For the sake of this article, we will use <code class="language-plaintext highlighter-rouge">/dev/nvme0n1</code> as the main disk of the laptop. You may have something different like <code class="language-plaintext highlighter-rouge">/dev/sda</code>. That’s fine, just replace <code class="language-plaintext highlighter-rouge">/dev/nvme0n1</code> with <code class="language-plaintext highlighter-rouge">/dev/sda</code> in the rest of the article.</p>
<p>First, follow the <a href="https://wiki.archlinux.org/index.php/Installation_guide">Archlinux installation guide</a> to the point just before <strong>Format the partitions</strong>, where they are telling you to modify the partition tables using <strong>fdisk</strong> or <strong>parted</strong>. Here, you will need to erase all your partitions and create what’s needed for the encryption.</p>
<h3 id="clean-and-safely-erase-your-disk">Clean and safely erase your disk</h3>
<p>First, use <code class="language-plaintext highlighter-rouge">fdisk</code> or <code class="language-plaintext highlighter-rouge">gdisk</code> (if you’re using UEFI) to wipe out what’s on your disk, i.e. removing all existing partitions (of course, this will delete all the data on your disk…).</p>
<p>For example, for <code class="language-plaintext highlighter-rouge">gdisk</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdisk /dev/nvme0n1
GPT fdisk <span class="o">(</span>gdisk<span class="o">)</span> version 1.0.3
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR<span class="p">;</span> using GPT.
Command <span class="o">(</span>? <span class="k">for </span><span class="nb">help</span><span class="o">)</span>:
</code></pre></div></div>
<p>Use <code class="language-plaintext highlighter-rouge">p</code> to print your partition schema, and <code class="language-plaintext highlighter-rouge">d</code> to delete partitions. Once it’s done, use <code class="language-plaintext highlighter-rouge">w</code> to write your changes to the disk (that is to say, <strong>again</strong>, deleting all the data on your disk) and quit <code class="language-plaintext highlighter-rouge">gdisk</code>.</p>
<p>Every page on the archlinux wiki says you should first be sure that no previous data will still be readable on your disk (if you have a new computer with nothing on it, this doesn’t apply to you).</p>
<p>So we will put random stuff on our disk to be sure to overwrite everything that may still be on it. You can read the <a href="https://wiki.archlinux.org/index.php/Securely_wipe_disk#Random_data">wiki page</a> or just run the following command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dd if=/dev/urandom of=/dev/nvme0n1 bs=1M status=progress
</code></pre></div></div>
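<p>If you would like to see what this does before pointing it at a real disk, here is a harmless rehearsal on a scratch file. It is only a sketch: <code class="language-plaintext highlighter-rouge">scratch.img</code> is a made-up file name, and the block size and count are arbitrary. It uses <code class="language-plaintext highlighter-rouge">dd</code>’s <code class="language-plaintext highlighter-rouge">of=</code> operand to name the destination:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Rehearsal on a scratch file: same technique, no disk involved.
# scratch.img is a hypothetical file name in the current directory.
dd if=/dev/urandom of=scratch.img bs=1048576 count=4
# The file now holds 4 blocks of 1048576 random bytes each.
wc -c scratch.img
# Delete it when you are done: rm scratch.img
</code></pre></div></div>
<p>On the real disk there is no <code class="language-plaintext highlighter-rouge">count</code>: the command simply runs until the device is full, which can take a long time.</p>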
<h3 id="partitionning">Partitioning</h3>
<p>Now that the disk is clean, let’s create what our encrypted system needs, that is to say 2 partitions: one for <code class="language-plaintext highlighter-rouge">/boot</code> (that will not be encrypted) and another one for our encrypted volumes (where we will later put <code class="language-plaintext highlighter-rouge">/</code> and our <code class="language-plaintext highlighter-rouge">swap</code>).</p>
<p>Here is what we want to have (output of my <code class="language-plaintext highlighter-rouge">gdisk</code> with the <code class="language-plaintext highlighter-rouge">p</code> command):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Number Start (sector) End (sector) Size Code Name
1 2048 1050623 512.0 MiB EF00 EFI System
2 1050624 1000215182 476.4 GiB 8E00 Linux LVM
</code></pre></div></div>
<p>First, create the partition where <code class="language-plaintext highlighter-rouge">/boot</code> will be mounted, with type code <code class="language-plaintext highlighter-rouge">EF00</code> (512 MiB is a good size), following the <a href="https://wiki.archlinux.org/index.php/EFI_System_Partition#Create_the_partition">archlinux wiki</a>. I’m assuming your system is compatible with UEFI. If that’s not the case, you may want to read up a little more on the wiki. Format the partition as <em>FAT32</em>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkfs.fat -F32 /dev/nvme0n1p1
</code></pre></div></div>
<p>Create the other partition of code <code class="language-plaintext highlighter-rouge">8E00</code> using the remaining space.</p>
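<p>If you prefer a non-interactive route, the same layout can probably be produced with <code class="language-plaintext highlighter-rouge">sgdisk</code>, the scriptable sibling of <code class="language-plaintext highlighter-rouge">gdisk</code> from the gptfdisk package. This is only a sketch under assumptions: <code class="language-plaintext highlighter-rouge">make_layout</code> is a hypothetical helper name, the sizes and names mirror the table above, and it destroys the whole partition table of the disk you give it.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: recreate the two-partition layout with sgdisk.
# DESTRUCTIVE: every partition on the given disk is removed.
# make_layout is a hypothetical helper, not part of any package.
make_layout() {
    disk=$1
    # Wipe the existing GPT and MBR structures.
    sgdisk --zap-all "$disk"
    # Partition 1: 512 MiB EFI System partition (type code ef00).
    sgdisk -n 1:0:+512M -t 1:ef00 -c 1:"EFI System" "$disk"
    # Partition 2: all the remaining space, Linux LVM (type code 8e00).
    sgdisk -n 2:0:0 -t 2:8e00 -c 2:"Linux LVM" "$disk"
    # Print the resulting table for a visual check.
    sgdisk -p "$disk"
}

# Usage (as root, from the installation media):
#   make_layout /dev/nvme0n1
</code></pre></div></div>
<p>The first partition still has to be formatted with <code class="language-plaintext highlighter-rouge">mkfs.fat</code> as shown above.</p>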
<p>You should now have only 2 partitions: one for <code class="language-plaintext highlighter-rouge">/boot</code> that will not be encrypted, and another one that you will first encrypt and then fill with your volumes (<code class="language-plaintext highlighter-rouge">/</code> and <code class="language-plaintext highlighter-rouge">swap</code>). In my case, the first partition that will be used for <code class="language-plaintext highlighter-rouge">/boot</code> is named <code class="language-plaintext highlighter-rouge">/dev/nvme0n1p1</code>, and the other one <code class="language-plaintext highlighter-rouge">/dev/nvme0n1p2</code>. You may have something like <code class="language-plaintext highlighter-rouge">/dev/sda1</code> and <code class="language-plaintext highlighter-rouge">/dev/sda2</code> if your partition naming scheme is not the same as mine.</p>
<p>You can then follow the <a href="https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_an_entire_system#LVM_on_LUKS">LVM on LUKS section</a> of the archlinux wiki.</p>
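<p>The naming difference follows a simple rule: when the disk name ends with a digit, the kernel puts a <code class="language-plaintext highlighter-rouge">p</code> between the disk name and the partition number. A minimal sketch, where <code class="language-plaintext highlighter-rouge">part_path</code> is a hypothetical helper written for this article:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: build a partition path from a disk path and a number.
# Disks whose name ends with a digit (nvme0n1, mmcblk0) take a "p" separator.
part_path() {
    disk=$1
    num=$2
    case "$disk" in
        *[0-9]) printf '%s\n' "${disk}p${num}" ;;
        *)      printf '%s\n' "${disk}${num}" ;;
    esac
}

part_path /dev/nvme0n1 2   # prints /dev/nvme0n1p2
part_path /dev/sda 1       # prints /dev/sda1
</code></pre></div></div>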
<p>I don’t like having separate partitions for <code class="language-plaintext highlighter-rouge">/</code> and <code class="language-plaintext highlighter-rouge">/home</code>. Every time I’ve done that, I always regretted the amount of space I allocated for each. So now, I’m only creating one <code class="language-plaintext highlighter-rouge">/</code> partition with everything inside.</p>
<p>In short, below are the commands you should run for your encrypted volumes (I’m creating an 8 GiB swap volume).</p>
<p>Encrypt the partition and open it with your key:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cryptsetup luksFormat <span class="nt">--type</span> luks2 /dev/nvme0n1p2
cryptsetup open /dev/nvme0n1p2 cryptolvm
</code></pre></div></div>
<p>Create the LVM volumes on it (swap and root):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pvcreate /dev/mapper/cryptolvm
vgcreate MyVol /dev/mapper/cryptolvm
lvcreate <span class="nt">-L</span> 8G MyVol <span class="nt">-n</span> swap
lvcreate <span class="nt">-l</span> 100%FREE MyVol <span class="nt">-n</span> root
</code></pre></div></div>
<p>Format the root and swap volumes:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkfs.ext4 /dev/mapper/MyVol-root
mkswap /dev/mapper/MyVol-swap
</code></pre></div></div>
<p>Mount the file systems:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount /dev/mapper/MyVol-root /mnt
swapon /dev/mapper/MyVol-swap
</code></pre></div></div>
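<p>Before going further, it may be worth checking that everything landed where you expect. The sketch below only reads the current state. <code class="language-plaintext highlighter-rouge">check_layout</code> is a hypothetical helper name, and the volume group name <code class="language-plaintext highlighter-rouge">MyVol</code> is the one used in the commands above.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical read-only check of the encrypted layout.
check_layout() {
    # Tree view: disk, LUKS partition, cryptolvm mapping, then the two volumes.
    lsblk -o NAME,TYPE,FSTYPE,MOUNTPOINT "$1"
    # Logical volumes of the MyVol group: swap and root are expected.
    lvs MyVol
    # Active swap areas: the swap volume should be listed.
    swapon --show
}

# Usage (as root):
#   check_layout /dev/nvme0n1
</code></pre></div></div>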
<p>The arch wiki tells you to format your boot partition using <code class="language-plaintext highlighter-rouge">ext2</code>, but for me this was a bad idea, as I want the UEFI manager of my Dell XPS 9550 to be able to boot from my <code class="language-plaintext highlighter-rouge">/boot</code> partition. So, as I said above, I formatted this partition using <code class="language-plaintext highlighter-rouge">FAT32</code>.</p>
<p>Mount the <code class="language-plaintext highlighter-rouge">/boot</code> partition:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> /mnt/boot
mount /dev/nvme0n1p1 /mnt/boot
</code></pre></div></div>
<p>You can then follow the <a href="https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_an_entire_system#Configuring_mkinitcpio_2"><code class="language-plaintext highlighter-rouge">mkinitcpio</code> part of the archlinux wiki</a>.</p>
<p>Be sure to have something like this in your <code class="language-plaintext highlighter-rouge">mkinitcpio.conf</code> file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>HOOKS=(... keyboard keymap block encrypt lvm2 ... filesystems ...)
</code></pre></div></div>
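<p>The order matters here: <code class="language-plaintext highlighter-rouge">encrypt</code> has to come before <code class="language-plaintext highlighter-rouge">lvm2</code>, and both before <code class="language-plaintext highlighter-rouge">filesystems</code>, since the LVM volumes only become visible once the LUKS container is open. A small sketch to check a hooks list, where <code class="language-plaintext highlighter-rouge">hooks_in_order</code> is a hypothetical helper and the sample list is only an illustration, not a recommended configuration:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical check: succeeds only if "encrypt" comes before "lvm2"
# and "lvm2" comes before "filesystems" in the given hooks list.
hooks_in_order() {
    pos=0 enc=0 lvm=0 fs=0
    for hook in $1; do
        pos=$((pos + 1))
        case "$hook" in
            encrypt)     enc=$pos ;;
            lvm2)        lvm=$pos ;;
            filesystems) fs=$pos ;;
        esac
    done
    # A position of 0 means the hook is missing from the list.
    [ "$enc" -gt 0 ] || return 1
    [ "$enc" -lt "$lvm" ] || return 1
    [ "$lvm" -lt "$fs" ]
}

hooks_in_order "base udev keyboard keymap block encrypt lvm2 filesystems fsck" \
    || echo "check the order of your HOOKS"
</code></pre></div></div>
<p>After editing the file, the initramfs has to be regenerated, for example with <code class="language-plaintext highlighter-rouge">mkinitcpio -p linux</code> if you are using the stock <code class="language-plaintext highlighter-rouge">linux</code> kernel preset.</p>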
<p>Then continue to install your system normally. Of course, be sure to configure your grub according to your encrypted setup by <a href="https://wiki.archlinux.org/index.php/Dm-crypt/System_configuration#Boot_loader">following the wiki</a>.</p>
<p>For the record, here is my <code class="language-plaintext highlighter-rouge">/etc/default/grub</code> file (it’s used to generate the <code class="language-plaintext highlighter-rouge">/boot/grub/grub.cfg</code> file by running <code class="language-plaintext highlighter-rouge">grub-mkconfig -o /boot/grub/grub.cfg</code>):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># GRUB boot loader configuration</span>
<span class="nv">GRUB_DEFAULT</span><span class="o">=</span>0
<span class="nv">GRUB_TIMEOUT</span><span class="o">=</span>1
<span class="nv">GRUB_DISTRIBUTOR</span><span class="o">=</span><span class="s2">"Arch"</span>
<span class="nv">GRUB_CMDLINE_LINUX_DEFAULT</span><span class="o">=</span><span class="s2">"resume=/dev/mapper/MyVol-swap nouveau.modeset=0 i915.preliminary_hw_support=1 acpi_backlight=vendor acpi_osi=Linux"</span>
<span class="c">#GRUB_CMDLINE_LINUX_DEFAULT=""</span>
<span class="c">#GRUB_CMDLINE_LINUX=""</span>
<span class="nv">GRUB_CMDLINE_LINUX</span><span class="o">=</span><span class="s2">"cryptdevice=/dev/nvme0n1p2:cryptolvm"</span>
<span class="nv">GRUB_ENABLE_CRYPTODISK</span><span class="o">=</span>y
<span class="c"># Preload both GPT and MBR modules so that they are not missed</span>
<span class="nv">GRUB_PRELOAD_MODULES</span><span class="o">=</span><span class="s2">"part_gpt part_msdos"</span>
<span class="c"># Uncomment to enable booting from LUKS encrypted devices</span>
<span class="c">#GRUB_ENABLE_CRYPTODISK=y</span>
<span class="c"># Uncomment to enable Hidden Menu, and optionally hide the timeout count</span>
<span class="c">#GRUB_HIDDEN_TIMEOUT=5</span>
<span class="c">#GRUB_HIDDEN_TIMEOUT_QUIET=true</span>
<span class="c"># Uncomment to use basic console</span>
<span class="nv">GRUB_TERMINAL_INPUT</span><span class="o">=</span>console
<span class="c"># Uncomment to disable graphical terminal</span>
<span class="c">#GRUB_TERMINAL_OUTPUT=console</span>
<span class="c"># The resolution used on graphical terminal</span>
<span class="c"># note that you can use only modes which your graphic card supports via VBE</span>
<span class="c"># you can see them in real GRUB with the command `vbeinfo'</span>
<span class="nv">GRUB_GFXMODE</span><span class="o">=</span>auto
<span class="c"># Uncomment to allow the kernel use the same resolution used by grub</span>
<span class="nv">GRUB_GFXPAYLOAD_LINUX</span><span class="o">=</span>keep
<span class="c"># Uncomment if you want GRUB to pass to the Linux kernel the old parameter</span>
<span class="c"># format "root=/dev/xxx" instead of "root=/dev/disk/by-uuid/xxx"</span>
<span class="c">#GRUB_DISABLE_LINUX_UUID=true</span>
<span class="c"># Uncomment to disable generation of recovery mode menu entries</span>
<span class="nv">GRUB_DISABLE_RECOVERY</span><span class="o">=</span><span class="nb">true</span>
<span class="c"># Uncomment and set to the desired menu colors. Used by normal and wallpaper</span>
<span class="c"># modes only. Entries specified as foreground/background.</span>
<span class="c">#GRUB_COLOR_NORMAL="light-blue/black"</span>
<span class="c">#GRUB_COLOR_HIGHLIGHT="light-cyan/blue"</span>
<span class="c"># Uncomment one of them for the gfx desired, a image background or a gfxtheme</span>
<span class="c">#GRUB_BACKGROUND="/path/to/wallpaper"</span>
<span class="c">#GRUB_THEME="/path/to/gfxtheme"</span>
<span class="c"># Uncomment to get a beep at GRUB start</span>
<span class="c">#GRUB_INIT_TUNE="480 440 1"</span>
<span class="c"># Uncomment to make GRUB remember the last selection. This requires to</span>
<span class="c"># set 'GRUB_DEFAULT=saved' above.</span>
<span class="c">#GRUB_SAVEDEFAULT="true"</span>
</code></pre></div></div>
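<p>The <code class="language-plaintext highlighter-rouge">cryptdevice</code> parameter above points at a device name. The wiki also documents a UUID form, which may be more robust if your device names can change between boots. A sketch of how that value could be built, where <code class="language-plaintext highlighter-rouge">cryptdevice_param</code> is a hypothetical helper and the UUID is read with <code class="language-plaintext highlighter-rouge">blkid</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: print a cryptdevice kernel parameter based on the
# UUID of the LUKS partition instead of its device name.
cryptdevice_param() {
    # $1: LUKS partition, $2: name of the mapping to open it as.
    uuid=$(blkid -s UUID -o value "$1")
    printf 'cryptdevice=UUID=%s:%s\n' "$uuid" "$2"
}

# Usage (as root); paste the output into GRUB_CMDLINE_LINUX:
#   cryptdevice_param /dev/nvme0n1p2 cryptolvm
</code></pre></div></div>
<p>Whichever form you use, remember to run <code class="language-plaintext highlighter-rouge">grub-mkconfig -o /boot/grub/grub.cfg</code> again after changing the file.</p>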
<p>Enjoy your encrypted archlinux!</p>Allo-MediaHere at AlloMedia, for security reasons, we're encrypting every laptop disk by default. As I'm using archlinux, I went to the wiki to follow how to "just" encrypt my disk. And well, the page is a little bit overcrowded, at the very least. Let's clarify that a little bit.