{"id":526,"date":"2022-08-22T16:13:40","date_gmt":"2022-08-22T16:13:40","guid":{"rendered":"https:\/\/cloudspert.com\/?p=526"},"modified":"2025-03-13T00:32:15","modified_gmt":"2025-03-13T00:32:15","slug":"containers-deep-dive-part-1","status":"publish","type":"post","link":"https:\/\/cloudspert.com\/?p=526","title":{"rendered":"Containers: Deep dive part 1"},"content":{"rendered":"<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/0*-BWWxt7hbLk7r1tI.jpg\"\/><\/figure>\n<p>Containers, containers, containers, containers. I\u2019m hearing this word more than my own name these days. What is so special about them? Is it magic, or just a fugazi?<\/p>\n<p>This is what I\u2019ll try to answer in this series of articles.<\/p>\n<h3>What to expect<\/h3>\n<p>To keep the subject simple and less boring, it is divided into several articles. The main topics are:<\/p>\n<ul>\n<li>What is a container: in this section you\u2019ll find out which kernel features are used to create a container and what a container really is.<\/li>\n<li>Container networking: this section is dedicated to container networking; we\u2019ll discover how a container can connect to another container or to the internet.<\/li>\n<li>CNI (Container Network Interface): together we\u2019ll understand how CNI works in detail.<\/li>\n<\/ul>\n<p>In all articles, we won\u2019t stick to the theoretical side of each subject. I believe that to fully understand a subject you must go under the hood and do everything yourself, and this is what we\u2019ll try to do in each section. 
So what do I need?<\/p>\n<h3><strong>Requirements<\/strong><\/h3>\n<p>For this series of articles you\u2019ll need 2 Linux VMs with a distribution of your choice. I\u2019ll be using <strong><em>Ubuntu<\/em><\/strong> 20.04.<\/p>\n<p>Let\u2019s goooo!<\/p>\n<h3>What is a container?<\/h3>\n<p>A container is a process. Wait, where are you going? We\u2019re not finished yet. Where were we? Ah yes, a container is a process, but a special one. How special? Let\u2019s find out. But before understanding what makes it so special, let\u2019s pay a visit to kernel land.<\/p>\n<h4>Kernel Land<\/h4>\n<p>As you probably know, on Linux memory is split into two spaces where code runs: kernel space and user space. Kernel space is protected and only kernel code is allowed to access it. User space, on the other hand, is used by non-kernel applications such as a browser or a text editor.<\/p>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/767\/1*k5lOYVxcedbo711HkzCAOQ.png\"\/><figcaption>Kernel and User\u00a0Space<\/figcaption><\/figure>\n<p>So if user space can\u2019t access kernel space, how can a user space application open a file located on a disk or send a ping?<\/p>\n<p>The answer is syscalls. Syscalls are used by applications running in user space to ask the kernel to do something on their behalf, like opening a file, sending a network packet or creating a new process.<\/p>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/1*tcKoxUP_O6Prs8_oQPKNsg.png\"\/><figcaption>System calls<\/figcaption><\/figure>\n<p>As we can see, a process running in user space asks the kernel for multiple actions while it executes, but this doesn\u2019t explain how a process is created.<\/p>\n<p>So how are processes created?<\/p>\n<p>Each process is a fork of another process. In case you don\u2019t know what fork is: 
Fork is yet another syscall; it can be called by a process (the parent) in user space to create a new process (the child).<\/p>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/543\/1*NtLqY22La2N_mtUb2dwj5w.png\"\/><figcaption>Process creation<\/figcaption><\/figure>\n<p>Each process is created as in the diagram above:<\/p>\n<ol>\n<li>The parent process calls the clone syscall. Depending on the flags, the kernel copies the memory of <strong><em>app1<\/em><\/strong>; at this stage the child and the parent run the same code. For example, when you run <strong><em>ls<\/em><\/strong> in bash, at this point the child is still running the bash code.<\/li>\n<li>To run <strong>app2<\/strong>, the child calls the exec syscall, which replaces its code with the code of <strong>app2<\/strong>. Continuing the example from step 1, this is when the <strong><em>ls<\/em><\/strong> code is loaded.<\/li>\n<\/ol>\n<blockquote><p><strong><em>Note:<\/em><\/strong><\/p><\/blockquote>\n<blockquote><p>Some of you know the fork syscall as the way to create new processes. To clarify, the clone syscall is the newer fork: it does the same job but allows more control over the execution context of the new process. Nowadays the glibc fork() function calls the clone syscall with flags that give the same effect as the traditional fork syscall. But you don\u2019t need to remember these boring details.<\/p><\/blockquote>\n<p>Now we know how a process is created, but what makes a container more special than a normal process? The difference is that a container is a process isolated from the other processes. This isolation can apply at one or several levels; some of the well-known ones are network, mount, IPC (Inter-Process Communication), PID and so on. But what makes this isolation possible? 
The answer is the kernel, using its namespaces feature.<\/p>\n<p>So what changes if we draw the same diagram again?<\/p>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/607\/1*gHGOVBC6NUsiPzJkDrTBug.png\"\/><figcaption>Container creation<\/figcaption><\/figure>\n<p>As we can see, the procedure is the same; the difference is in the flags passed to the clone syscall. Some of the well-known flags used to create new namespaces are:<\/p>\n<ul>\n<li><strong>CLONE_NEWIPC: <\/strong>create the process in a new IPC namespace. The process gets its own System V IPC objects and POSIX message queues, so it can\u2019t use them to communicate with processes in the host namespace or in other namespaces.<\/li>\n<li><strong>CLONE_NEWNET: <\/strong>create the process in a new network namespace. The process gets its own network tables (routing, ARP), its own network interfaces and its own network configuration.<\/li>\n<li><strong>CLONE_NEWNS: <\/strong>create the process in a new mount namespace. The process starts with a copy of its parent\u2019s mount list, and its later mounts and unmounts can be kept private from the host (depending on mount propagation), which is what lets a container hide host mounts and use its own root filesystem.<\/li>\n<li><strong>CLONE_NEWPID: <\/strong>create the process in a new PID namespace. This prevents the process from seeing processes outside the namespace, and lets it reuse process IDs already used on the host or in other namespaces.<\/li>\n<li><strong>CLONE_NEWUSER: <\/strong>create the process in a new user namespace with its own user and group ID mappings, so a user <strong><em>user<\/em><\/strong> in the container is not the same as the user <strong><em>user<\/em><\/strong> on the host.<\/li>\n<\/ul>\n<p>Clone is not the only syscall used to isolate a process; there are others too:<\/p>\n<ul>\n<li><strong>unshare: <\/strong>This system call takes the same namespace flags as clone. The difference is that unshare creates new namespaces and moves the calling process into them, while clone creates a new process inside the new namespaces. (For a new PID namespace, only the children of the calling process are placed in it.)<\/li>\n<li><strong>setns: <\/strong>This system call allows a running process to join an existing namespace.<\/li>\n<\/ul>\n<p>Now the devil behind 
containers is unveiled: for the kernel, containers do not exist. It is all a bunch of namespaces that isolate processes, like in the movie Inception, where the character thinks he is in the real world but is actually in a dream. And we\u2019re not even going to talk about second-level inception (containers inside containers), which is also possible with namespaces.<\/p>\n<p>In the end, I think this tweet from <a href=\"https:\/\/medium.com\/@jpetazzo\"><strong>J\u00e9r\u00f4me Petazzoni<\/strong><\/a> sums it all up:<\/p>\n<blockquote><p>\u201cContainers are processes, born from tarballs, anchored to namespaces, controlled by cgroups\u201d \ud83d\udcaf @alicegoldfuss #VelocityConf<\/p><\/blockquote>\n<h3>What is\u00a0next<\/h3>\n<p>In the next article we will take the notions we learned today and use them to create a container from scratch. It is not that hard, you\u2019ll see. Until next time.<\/p>\n<h3>Bye!<\/h3>\n<h3>Resources<\/h3>\n<ul>\n<li><a href=\"https:\/\/man7.org\/linux\/man-pages\/man2\/setns.2.html\">setns(2) &#8211; Linux manual page<\/a><\/li>\n<li><a href=\"https:\/\/man7.org\/linux\/man-pages\/man2\/clone.2.html\">clone(2) &#8211; Linux manual page<\/a><\/li>\n<li><a href=\"https:\/\/man7.org\/linux\/man-pages\/man7\/namespaces.7.html\">namespaces(7) &#8211; Linux manual page<\/a><\/li>\n<li><a href=\"https:\/\/man7.org\/linux\/man-pages\/man2\/unshare.2.html\">unshare(2) &#8211; Linux manual page<\/a><\/li>\n<li><a href=\"https:\/\/man7.org\/linux\/man-pages\/man3\/exec.3.html\">exec(3) &#8211; Linux manual page<\/a><\/li>\n<\/ul>\n<p><img decoding=\"async\" src=\"https:\/\/medium.com\/_\/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=dd5a56743a65\" width=\"1\" height=\"1\" 
alt=\"\"\/><\/p>","protected":false},"excerpt":{"rendered":"<p>Containers, containers, containers, containers. I\u2019m hearing this word more than my name theses days, what is so special about them, is it magic, or just a fugazi\u00a0? This is what i\u2019ll try to respond in theses series of articles. What is\u00a0expected To make the subject more simple and less boring, it will be divide it into several articles, where the main subjects are\u00a0: What is a container\u00a0: in this section you\u2019ll find out what features are used to create a container and what is simply a container. Container networking: this section we\u2019ll be dedicated to container networking, we\u2019ll discover how a container can connect to another container or to the internet. CNI(Container Network Interface): together we\u2019ll understand how CNI works in\u00a0detail. In All articles, we\u2019ll not stick with the theoretical side of each subject, i believe that to fully understand a subject you must go under the hood and do everything yourself, and this is what we\u2019ll try to do in each section. So what do i need\u00a0? Requirement In this series of article you\u2019ll need 2 Linux VMs with a distribution of your choice, i\u2019ll be using a Ubuntu 20.04 distribution. Lets goooo\u00a0! What is a container\u00a0? A container is a process, wait where are you going, we\u2019re not finished yet. Where were we, ah yes a container is a process but a special one, how special let\u2019s find out. But before understanding what makes it so special, lets pay a visit to the kernel\u00a0land. Kernel Land As you properly know or not. In Linux memory, we have two spaces where applications generally run, the kernel system space and the user space. Kernel space is protected and only kernel code is allowed to access it. On the other hand user space can be used by non kernel applications such as a browser or a text\u00a0editor. 
Kernel and User\u00a0Space So if both the user space can\u2019t access the kernel space, how a user space application can open a file located on a disk or send a ping\u00a0? The Answer is syscalls, syscalls are use by applications running in the user space to ask the kernel to do something, like opening a file, sending a network packet or creating a new\u00a0process. System calls So as we can see a process running in the user space mode asks the kernel for multiples actions while executing, but this doesn\u2019t explain how can a process is\u00a0created. So how processes are created\u00a0? Each process is a fork of another process, for you who don\u2019t know what fork is. Fork is yet another syscall, this syscall can be called by processes(parents) in the user space to create new processes(childs). Process creation Each process is created as in the diagram\u00a0above: The parent process calls the clone syscal, with some flags the kernel will copy the memory section of app1, at this state the child and the father will have the same code to run. For example when you run in your bash ls, at this period the child still points to bash\u00a0code. To load the code of app2\u00a0, the exec syscall will load the code of app2. If we continue with the same example as in phase 1, with this syscall the ls code is\u00a0loaded. Note: Some of you are familiar with the fork syscall as the way to create new processes. To clarify the clone syscall is the new fork, it does the same job but allows more control on the execution context of a process. Now the glibc forks function calls the clone syscall with flags that provide the same effect as the traditional fork syscall. 
But you don\u2019t need to know this boring\u00a0details Now we knows how a process is created, but what makes a container so special than a normal process, the difference is that a container is a process isolated from the rest of other process, this isolation can be at one or multiple level, some of the well know isolation are: network, mount, IPC(Inter Process Communication), PID and so on. But what makes this isolation possible\u00a0? the answer is the kernel, using the namespaces feature. So what will change if we drew the same diagram\u00a0again Container creation As we can see, the procedure is the same, the difference is in the flags passed to the clone syscall, some of the know flags used to create new namespaces are: CLONE_NEWIPC: create the process in a new IPC namespace, that means that the process can\u2019t send signals like kill to other processes in the host namespace or other namespaces CLONE_NEWNET: create the process in a new network namespace, by doing so the process will have its own network tables(routing,arp), own network interface, own network configuration. CLONE_NEWNS: create the process in a new mount namespace, this will hide host mounts from the\u00a0process. CLONE_NEWPID: create the process in a new PID namespace, this will prevent the process from seeing other processes, and reuse process ID already used on the host or other namespaces. CLONE_NEWUSER: create the process in a new user namespace, with this a user user on the container is not the same as the user user on the\u00a0host. Clone is not the only syscall used to isolate a processes, there are other syscall too\u00a0: unshare: This system call is actually the same as clone but the difference is that this syscall will create and move the current process to a new namespaces but clone will create a new process with new namespaces. setns: This system call allows the running process to join an existing namespace. 
Now the devil behind containers is unveiled, for the kernel containers\u00a0do\u00a0not\u00a0exist, it is all bunch of namespaces that isolate process like the movie inception where the actor thinks that he is in the realer world but instead he is in dream, and we\u2019re not going to speak level two inception (containers in container), which is possible as well with namespace. In the end i think that J\u00e9r\u00f4me Petazzoni tweet sum it\u00a0all J\u00e9r\u00f4me Petazzoni on Twitter: \u00ab\u00a0\u00a0\u00bbContainers are<\/p>\n","protected":false},"author":3,"featured_media":632,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[14],"tags":[],"class_list":["post-526","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-containers","entry","has-media"],"jetpack_featured_media_url":"https:\/\/cloudspert.com\/wp-content\/uploads\/2022\/08\/1.webp","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/cloudspert.com\/index.php?rest_route=\/wp\/v2\/posts\/526","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cloudspert.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudspert.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudspert.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudspert.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=526"}],"version-history":[{"count":1,"href":"https:\/\/cloudspert.com\/index.php?rest_route=\/wp\/v2\/posts\/526\/revisions"}],"predecessor-version":[{"id":633,"href":"https:\/\/cloudspert.com\/index.php?rest_route=\/wp\/v2\/posts\/526\/revisions\/633"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cloudspert.com\/index.php?rest_route=\/wp\/v2\/media\/632"}],"wp:attachment":[{"href":"https:\/
\/cloudspert.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=526"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudspert.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=526"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudspert.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=526"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}