7. The internals.

This section provides information that is not needed for a basic understanding of how multicast works, nor for writing multicast programs. It is nonetheless interesting, gives some insight into the underlying multicast protocols and implementations, and may help you avoid common errors and misunderstandings.
7.1 IGMP.

When talking about [...] What actually happens is that hosts tell their routers which multicast groups they are interested in; those routers then tell their upstream routers that they want to receive that traffic, and so on. The algorithms used to decide when to ask for a group's traffic, or when to say that it is no longer wanted, vary a lot. One thing, however, never changes: how this information is transmitted. IGMP is used for that.

IGMP stands for Internet Group Management Protocol. It is a new protocol, similar in many aspects to ICMP, with a protocol number of 2, whose messages are carried in IP datagrams, and which all level 2-compliant hosts are required to implement.

As said before, it is used both by hosts giving membership information to their routers, and by routers to communicate among themselves. In the following I'll cover only the host-router relationship, mainly because I was unable to find information describing router-to-router communication other than the mrouted source code (RFC 1075, describing the Distance Vector Multicast Routing Protocol, is now obsoleted, and [...]

IGMP version 0 is specified in RFC-988, which is now obsoleted. Almost no one uses version 0 now. IGMP version 1 is described in RFC-1112 and, although it is updated by RFC-2236 (IGMP version 2), it is still in wide use. The Linux kernel implements all of the IGMP version 1 requirements and some, but not all, of the version 2 requirements.

Now I'll try to give an informal description of the protocol. You can check RFC-2236 for an in-depth formal description, with lots of state diagrams and time-out boundaries.

All IGMP messages have the following structure:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      Type     | Max Resp Time |           Checksum            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Group Address                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

IGMP version 1 (hereinafter IGMPv1) labels the "Max Resp Time" field as "Unused", zeroes it when sent, and ignores it when received. It also breaks the "Type" field into two 4-bit fields: "Version" and "Type". As IGMPv1 identifies a "Membership Query" message as 0x11 (version 1, type 1) and IGMPv2 identifies it as 0x11 too, the 8 bits have the same effective interpretation.

I think it is more instructive to describe IGMPv1 first and then point out the IGMPv2 additions, as they are mainly that: additions.

For the following discussion it is important to remember that multicast routers receive all IP multicast datagrams.
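To make the message layout described above concrete, here is a minimal user-space sketch that fills in the 8-byte message and its checksum. It assumes the checksum is the usual 16-bit one's-complement Internet checksum computed over the whole message (as RFC-2236 specifies); the function names `inet_checksum`, `igmp_build`, `igmp_valid` and `igmpv1_version` are illustrative and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IGMP_LEN 8  /* Type + Max Resp Time + Checksum + Group Address */

/* 16-bit one's-complement Internet checksum over a byte buffer.
 * Bytes are read in network order, so the result is host-independent. */
static uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                        /* odd trailing byte, padded with zero */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)               /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill an IGMP message; group holds the four address bytes in network order. */
static void igmp_build(uint8_t out[IGMP_LEN], uint8_t type,
                       uint8_t max_resp, const uint8_t group[4])
{
    uint16_t csum;

    assert(out != NULL && group != NULL);
    out[0] = type;
    out[1] = max_resp;              /* "Unused" (zero) in IGMPv1 */
    out[2] = out[3] = 0;            /* checksum field is zero while summing */
    memcpy(out + 4, group, 4);
    csum = inet_checksum(out, IGMP_LEN);
    out[2] = (uint8_t)(csum >> 8);
    out[3] = (uint8_t)(csum & 0xff);
}

/* A received message is intact when the checksum over all 8 bytes is 0. */
static int igmp_valid(const uint8_t msg[IGMP_LEN])
{
    return inet_checksum(msg, IGMP_LEN) == 0;
}

/* IGMPv1 reading of the first byte: high nibble is the version. */
static unsigned igmpv1_version(uint8_t type_byte)
{
    return type_byte >> 4;
}
```

Working on a plain byte buffer rather than a C struct avoids any dependence on host byte order or structure padding, at the cost of indexing the fields by hand.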
IGMP version 1.

Routers periodically send IGMP Host Membership Queries to the all-hosts group (224.0.0.1) with a TTL of 1 (once every minute or two). All multicast-capable hosts hear them, but don't answer immediately, to avoid an IGMP Host Membership Report storm. Instead, they start a random delay timer for each group they belong to on the interface on which they received the query.

Sooner or later, the timer expires in one of the hosts, and it sends an IGMP Host Membership Report (also with TTL 1) to the multicast address of the group being reported. As the report is sent to the group, all hosts that joined the group (and which are currently waiting for their own timers to expire) receive it too. They then stop their timers and don't generate any other report. Just one is generated, by the host that chose the smallest timeout, and that is enough for the router. It only needs to know that there are members of that group in the subnet, not how many nor which.

When no reports are received for a given group after a certain number of queries, the router assumes that no members are left, and thus that it doesn't have to forward traffic for that group on that subnet. Note that in IGMPv1 there are no "Leave Group" messages.

When a host joins a new group, the kernel sends a report for that group, so that the respective process does not need to wait a minute or two until a new membership query is received. As you can see, this IGMP packet is generated by the kernel as a response to the [...]

Host Membership Queries are identified by Type 0x11, and Host Membership Reports by Type 0x12.

No reports are sent for the all-hosts group. Membership in this group is permanent.
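The report-suppression rule described above (only the host whose timer fires first reports; everyone else cancels) can be modelled in a few lines. This is only a sketch under simplifying assumptions: all hosts belong to the same group, the delays are given as plain integers, and a tie is resolved in favour of the first host, whereas on a real network two hosts with equal timers might both report. The name `igmpv1_pick_reporter` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Given the random delay each member host picked after a query, return
 * the index of the host that sends the Host Membership Report and mark
 * every other host as suppressed (its timer is stopped when it hears
 * the report addressed to the group). */
static size_t igmpv1_pick_reporter(const unsigned delay[], int suppressed[],
                                   size_t n)
{
    size_t first = 0;
    size_t i;

    assert(n > 0);
    for (i = 1; i < n; i++)
        if (delay[i] < delay[first])   /* smallest timeout fires first */
            first = i;
    for (i = 0; i < n; i++)
        suppressed[i] = (i != first);  /* all others stay silent */
    return first;
}
```

Whatever the number of members, the router sees exactly one report per group in this model, which is all it needs.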
IGMP version 2.

One important addition to the above is the inclusion of a Leave Group message (Type 0x17). Its purpose is to reduce the bandwidth wasted between the time the last host in the subnet drops membership and the time the router times out its queries and decides there are no more members present for that group (the leave latency). Leave Group messages should be addressed to the all-routers group (224.0.0.2) rather than to the group being left, as that information is of no use to other members (kernel versions up to 2.0.33 send them to the group; although this does no harm to the hosts, it wastes their time, as they have to process the messages without gaining any useful information). There are certain subtle details regarding when and when not to send Leave messages; if interested, see the RFC.

When an IGMPv2 router receives a Leave message for a group, it sends Group-Specific Queries to the group being left. This is another addition: IGMPv1 has no group-specific queries, and all its queries are sent to the all-hosts group. The Type in the IGMP header does not change (0x11, as before), but the "Group Address" field is filled with the address of the multicast group being left.

The "Max Resp Time" field, which was set to 0 on transmission and ignored on reception in IGMPv1, is meaningful only in "Membership Query" messages. It gives the maximum time allowed before sending a report, in units of 1/10 second. It is used as a tuning mechanism.

IGMPv2 adds another message type: 0x16. It is a "Version 2 Membership Report", sent by IGMPv2 hosts if they detect that an IGMPv2 router is present (an IGMPv2 host knows an IGMPv1 router is present when it receives a query with the "Max Resp Time" field set to 0).

When more than one router claims to act as querier, IGMPv2 provides a mechanism to avoid "discussions": the router with the lowest IP address is designated as querier. The other routers keep timeouts. If the router with the lowest IP address crashes or is shut down, the decision of who will be the querier is taken again after the timers expire.
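The message types and destination addresses mentioned so far can be summarised in a small lookup function. This is a hedged sketch: the constant and function names are invented for illustration (the kernel headers use their own), addresses are held in host byte order for readability, and the convention that a General Query carries a Group Address of zero comes from RFC-2236 rather than from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Message types as given in the text; names are illustrative. */
#define MSG_MEMBERSHIP_QUERY 0x11
#define MSG_V1_REPORT        0x12
#define MSG_V2_REPORT        0x16
#define MSG_LEAVE_GROUP      0x17

#define ADDR_ALL_HOSTS   0xE0000001u  /* 224.0.0.1 */
#define ADDR_ALL_ROUTERS 0xE0000002u  /* 224.0.0.2 */

/* True when addr (host byte order) lies in 224.0.0.0/4. */
static int is_multicast(uint32_t addr)
{
    return (addr >> 28) == 0xE;
}

/* IP destination an IGMPv2 sender should use for a message of the given
 * type about the given group. Returns 0 for an unknown type. */
static uint32_t igmp_destination(uint8_t type, uint32_t group)
{
    switch (type) {
    case MSG_MEMBERSHIP_QUERY:
        if (group == 0)               /* General Query */
            return ADDR_ALL_HOSTS;
        assert(is_multicast(group));  /* Group-Specific Query */
        return group;
    case MSG_V1_REPORT:
    case MSG_V2_REPORT:
        assert(is_multicast(group));
        return group;                 /* so other members can suppress */
    case MSG_LEAVE_GROUP:
        assert(is_multicast(group));
        return ADDR_ALL_ROUTERS;      /* of no use to other members */
    default:
        return 0;
    }
}

/* "Max Resp Time" is expressed in tenths of a second. */
static unsigned max_resp_to_ms(uint8_t max_resp)
{
    return max_resp * 100u;
}

/* A query whose "Max Resp Time" is 0 reveals an IGMPv1 router. */
static int query_is_from_v1_router(uint8_t max_resp)
{
    return max_resp == 0;
}
```

For instance, a "Max Resp Time" of 100 corresponds to 10 seconds, and a Leave Group message for 225.0.0.1 goes to 224.0.0.2, not to 225.0.0.1.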
7.2 Kernel corner.

This sub-section gives some starting points for studying the multicast implementation of the Linux kernel. It does not explain that implementation; it just says where to find things. The study was carried out on version 2.0.32, so it may be somewhat outdated by the time you read it (network code seems to have changed A LOT in the 2.1.x releases, for instance).

Multicast code in the Linux kernel is always surrounded by [...]

You might want multicast features, but if your Linux box is not going to act as a multicast router you will probably not want multicast router features included in your new kernel. For this, the multicast routing code is surrounded by [...]

Kernel sources are usually placed in /usr/src/linux. However, the place may change, so, both for accuracy and brevity, I will refer to the root directory of the kernel sources as just LINUX. Then, something like [...]

All multicast interfaces with user programs shown in the section devoted to multicast programming were driven across the [...] The one which interests us is [...] In [...]

    #ifdef CONFIG_IP_MULTICAST
            sk->ip_mc_loop=1;
            sk->ip_mc_ttl=1;
            *sk->ip_mc_name=0;
            sk->ip_mc_list=NULL;
    #endif

Also, the assertion that "closing a socket makes the kernel drop all memberships this socket had" is corroborated by:

    #ifdef CONFIG_IP_MULTICAST
            /* Applications forget to leave groups before exiting */
            ip_mc_drop_socket(sk);
    #endif

taken from inet_release(), in the same file as before.
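The defaults set in the first kernel fragment above (loopback enabled, TTL of 1) can be observed from user space with getsockopt(). The following is a small sketch, assuming a Linux system where IP_MULTICAST_TTL and IP_MULTICAST_LOOP can be read into an int; the helper names `get_ip_opt` and `mcast_defaults` are made up for this example.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read one integer-valued IPPROTO_IP socket option; -1 on error. */
static int get_ip_opt(int fd, int opt)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_IP, opt, &val, &len) < 0)
        return -1;
    return val;
}

/* Create a fresh UDP socket and report its multicast TTL and loopback
 * settings before any setsockopt() call. Returns 0 on success. */
static int mcast_defaults(int *ttl, int *loop)
{
    int fd;

    assert(ttl != NULL && loop != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    *ttl = get_ip_opt(fd, IP_MULTICAST_TTL);
    *loop = get_ip_opt(fd, IP_MULTICAST_LOOP);
    close(fd);
    return (*ttl < 0 || *loop < 0) ? -1 : 0;
}
```

On a kernel that initialises sockets as shown above, both values should come back as 1.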
Device-independent operations for the Link Layer are kept in [...]

Two important functions are still missing: the processing of input and output multicast datagrams. Like any other datagrams, incoming datagrams are passed from the device drivers to the [...] Code in charge of outputting packets is kept in [...]

While working with [...]
As routed multicast datagrams can be received/sent across either physical interfaces or tunnels, a common abstraction for both was devised: VIFs, Virtual InterFaces. VIFs are added with [...]

    struct vifctl {
            vifi_t  vifc_vifi;              /* Index of VIF */
            unsigned char vifc_flags;       /* VIFF_ flags */
            unsigned char vifc_threshold;   /* ttl limit */
            unsigned int vifc_rate_limit;   /* Rate limiter values (NI) */
            struct in_addr vifc_lcl_addr;   /* Our address */
            struct in_addr vifc_rmt_addr;   /* IPIP tunnel addr */
    };

With this information a [...]

    struct vif_device
    {
            struct device   *dev;           /* Device we are using */
            struct route    *rt_cache;      /* Tunnel route cache */
            unsigned long   bytes_in,bytes_out;
            unsigned long   pkt_in,pkt_out; /* Statistics */
            unsigned long   rate_limit;     /* Traffic shaping (NI) */
            unsigned char   threshold;      /* TTL threshold */
            unsigned short  flags;          /* Control flags */
            unsigned long   local,remote;   /* Addresses(remote for tunnels)*/
    };

Note the [...]

    struct ip_mc_list* ip_mc_list;   /* IP multicast filter chain */

The [...]

    struct ip_mc_list
    {
            struct device *interface;
            unsigned long multiaddr;
            struct ip_mc_list *next;
            struct timer_list timer;
            short tm_running;
            short reporter;
            int users;
    };

So, the [...]

    #ifdef CONFIG_IP_MULTICAST
            if(!(dev->flags&IFF_ALLMULTI) && brd==IS_MULTICAST
               && iph->daddr!=IGMP_ALL_HOSTS
               && !(dev->flags&IFF_LOOPBACK))
            {
                    /*
                     * Check it is for one of our groups
                     */
                    struct ip_mc_list *ip_mc=dev->ip_mc_list;
                    do
                    {
                            if(ip_mc==NULL)
                            {
                                    kfree_skb(skb, FREE_WRITE);
                                    return 0;
                            }
                            if(ip_mc->multiaddr==iph->daddr)
                                    break;
                            ip_mc=ip_mc->next;
                    }
                    while(1);
            }
    #endif

The [...]

    for(i=dev->ip_mc_list;i!=NULL;i=i->next)
    {
            if(i->multiaddr==addr)
            {
                    i->users++;
                    return;
            }
    }

When dropping memberships, the counter is decremented and additional operations are performed only when the count reaches 0 ([...]

    struct mfcctl
    {
            struct in_addr mfcc_origin;             /* Origin of mcast */
            struct in_addr mfcc_mcastgrp;           /* Group in question */
            vifi_t  mfcc_parent;                    /* Where it arrived */
            unsigned char mfcc_ttls[MAXVIFS];       /* Where it is going */
    };

With all this information in hand, [...] Function [...] IGMP functions are implemented in [...]
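The "users" counter in struct ip_mc_list, incremented when a group is joined again and decremented when memberships are dropped, is ordinary reference counting on a linked list. The following user-space model is only a sketch of that idea, not the kernel's code: the type `mc_entry` and the functions `mc_join` and `mc_leave` are hypothetical, and timers, devices and IGMP reports are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's per-device membership entry. */
struct mc_entry {
    uint32_t multiaddr;      /* group address */
    int users;               /* how many joins are outstanding */
    struct mc_entry *next;
};

/* Join a group. Returns 1 if a new entry was created (the point at which
 * a real stack would announce the membership), 0 if only the counter was
 * incremented, -1 if memory could not be allocated. */
static int mc_join(struct mc_entry **head, uint32_t addr)
{
    struct mc_entry *e;

    assert(head != NULL);
    for (e = *head; e != NULL; e = e->next) {
        if (e->multiaddr == addr) {
            e->users++;
            return 0;
        }
    }
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->multiaddr = addr;
    e->users = 1;
    e->next = *head;
    *head = e;
    return 1;
}

/* Leave a group. Returns 1 if the last user left and the entry was
 * removed, 0 if the counter was only decremented, -1 if not a member. */
static int mc_leave(struct mc_entry **head, uint32_t addr)
{
    struct mc_entry **pp;

    assert(head != NULL);
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        struct mc_entry *e = *pp;

        if (e->multiaddr == addr) {
            if (--e->users > 0)
                return 0;
            *pp = e->next;   /* unlink only when the count reaches 0 */
            free(e);
            return 1;
        }
    }
    return -1;
}
```

Two joins followed by two leaves for the same address leave the list empty again, and only the first join and the last leave change the list itself.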