Commit graph

41757 commits

Author SHA1 Message Date
Eric Dumazet
e13e02a3c6 net_sched: SFB flow scheduler
This is the Stochastic Fair Blue scheduler, based on work from :

W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

This implementation is based on work done by Juliusz Chroboczek

General SFB algorithm can be found in figure 14, page 15:

B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
   if (B[i][h{i}].qlen > bin_size)
      B[i][h{i}].p_mark += p_increment;
   else if (B[i][h{i}].qlen == 0)
      B[i][h{i}].p_mark -= p_decrement;
p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
if (p_min == 1.0)
    ratelimit();
else
    mark/drop with probabilty p_min;

I did the adaptation of Juliusz code to meet current kernel standards,
and various changes to address previous comments :

http://thread.gmane.org/gmane.linux.network/90225
http://thread.gmane.org/gmane.linux.network/90375

Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
we can use an external flow classifier if wanted.

tc qdisc add dev $DEV parent 1:11 handle 11:  \
        est 0.5sec 2sec sfb limit 128

tc filter add dev $DEV protocol ip parent 11: handle 3 \
        flow hash keys dst divisor 1024

Notes:

1) SFB default child qdisc is pfifo_fast. It can be changed by another
qdisc but a child qdisc MUST not drop a packet previously queued. This
is because SFB needs to handle a dequeued packet in order to maintain
its virtual queue states. pfifo_head_drop or CHOKe should not be used.

2) ECN is enabled by default, unlike RED/CHOKe/GRED

With help from Patrick McHardy & Andi Kleen

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-23 14:05:11 -08:00
David S. Miller
dee9f4bceb net: Make flow cache paths use a const struct flowi.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 18:44:31 -08:00
David S. Miller
0730b9a150 net: Mark flowi arg to flow_cache_uli_match() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 18:27:22 -08:00
David S. Miller
b520e9f616 xfrm: Mark flowi arg to xfrm_state_find() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 18:24:19 -08:00
David S. Miller
e33f770426 xfrm: Mark flowi arg to security_xfrm_state_pol_flow_match() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 18:13:15 -08:00
David S. Miller
e1ad2ab2cf xfrm: Mark flowi arg to xfrm_selector_match() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 18:07:39 -08:00
David S. Miller
1744a8fe09 xfrm: Mark token args to addr_match() const.
Also, make it return a real bool.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 18:02:12 -08:00
David S. Miller
8f029de281 xfrm: Mark flowi arg to xfrm_type->reject() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 17:59:59 -08:00
David S. Miller
73e5ebb20f xfrm: Mark flowi arg to ->init_tempsel() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 17:51:44 -08:00
David S. Miller
0c7b3eefb4 xfrm: Mark flowi arg to ->fill_dst() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 17:48:57 -08:00
David S. Miller
05d8402576 xfrm: Mark flowi arg to ->get_tos() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 17:47:10 -08:00
David S. Miller
e8a4e37716 xfrm: Mark flowi arg const in key extraction helpers.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 17:42:56 -08:00
Eric Dumazet
eaefd1105b net: add __rcu annotations to sk_wq and wq
Add proper RCU annotations/verbs to sk_wq and wq members

Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)

Fix sunrpc sk_sleep() abuse too

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:19:31 -08:00
Shan Wei
089c34827e tcp: Remove debug macro of TCP_CHECK_TIMER
Now, TCP_CHECK_TIMER is not used for debuging, it does nothing.
And, it has been there for several years, maybe 6 years.

Remove it to keep code clearer.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-20 11:10:14 -08:00
David S. Miller
da935c66ba Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/e1000e/netdev.c
	net/xfrm/xfrm_policy.c
2011-02-19 19:17:35 -08:00
John Fastabend
226111d1fb net: dcb: match dcb_app protocol field with 802.1Qaz spec
The dcb_app protocol field is a __u32 however the 802.1Qaz
specification defines it as a 16 bit field. This patch brings
the structure inline with the spec making it a __u16.

CC: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-19 19:00:50 -08:00
David S. Miller
ece639caa3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2011-02-19 16:42:37 -08:00
Linus Torvalds
0cc9d52578 Merge branch 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  RTC: Re-enable UIE timer/polling emulation
  RTC: Revert UIE emulation removal
  RTC: Release mutex in error path of rtc_alarm_irq_enable
2011-02-18 14:20:46 -08:00
David S. Miller
fd23c3b311 ipv4: Add hash table of interface addresses.
This will be used to optimize __ip_dev_find() and friends.

With help from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-18 12:42:28 -08:00
Linus Torvalds
bc3adfc670 Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
  workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
  workqueue: wake up a worker when a rescuer is leaving a gcwq
2011-02-18 12:36:06 -08:00
Linus Torvalds
3c18d4de86 Expand CONFIG_DEBUG_LIST to several other list operations
When list debugging is enabled, we aim to readably show list corruption
errors, and the basic list_add/list_del operations end up having extra
debugging code in them to do some basic validation of the list entries.

However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
the debug code due to how they were written. This fixes that.

So the _next_ time we have list_move() problems with stale list entries,
we'll hopefully have an easier time finding them..

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-18 11:32:28 -08:00
David S. Miller
982721f391 ipv4: Use const'ify fib_result deep in the route call chains.
The only troublesome bit here is __mkroute_output which wants
to override res->fi and res->type, compute those in local
variables instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 15:54:42 -08:00
David S. Miller
b6bf3ca032 ipv4: Mark fib_combine_itag()'s 'res' arg as const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 15:52:59 -08:00
David S. Miller
3c7bd1a140 net: Add initial_ref arg to dst_alloc().
This allows avoiding multiple writes to the initial __refcnt.

The most simplest cases of wanting an initial reference of "1"
in ipv4 and ipv6 have been converted, the rest have been left
along and kept at the existing "0".

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 15:44:00 -08:00
John Stultz
456d66ecd0 RTC: Re-enable UIE timer/polling emulation
This patch re-enables UIE timer/polling emulation for rtc devices
that do not support alarm irqs.

CC: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-02-17 14:59:42 -08:00
John Stultz
6e57b1d6a8 RTC: Revert UIE emulation removal
Uwe pointed out that my alarm based UIE emulation is not sufficient
to replace the older timer/polling based UIE emulation on devices
where there is no alarm irq. This causes rtc devices without alarms
to return -EINVAL to UIE ioctls. The fix is to re-instate the old
timer/polling method for devices without alarm irqs.

This patch reverts the following commits:
042620a018 - Remove UIE emulation
1daeddd596 - Cleanup removed UIE emulation declaration
b5cc8ca1c9 - Remove Kconfig symbol for UIE emulation

The emulation mode will still need to be wired-in with a following
patch before it will work.

CC: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-02-17 14:59:41 -08:00
Michał Mirosław
e83d360d9a net: introduce NETIF_F_RXCSUM
Introduce NETIF_F_RXCSUM to replace device-private flags for RX checksum
offload. Integrate it with ndo_fix_features.

ethtool_op_get_rx_csum() is removed altogether as nothing in-tree uses it.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:35 -08:00
Michał Mirosław
5455c6998d net: Introduce new feature setting ops
This introduces a new framework to handle device features setting.
It consists of:
  - new fields in struct net_device:
	+ hw_features - features that hw/driver supports toggling
	+ wanted_features - features that user wants enabled, when possible
  - new netdev_ops:
	+ feat = ndo_fix_features(dev, feat) - API checking constraints for
		enabling features or their combinations
	+ ndo_set_features(dev) - API updating hardware state to match
		changed dev->features
  - new ethtool commands:
	+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
		and trigger device reconfiguration if resulting dev->features
		changed
	+ ETHTOOL_GSTRINGS(ETH_SS_FEATURES): get feature bits names (meaning)

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:33 -08:00
Michał Mirosław
0a41770477 ethtool: factorize get/set_one_feature
This allows to enable GRO even if RX csum is disabled. GRO will not
be used for packets without hardware checksum anyway.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:33 -08:00
Michał Mirosław
212b573f55 ethtool: enable GSO and GRO by default
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:32 -08:00
Florian Westphal
d503b30bd6 netfilter: tproxy: do not assign timewait sockets to skb->sk
Assigning a socket in timewait state to skb->sk can trigger
kernel oops, e.g. in nfnetlink_log, which does:

if (skb->sk) {
        read_lock_bh(&skb->sk->sk_callback_lock);
        if (skb->sk->sk_socket && skb->sk->sk_socket->file) ...

in the timewait case, accessing sk->sk_callback_lock and sk->sk_socket
is invalid.

Either all of these spots will need to add a test for sk->sk_state != TCP_TIME_WAIT,
or xt_TPROXY must not assign a timewait socket to skb->sk.

This does the latter.

If a TW socket is found, assign the tproxy nfmark, but skip the skb->sk assignment,
thus mimicking behaviour of a '-m socket .. -j MARK/ACCEPT' re-routing rule.

The 'SYN to TW socket' case is left unchanged -- we try to redirect to the
listener socket.

Cc: Balazs Scheidler <bazsi@balabit.hu>
Cc: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-17 11:32:38 +01:00
Linus Torvalds
a2640111d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] qla2xxx: Return DID_NO_CONNECT when FC device is lost.
  [SCSI] mptfusion: Bump version 03.04.18
  [SCSI] mptfusion: Fix Incorrect return value in mptscsih_dev_reset
  [SCSI] mptfusion: mptctl_release is required in mptctl.c
  [SCSI] target: fix use after free detected by SLUB poison
  [SCSI] target: Remove procfs based target_core_mib.c code
  [SCSI] target: Fix SCF_SCSI_CONTROL_SG_IO_CDB breakage
  [SCSI] target: Fix top-level configfs_subsystem default_group shutdown breakage
  [SCSI] target: fixed missing lock drop in error path
  [SCSI] target: Fix demo-mode MappedLUN shutdown UA/PR breakage
  [SCSI] target/iblock: Fix failed bd claim NULL pointer dereference
  [SCSI] target: iblock/pscsi claim checking for NULL instead of IS_ERR
  [SCSI] scsi_debug: Fix 32-bit overflow in do_device_access causing memory corruption
  [SCSI] qla2xxx: Change from irq to irqsave with host_lock
  [SCSI] qla2xxx: Fix race that could hang kthread_stop()
2011-02-16 09:07:00 -08:00
Tejun Heo
58a69cb47e workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'.  The former is the more prominent one.  The latter is
mostly used by workqueue and in a few other odd places.  Unify the
spelling to 'freezable'.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
2011-02-16 17:48:59 +01:00
Andrea Arcangeli
a7d6e4ecdb thp: prevent hugepages during args/env copying into the user stack
Transparent hugepages can only be created if rmap is fully
functional. So we must prevent hugepages to be created while
is_vma_temporary_stack() is true.

This also optmizes away some harmless but unnecessary setting of
khugepaged_scan.address and it switches some BUG_ON to VM_BUG_ON.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-15 15:21:11 -08:00
Linus Torvalds
1cecd791f2 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix text_poke_smp_batch() deadlock
  perf tools: Fix thread_map event synthesizing in top and record
  watchdog, nmi: Lower the severity of error messages
  ARM: oprofile: Fix backtraces in timer mode
  oprofile: Fix usage of CONFIG_HW_PERF_EVENTS for oprofile_perf_init and friends
2011-02-15 10:18:48 -08:00
Linus Torvalds
87450bd55d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: matrix_keypad - increase the limit of rows and columns
  Input: wacom - fix error path in wacom_probe()
  Input: ads7846 - check proper condition when freeing gpio
  Revert "Input: do not pass injected events back to the originating handler"
  Input: sysrq - rework re-inject logic
  Input: serio - clear pending rescans after sysfs driver rebind
  Input: rotary_encoder - use proper irqflags
  Input: wacom_w8001 - report resolution to userland
2011-02-15 09:40:27 -08:00
Ingo Molnar
a252852afa Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2011-02-15 04:10:35 +01:00
Baruch Siach
d606ef3fe0 phy/micrel: add ability to support 50MHz RMII clock on KZS8051RNL
Platform code can now set the MICREL_PHY_50MHZ_CLK bit of dev_flags in a fixup
routine (registered with phy_register_fixup_for_uid()), to make the KZS8051RNL
PHY work with 50MHz RMII reference clock.

Cc: David J. Choi <david.choi@micrel.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-14 17:38:30 -08:00
Jiri Pirko
fbaec0ea54 rtnetlink: implement setting of master device
This patch allows userspace to enslave/release slave devices via netlink
interface using IFLA_MASTER. This introduces generic way to add/remove
underling devices.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-13 16:58:39 -08:00
David Miller
795abaf1e4 klist: Fix object alignment on 64-bit.
Commit c0e69a5bbc ("klist.c: bit 0 in pointer can't be used as flag")
intended to make sure that all klist objects were at least pointer size
aligned, but used the constant "4" which only works on 32-bit.

Use "sizeof(void *)" which is correct in all cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: stable <stable@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-13 16:54:24 -08:00
Jiri Pirko
1765a57533 net: make dev->master general
dev->master is now tightly connected to bonding driver. This patch makes
this pointer more general and ready to be used by others.

 - netdev_set_master() - bond specifics moved to new function
   netdev_set_bond_master()
 - introduced netif_is_bond_slave() to check if device is a bonding slave

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-13 10:42:07 -08:00
Jiri Pirko
d59cfde2fb net: remove the unnecessary dance around skb_bond_should_drop
No need to check (master) twice and to drive in and out the header file.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-13 10:42:06 -08:00
Nicholas Bellinger
1f6fe7cba1 [SCSI] target: fix use after free detected by SLUB poison
This patch moves a large number of memory release paths inside of the
configfs callback target_core_hba_item_ops->release() called from
within fs/configfs/item.c: config_item_cleanup() context.  This patch
resolves the SLUB 'Poison overwritten' warnings.

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-12 12:32:41 -06:00
Nicholas Bellinger
e89d15eead [SCSI] target: Remove procfs based target_core_mib.c code
This patch removes the legacy procfs based target_core_mib.c code,
and moves the necessary scsi_index_tables functions and defines into
target_core_transport.c and target_core_base.h code to allow existing
fabric independent statistics to function.

This includes the removal of a handful of 'atomic_t mib_ref_count'
counters used in struct se_node_acl, se_session and se_hba to prevent
removal while using seq_list procfs walking logic.

[jejb: fix up compile failures]
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-12 12:15:47 -06:00
John Fastabend
d033d526a4 ixgbe: DCB, implement 802.1Qaz routines
Implements 802.1Qaz support for ixgbe driver. Additionally,
this adds IEEE_8021QAZ_TSA_{} defines to dcbnl.h this is to
avoid having to use cryptic numeric codes for the TSA type.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-02-11 08:47:15 -08:00
Trilok Soni
cfaea56741 Input: matrix_keypad - increase the limit of rows and columns
Some keyboard controllers support more than 16 columns and rows.
Increase the limit to 32.

Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2011-02-11 01:01:23 -08:00
Chris Wright
6037b715d6 security: add cred argument to security_capable()
Expand security_capable() to include cred, so that it can be usable in a
wider range of call sites.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2011-02-11 17:41:58 +11:00
David S. Miller
6431cbc25f inet: Create a mechanism for upward inetpeer propagation into routes.
If we didn't have a routing cache, we would not be able to properly
propagate certain kinds of dynamic path attributes, for example
PMTU information and redirects.

The reason is that if we didn't have a routing cache, then there would
be no way to lookup all of the active cached routes hanging off of
sockets, tunnels, IPSEC bundles, etc.

Consider the case where we created a cached route, but no inetpeer
entry existed and also we were not asked to pre-COW the route metrics
and therefore did not force the creation a new inetpeer entry.

If we later get a PMTU message, or a redirect, and store this
information in a new inetpeer entry, there is no way to teach that
cached route about the newly existing inetpeer entry.

The facilities implemented here handle this problem.

First we create a generation ID.  When we create a cached route of any
kind, we remember the generation ID at the time of attachment.  Any
time we force-create an inetpeer entry in response to new path
information, we bump that generation ID.

The dst_ops->check() callback is where the knowledge of this event
is propagated.  If the global generation ID does not equal the one
stored in the cached route, and the cached route has not attached
to an inetpeer yet, we look it up and attach if one is found.  Now
that we've updated the cached route's information, we update the
route's generation ID too.

This clears the way for implementing PMTU and redirects directly in
the inetpeer cache.  There is absolutely no need to consult cached
route information in order to maintain this information.

At this point nothing bumps the inetpeer genids, that comes in the
later changes which handle PMTUs and redirects using inetpeers.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-10 13:33:41 -08:00
David S. Miller
ddd4aa424b inetpeer: Add redirect and PMTU discovery cached info.
Validity of the cached PMTU information is indicated by it's
expiration value being non-zero, just as per dst->expires.

The scheme we will use is that we will remember the pre-ICMP value
held in the metrics or route entry, and then at expiration time
we will restore that value.

In this way PMTU expiration does not kill off the cached route as is
done currently.

Redirect information is permanent, or at least until another redirect
is received.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-10 13:29:30 -08:00
David S. Miller
7a71ed899e inetpeer: Abstract address representation further.
Future changes will add caching information, and some of
these new elements will be addresses.

Since the family is implicit via the ->daddr.family member,
replicating the family in ever address we store is entirely
redundant.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-10 13:22:28 -08:00