commit 1d326a94d1039e4543edd97cbaf0fc38d2cefbb5 Author: Greg Kroah-Hartman Date: Thu Oct 18 09:13:26 2018 +0200 Linux 4.9.134 commit 5a0f340f5ad6a6cc6518f212802f95b669e8fe27 Author: Dan Carpenter Date: Wed Oct 10 12:30:17 2018 -0700 ipv4: frags: precedence bug in ip_expire() (commit 70837ffe3085c9a91488b52ca13ac84424da1042 upstream) We accidentally removed the parentheses here, but they are required because '!' has higher precedence than '&'. Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.") Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 85e59af99a7f7c9bcd089f2404b405df7ee665ba Author: Taehee Yoo Date: Wed Oct 10 12:30:16 2018 -0700 ip: frags: fix crash in ip_do_fragment() commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream A kernel crash occurrs when defragmented packet is fragmented in ip_do_fragment(). In defragment routine, skb_orphan() is called and skb->ip_defrag_offset is set. but skb->sk and skb->ip_defrag_offset are same union member. so that frag->sk is not NULL. Hence crash occurrs in skb->sk check routine in ip_do_fragment() when defragmented packet is fragmented. test commands: %iptables -t nat -I POSTROUTING -j MASQUERADE %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000 splat looks like: [ 261.069429] kernel BUG at net/ipv4/ip_output.c:636! [ 261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3 [ 261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600 [ 261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c [ 261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202 [ 261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004 [ 261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8 [ 261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395 [ 261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4 [ 261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000 [ 261.174169] FS: 00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000 [ 261.183012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0 [ 261.198158] Call Trace: [ 261.199018] ? dst_output+0x180/0x180 [ 261.205011] ? save_trace+0x300/0x300 [ 261.209018] ? ip_copy_metadata+0xb00/0xb00 [ 261.213034] ? sched_clock_local+0xd4/0x140 [ 261.218158] ? kill_l4proto+0x120/0x120 [nf_conntrack] [ 261.223014] ? rt_cpu_seq_stop+0x10/0x10 [ 261.227014] ? find_held_lock+0x39/0x1c0 [ 261.233008] ip_finish_output+0x51d/0xb50 [ 261.237006] ? ip_fragment.constprop.56+0x220/0x220 [ 261.243011] ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack] [ 261.250152] ? rcu_is_watching+0x77/0x120 [ 261.255010] ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4] [ 261.261033] ? nf_hook_slow+0xb1/0x160 [ 261.265007] ip_output+0x1c7/0x710 [ 261.269005] ? ip_mc_output+0x13f0/0x13f0 [ 261.273002] ? __local_bh_enable_ip+0xe9/0x1b0 [ 261.278152] ? ip_fragment.constprop.56+0x220/0x220 [ 261.282996] ? nf_hook_slow+0xb1/0x160 [ 261.287007] raw_sendmsg+0x21f9/0x4420 [ 261.291008] ? dst_output+0x180/0x180 [ 261.297003] ? sched_clock_cpu+0x126/0x170 [ 261.301003] ? find_held_lock+0x39/0x1c0 [ 261.306155] ? stop_critical_timings+0x420/0x420 [ 261.311004] ? check_flags.part.36+0x450/0x450 [ 261.315005] ? _raw_spin_unlock_irq+0x29/0x40 [ 261.320995] ? _raw_spin_unlock_irq+0x29/0x40 [ 261.326142] ? cyc2ns_read_end+0x10/0x10 [ 261.330139] ? raw_bind+0x280/0x280 [ 261.334138] ? sched_clock_cpu+0x126/0x170 [ 261.338995] ? check_flags.part.36+0x450/0x450 [ 261.342991] ? __lock_acquire+0x4500/0x4500 [ 261.348994] ? inet_sendmsg+0x11c/0x500 [ 261.352989] ? dst_output+0x180/0x180 [ 261.357012] inet_sendmsg+0x11c/0x500 [ ... ] v2: - clear skb->sk at reassembly routine.(Eric Dumarzet) Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.") Suggested-by: Eric Dumazet Signed-off-by: Taehee Yoo Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4077ddb2cb48ca4592d738ea37cd58c5d41754bd Author: Peter Oskolkov Date: Wed Oct 10 12:30:15 2018 -0700 ip: process in-order fragments efficiently This patch changes the runtime behavior of IP defrag queue: incoming in-order fragments are added to the end of the current list/"run" of in-order fragments at the tail. On some workloads, UDP stream performance is substantially improved: RX: ./udp_stream -F 10 -T 2 -l 60 TX: ./udp_stream -c -H -F 10 -T 5 -l 60 with this patchset applied on a 10Gbps receiver: throughput=9524.18 throughput_units=Mbit/s upstream (net-next): throughput=4608.93 throughput_units=Mbit/s Reported-by: Willem de Bruijn Signed-off-by: Peter Oskolkov Cc: Eric Dumazet Cc: Florian Westphal Signed-off-by: David S. Miller (cherry picked from commit a4fd284a1f8fd4b6c59aa59db2185b1e17c5c11c) Signed-off-by: Greg Kroah-Hartman commit e9e4ac488c017739b2832177550ba2569fffc709 Author: Peter Oskolkov Date: Wed Oct 10 12:30:14 2018 -0700 ip: add helpers to process in-order fragments faster. This patch introduces several helper functions/macros that will be used in the follow-up patch. No runtime changes yet. The new logic (fully implemented in the second patch) is as follows: * Nodes in the rb-tree will now contain not single fragments, but lists of consecutive fragments ("runs"). * At each point in time, the current "active" run at the tail is maintained/tracked. Fragments that arrive in-order, adjacent to the previous tail fragment, are added to this tail run without triggering the re-balancing of the rb-tree. * If a fragment arrives out of order with the offset _before_ the tail run, it is inserted into the rb-tree as a single fragment. * If a fragment arrives after the current tail fragment (with a gap), it starts a new "tail" run, as is inserted into the rb-tree at the end as the head of the new run. skb->cb is used to store additional information needed here (suggested by Eric Dumazet). Reported-by: Willem de Bruijn Signed-off-by: Peter Oskolkov Cc: Eric Dumazet Cc: Florian Westphal Signed-off-by: David S. Miller (cherry picked from commit 353c9cb360874e737fb000545f783df756c06f9a) Signed-off-by: Greg Kroah-Hartman commit 10043954eadac2d8f8c1886190f7a7ee584ff939 Author: Peter Oskolkov Date: Wed Oct 10 12:30:13 2018 -0700 ip: use rb trees for IP frag queue. (commit fa0f527358bd900ef92f925878ed6bfbd51305cc upstream) Similar to TCP OOO RX queue, it makes sense to use rb trees to store IP fragments, so that OOO fragments are inserted faster. Tested: - a follow-up patch contains a rather comprehensive ip defrag self-test (functional) - ran neper `udp_stream -c -H -F 100 -l 300 -T 20`: netstat --statistics Ip: 282078937 total packets received 0 forwarded 0 incoming packets discarded 946760 incoming packets delivered 18743456 requests sent out 101 fragments dropped after timeout 282077129 reassemblies required 944952 packets reassembled ok 262734239 packet reassembles failed (The numbers/stats above are somewhat better re: reassemblies vs a kernel without this patchset. More comprehensive performance testing TBD). Reported-by: Jann Horn Reported-by: Juha-Matti Tilli Suggested-by: Eric Dumazet Signed-off-by: Peter Oskolkov Signed-off-by: Eric Dumazet Cc: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b475cf3bf1e8212b0287c6d15249e2c942693ae5 Author: Eric Dumazet Date: Wed Oct 10 12:30:12 2018 -0700 net: add rb_to_skb() and other rb tree helpers Geeralize private netem_rb_to_skb() TCP rtx queue will soon be converted to rb-tree, so we will need skb_rbtree_walk() helpers. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 18a4c0eab2623cc95be98a1e6af1ad18e7695977) Signed-off-by: Greg Kroah-Hartman commit 791521e2e377f66ef5ee6e5002dec758234d8d32 Author: Eric Dumazet Date: Wed Oct 10 12:30:11 2018 -0700 net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends After working on IP defragmentation lately, I found that some large packets defeat CHECKSUM_COMPLETE optimization because of NIC adding zero paddings on the last (small) fragment. While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed to CHECKSUM_NONE, forcing a full csum validation, even if all prior fragments had CHECKSUM_COMPLETE set. We can instead compute the checksum of the part we are trimming, usually smaller than the part we keep. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 88078d98d1bb085d72af8437707279e203524fa5) Signed-off-by: Greg Kroah-Hartman commit a8444b1ccb20339774af58e40ad42296074fb484 Author: Florian Westphal Date: Wed Oct 10 12:30:10 2018 -0700 ipv6: defrag: drop non-last frags smaller than min mtu don't bother with pathological cases, they only waste cycles. IPv6 requires a minimum MTU of 1280 so we should never see fragments smaller than this (except last frag). v3: don't use awkward "-offset + len" v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68). There were concerns that there could be even smaller frags generated by intermediate nodes, e.g. on radio networks. Cc: Peter Oskolkov Cc: Eric Dumazet Signed-off-by: Florian Westphal Signed-off-by: David S. Miller (cherry picked from commit 0ed4229b08c13c84a3c301a08defdc9e7f4467e6) Signed-off-by: Greg Kroah-Hartman commit 871695951ec6f6b0b1a258c9bb5336bfeffab409 Author: Peter Oskolkov Date: Wed Oct 10 12:30:09 2018 -0700 net: modify skb_rbtree_purge to return the truesize of all purged skbs. Tested: see the next patch is the series. Suggested-by: Eric Dumazet Signed-off-by: Peter Oskolkov Signed-off-by: Eric Dumazet Cc: Florian Westphal Signed-off-by: David S. Miller (cherry picked from commit 385114dec8a49b5e5945e77ba7de6356106713f4) Signed-off-by: Greg Kroah-Hartman commit f5d17b55f4be318adf3b642b50bd25e5245ecc17 Author: Eric Dumazet Date: Wed Oct 10 12:30:08 2018 -0700 net: speed up skb_rbtree_purge() As measured in my prior patch ("sch_netem: faster rb tree removal"), rbtree_postorder_for_each_entry_safe() is nice looking but much slower than using rb_next() directly, except when tree is small enough to fit in CPU caches (then the cost is the same) Also note that there is not even an increase of text size : $ size net/core/skbuff.o.before net/core/skbuff.o text data bss dec hex filename 40711 1298 0 42009 a419 net/core/skbuff.o.before 40711 1298 0 42009 a419 net/core/skbuff.o From: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 7c90584c66cc4b033a3b684b0e0950f79e7b7166) Signed-off-by: Greg Kroah-Hartman commit 82f36cbc74595f06900f478d4eaf7217a4f06e13 Author: Peter Oskolkov Date: Wed Oct 10 12:30:07 2018 -0700 ip: discard IPv4 datagrams with overlapping segments. This behavior is required in IPv6, and there is little need to tolerate overlapping fragments in IPv4. This change simplifies the code and eliminates potential DDoS attack vectors. Tested: ran ip_defrag selftest (not yet available uptream). Suggested-by: David S. Miller Signed-off-by: Peter Oskolkov Signed-off-by: Eric Dumazet Cc: Florian Westphal Acked-by: Stephen Hemminger Signed-off-by: David S. Miller (cherry picked from commit 7969e5c40dfd04799d4341f1b7cd266b6e47f227) Signed-off-by: Greg Kroah-Hartman commit d838486621c38f084b867743a0abd0968c6cb196 Author: Eric Dumazet Date: Wed Oct 10 12:30:06 2018 -0700 inet: frags: fix ip6frag_low_thresh boundary Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches, since linker might place next to it a non zero value preventing a change to ip6frag_low_thresh. ip6frag_low_thresh is not used anymore in the kernel, but we do not want to prematuraly break user scripts wanting to change it. Since specifying a minimal value of 0 for proc_doulongvec_minmax() is moot, let's remove these zero values in all defrag units. Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh") Signed-off-by: Eric Dumazet Reported-by: Maciej Żenczykowski Signed-off-by: David S. Miller (cherry picked from commit 3d23401283e80ceb03f765842787e0e79ff598b7) Signed-off-by: Greg Kroah-Hartman commit 316986fe4dcac011b4f85d9bbef1edf4953c0219 Author: Eric Dumazet Date: Wed Oct 10 12:30:05 2018 -0700 inet: frags: get rid of ipfrag_skb_cb/FRAG_CB ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately this integer is currently in a different cache line than skb->next, meaning that we use two cache lines per skb when finding the insertion point. By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields in a single cache line and save precious memory bandwidth. Note that after the fast path added by Changli Gao in commit d6bebca92c66 ("fragment: add fast path for in-order fragments") this change wont help the fast path, since we still need to access prev->len (2nd cache line), but will show great benefits when slow path is entered, since we perform a linear scan of a potentially long list. Also, note that this potential long list is an attack vector, we might consider also using an rb-tree there eventually. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit bf66337140c64c27fa37222b7abca7e49d63fb57) Signed-off-by: Greg Kroah-Hartman commit 5b68fda0a455be7f48fdf97407de1aa09d046fdd Author: Eric Dumazet Date: Wed Oct 10 12:30:04 2018 -0700 inet: frags: reorganize struct netns_frags Put the read-mostly fields in a separate cache line at the beginning of struct netns_frags, to reduce false sharing noticed in inet_frag_kill() Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit c2615cf5a761b32bf74e85bddc223dfff3d9b9f0) Signed-off-by: Greg Kroah-Hartman commit 6060bcdcffaba68c3ff158a88faab6df27210ffc Author: Eric Dumazet Date: Wed Oct 10 12:30:03 2018 -0700 rhashtable: reorganize struct rhashtable layout While under frags DDOS I noticed unfortunate false sharing between @nelems and @params.automatic_shrinking Move @nelems at the end of struct rhashtable so that first cache line is shared between all cpus, because almost never dirtied. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit e5d672a0780d9e7118caad4c171ec88b8299398d) Signed-off-by: Greg Kroah-Hartman commit cbc45497b39c4626adaeca2a409588f19ae19e34 Author: Eric Dumazet Date: Wed Oct 10 12:30:02 2018 -0700 ipv6: frags: rewrite ip6_expire_frag_queue() Make it similar to IPv4 ip_expire(), and release the lock before calling icmp functions. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 05c0b86b9696802fd0ce5676a92a63f1b455bdf3) Signed-off-by: Greg Kroah-Hartman commit 7a87ec92d36a660820d426d8c54794c44077277f Author: Eric Dumazet Date: Wed Oct 10 12:30:01 2018 -0700 inet: frags: do not clone skb in ip_expire() An skb_clone() was added in commit ec4fbd64751d ("inet: frag: release spinlock before calling icmp_send()") While fixing the bug at that time, it also added a very high cost for DDOS frags, as the ICMP rate limit is applied after this expensive operation (skb_clone() + consume_skb(), implying memory allocations, copy, and freeing) We can use skb_get(head) here, all we want is to make sure skb wont be freed by another cpu. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 1eec5d5670084ee644597bd26c25e22c69b9f748) Signed-off-by: Greg Kroah-Hartman commit 7f6170683223cb38cabaff21ecbb9a6375ad10f6 Author: Eric Dumazet Date: Wed Oct 10 12:30:00 2018 -0700 inet: frags: break the 2GB limit for frags storage Some users are willing to provision huge amounts of memory to be able to perform reassembly reasonnably well under pressure. Current memory tracking is using one atomic_t and integers. Switch to atomic_long_t so that 64bit arches can use more than 2GB, without any cost for 32bit arches. Note that this patch avoids an overflow error, if high_thresh was set to ~2GB, since this test in inet_frag_alloc() was never true : if (... || frag_mem_limit(nf) > nf->high_thresh) Tested: $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh $ grep FRAG /proc/net/sockstat FRAG: inuse 14705885 memory 16000002880 $ nstat -n ; sleep 1 ; nstat | grep Reas IpReasmReqds 3317150 0.0 IpReasmFails 3317112 0.0 Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 3e67f106f619dcfaf6f4e2039599bdb69848c714) Signed-off-by: Greg Kroah-Hartman commit 965e2adc5850836586e0961c350b94c2092da319 Author: Eric Dumazet Date: Wed Oct 10 12:29:59 2018 -0700 inet: frags: remove inet_frag_maybe_warn_overflow() This function is obsolete, after rhashtable addition to inet defrag. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 2d44ed22e607f9a285b049de2263e3840673a260) Signed-off-by: Greg Kroah-Hartman commit 49106f36c253a3c4ce7cf297415826af0c4339ea Author: Eric Dumazet Date: Wed Oct 10 12:29:58 2018 -0700 inet: frags: get rif of inet_frag_evicting() This refactors ip_expire() since one indentation level is removed. Note: in the future, we should try hard to avoid the skb_clone() since this is a serious performance cost. Under DDOS, the ICMP message wont be sent because of rate limits. Fact that ip6_expire_frag_queue() does not use skb_clone() is disturbing too. Presumably IPv6 should have the same issue than the one we fixed in commit ec4fbd64751d ("inet: frag: release spinlock before calling icmp_send()") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 399d1404be660d355192ff4df5ccc3f4159ec1e4) Signed-off-by: Greg Kroah-Hartman commit ea7496f018adcfbac5396ead5756dcabb9866749 Author: Eric Dumazet Date: Wed Oct 10 12:29:57 2018 -0700 inet: frags: remove some helpers Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem() Also since we use rhashtable we can bring back the number of fragments in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was removed in commit 434d305405ab ("inet: frag: don't account number of fragment queues") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 6befe4a78b1553edb6eed3a78b4bcd9748526672) Signed-off-by: Greg Kroah-Hartman commit 23ce9c5ce704b985dad79bce944a348f0c205869 Author: Eric Dumazet Date: Wed Oct 10 12:29:56 2018 -0700 inet: frags: use rhashtables for reassembly units Some applications still rely on IP fragmentation, and to be fair linux reassembly unit is not working under any serious load. It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!) A work queue is supposed to garbage collect items when host is under memory pressure, and doing a hash rebuild, changing seed used in hash computations. This work queue blocks softirqs for up to 25 ms when doing a hash rebuild, occurring every 5 seconds if host is under fire. Then there is the problem of sharing this hash table for all netns. It is time to switch to rhashtables, and allocate one of them per netns to speedup netns dismantle, since this is a critical metric these days. Lookup is now using RCU. A followup patch will even remove the refcount hold/release left from prior implementation and save a couple of atomic operations. Before this patch, 16 cpus (16 RX queue NIC) could not handle more than 1 Mpps frags DDOS. After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB of storage for the fragments (exact number depends on frags being evicted after timeout) $ grep FRAG /proc/net/sockstat FRAG: inuse 1966916 memory 2140004608 A followup patch will change the limits for 64bit arches. Signed-off-by: Eric Dumazet Cc: Kirill Tkhai Cc: Herbert Xu Cc: Florian Westphal Cc: Jesper Dangaard Brouer Cc: Alexander Aring Cc: Stefan Schmidt Signed-off-by: David S. Miller (cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9) Signed-off-by: Greg Kroah-Hartman commit fb19348bd709e3f948825ed995bdc477a0414772 Author: Eric Dumazet Date: Wed Oct 10 12:29:55 2018 -0700 rhashtable: add schedule points Rehashing and destroying large hash table takes a lot of time, and happens in process context. It is safe to add cond_resched() in rhashtable_rehash_table() and rhashtable_free_and_destroy() Signed-off-by: Eric Dumazet Acked-by: Herbert Xu Signed-off-by: David S. Miller (cherry picked from commit ae6da1f503abb5a5081f9f6c4a6881de97830f3e) Signed-off-by: Greg Kroah-Hartman commit 620018dd713da51daac7ec4cd0ae54b0f0fa0f75 Author: Eric Dumazet Date: Wed Oct 10 12:29:54 2018 -0700 ipv6: export ip6 fragments sysctl to unprivileged users IPv4 was changed in commit 52a773d645e9 ("net: Export ip fragment sysctl to unprivileged users") The only sysctl that is not per-netns is not used : ip6frag_secret_interval Signed-off-by: Eric Dumazet Cc: Nikolay Borisov Signed-off-by: David S. Miller (cherry picked from commit 18dcbe12fe9fca0ab825f7eff993060525ac2503) Signed-off-by: Greg Kroah-Hartman commit 1b363f81f38f28bd69ec90837da0f65161f36325 Author: Eric Dumazet Date: Wed Oct 10 12:29:53 2018 -0700 inet: frags: refactor lowpan_net_frag_init() We want to call lowpan_net_frag_init() earlier. Similar to commit "inet: frags: refactor ipv6_frag_init()" This is a prereq to "inet: frags: use rhashtables for reassembly units" Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 807f1844df4ac23594268fa9f41902d0549e92aa) Signed-off-by: Greg Kroah-Hartman commit dae73e7d73fce8d8d5132ec3c94de16280653fc6 Author: Eric Dumazet Date: Wed Oct 10 12:29:52 2018 -0700 inet: frags: refactor ipv6_frag_init() We want to call inet_frags_init() earlier. This is a prereq to "inet: frags: use rhashtables for reassembly units" Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 5b975bab23615cd0fdf67af6c9298eb01c4b9f61) Signed-off-by: Greg Kroah-Hartman commit bbf6d8604475f36279c7b2d9a1f425654bc24588 Author: Eric Dumazet Date: Wed Oct 10 12:29:51 2018 -0700 inet: frags: refactor ipfrag_init() We need to call inet_frags_init() before register_pernet_subsys(), as a prereq for following patch ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 483a6e4fa055123142d8956866fe2aa9c98d546d) Signed-off-by: Greg Kroah-Hartman commit 2ffb1c363dfa89858dded0b291f005faf1b72adc Author: Eric Dumazet Date: Wed Oct 10 12:29:50 2018 -0700 inet: frags: add a pointer to struct netns_frags In order to simplify the API, add a pointer to struct inet_frags. This will allow us to make things less complex. These functions no longer have a struct inet_frags parameter : inet_frag_destroy(struct inet_frag_queue *q /*, struct inet_frags *f */) inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */) inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */) inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */) ip6_expire_frag_queue(struct net *net, struct frag_queue *fq) Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 093ba72914b696521e4885756a68a3332782c8de) Signed-off-by: Greg Kroah-Hartman commit 7fca77153c5c2a2c59e70720332bce7088aef8e8 Author: Eric Dumazet Date: Wed Oct 10 12:29:49 2018 -0700 inet: frags: change inet_frags_init_net() return value We will soon initialize one rhashtable per struct netns_frags in inet_frags_init_net(). This patch changes the return value to eventually propagate an error. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 787bea7748a76130566f881c2342a0be4127d182) Signed-off-by: Greg Kroah-Hartman commit 6b2f36b94a0d57f5378742d482f4e46ae0acd78a Author: Eric Dumazet Date: Tue Oct 2 12:35:05 2018 -0700 inet: make sure to grab rcu_read_lock before using ireq->ireq_opt [ Upstream commit 2ab2ddd301a22ca3c5f0b743593e4ad2953dfa53 ] Timer handlers do not imply rcu_read_lock(), so my recent fix triggered a LOCKDEP warning when SYNACK is retransmit. Lets add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt usages instead of guessing what is done by callers, since it is not worth the pain. Get rid of ireq_opt_deref() helper since it hides the logic without real benefit, since it is now a standard rcu_dereference(). Fixes: 1ad98e9d1bdf ("tcp/dccp: fix lockdep issue when SYN is backlogged") Signed-off-by: Eric Dumazet Reported-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4cded0a3a9c69d3c08b2ab15a7dd8206775142e4 Author: Eric Dumazet Date: Mon Oct 1 15:02:26 2018 -0700 tcp/dccp: fix lockdep issue when SYN is backlogged [ Upstream commit 1ad98e9d1bdf4724c0a8532fabd84bf3c457c2bc ] In normal SYN processing, packets are handled without listener lock and in RCU protected ingress path. But syzkaller is known to be able to trick us and SYN packets might be processed in process context, after being queued into socket backlog. In commit 06f877d613be ("tcp/dccp: fix other lockdep splats accessing ireq_opt") I made a very stupid fix, that happened to work mostly because of the regular path being RCU protected. Really the thing protecting ireq->ireq_opt is RCU read lock, and the pseudo request refcnt is not relevant. This patch extends what I did in commit 449809a66c1d ("tcp/dccp: block BH for SYN processing") by adding an extra rcu_read_{lock|unlock} pair in the paths that might be taken when processing SYN from socket backlog (thus possibly in process context) Fixes: 06f877d613be ("tcp/dccp: fix other lockdep splats accessing ireq_opt") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 069bb7fd6a27487e1ab6915faff1926464a8e275 Author: Eric Dumazet Date: Tue Oct 2 15:47:35 2018 -0700 rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 [ Upstream commit 0e1d6eca5113858ed2caea61a5adc03c595f6096 ] We have an impressive number of syzkaller bugs that are linked to the fact that syzbot was able to create a networking device with millions of TX (or RX) queues. Let's limit the number of RX/TX queues to 4096, this really should cover all known cases. A separate patch will add various cond_resched() in the loops handling sysfs entries at device creation and dismantle. Tested: lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap RTNETLINK answers: Invalid argument lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap real 0m0.180s user 0m0.000s sys 0m0.107s Fixes: 76ff5cc91935 ("rtnl: allow to specify number of rx and tx queues on device creation") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c3291efb53cb6a374f7bf8de3298cfac0072e97b Author: Florian Fainelli Date: Tue Oct 2 16:52:03 2018 -0700 net: systemport: Fix wake-up interrupt race during resume [ Upstream commit 45ec318578c0c22a11f5b9927d064418e1ab1905 ] The AON_PM_L2 is normally used to trigger and identify the source of a wake-up event. Since the RX_SYS clock is no longer turned off, we also have an interrupt being sent to the SYSTEMPORT INTRL_2_0 controller, and that interrupt remains active up until the magic packet detector is disabled which happens much later during the driver resumption. The race happens if we have a CPU that is entering the SYSTEMPORT INTRL2_0 handler during resume, and another CPU has managed to clear the wake-up interrupt during bcm_sysport_resume_from_wol(). In that case, we have the first CPU stuck in the interrupt handler with an interrupt cause that has been cleared under its feet, and so we keep returning IRQ_NONE and we never make any progress. This was not a problem before because we would always turn off the RX_SYS clock during WoL, so the SYSTEMPORT INTRL2_0 would also be turned off as well, thus not latching the interrupt. The fix is to make sure we do not enable either the MPD or BRCM_TAG_MATCH interrupts since those are redundant with what the AON_PM_L2 interrupt controller already processes and they would cause such a race to occur. Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER") Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 353825f2eae29d26ef38fe154a8de63405dddba6 Author: Maxime Chevallier Date: Fri Oct 5 09:04:40 2018 +0200 net: mvpp2: Extract the correct ethtype from the skb for tx csum offload [ Upstream commit 35f3625c21852ad839f20c91c7d81c4c1101e207 ] When offloading the L3 and L4 csum computation on TX, we need to extract the l3_proto from the ethtype, independently of the presence of a vlan tag. The actual driver uses skb->protocol as-is, resulting in packets with the wrong L4 checksum being sent when there's a vlan tag in the packet header and checksum offloading is enabled. This commit makes use of vlan_protocol_get() to get the correct ethtype regardless the presence of a vlan tag. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Maxime Chevallier Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7f10b7e254b3e30bcac471de29e562aff4e22c60 Author: Florian Fainelli Date: Tue Oct 9 16:48:57 2018 -0700 net: dsa: bcm_sf2: Fix unbind ordering [ Upstream commit bf3b452b7af787b8bf27de6490dc4eedf6f97599 ] The order in which we release resources is unfortunately leading to bus errors while dismantling the port. This is because we set priv->wol_ports_mask to 0 to tell bcm_sf2_sw_suspend() that it is now permissible to clock gate the switch. Later on, when dsa_slave_destroy() comes in from dsa_unregister_switch() and calls dsa_switch_ops::port_disable, we perform the same dismantling again, and this time we hit registers that are clock gated. Make sure that dsa_unregister_switch() is the first thing that happens, which takes care of releasing all user visible resources, then proceed with clock gating hardware. We still need to set priv->wol_ports_mask to 0 to make sure that an enabled port properly gets disabled in case it was previously used as part of Wake-on-LAN. Fixes: d9338023fb8e ("net: dsa: bcm_sf2: Make it a real platform device driver") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 68d6d3723acdeb828e93ebc5b5c03313d9ee2856 Author: Ido Schimmel Date: Mon Oct 1 12:21:59 2018 +0300 team: Forbid enslaving team device to itself [ Upstream commit 471b83bd8bbe4e89743683ef8ecb78f7029d8288 ] team's ndo_add_slave() acquires 'team->lock' and later tries to open the newly enslaved device via dev_open(). This emits a 'NETDEV_UP' event that causes the VLAN driver to add VLAN 0 on the team device. team's ndo_vlan_rx_add_vid() will also try to acquire 'team->lock' and deadlock. Fix this by checking early at the enslavement function that a team device is not being enslaved to itself. A similar check was added to the bond driver in commit 09a89c219baf ("bonding: disallow enslaving a bond to itself"). WARNING: possible recursive locking detected 4.18.0-rc7+ #176 Not tainted -------------------------------------------- syz-executor4/6391 is trying to acquire lock: (____ptrval____) (&team->lock){+.+.}, at: team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868 but task is already holding lock: (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&team->lock); lock(&team->lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by syz-executor4/6391: #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline] #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4662 #1: (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947 stack backtrace: CPU: 1 PID: 6391 Comm: syz-executor4 Not tainted 4.18.0-rc7+ #176 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1765 [inline] check_deadlock kernel/locking/lockdep.c:1809 [inline] validate_chain kernel/locking/lockdep.c:2405 [inline] __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924 __mutex_lock_common kernel/locking/mutex.c:757 [inline] __mutex_lock+0x176/0x1820 kernel/locking/mutex.c:894 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909 team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868 vlan_add_rx_filter_info+0x14a/0x1d0 net/8021q/vlan_core.c:210 __vlan_vid_add net/8021q/vlan_core.c:278 [inline] vlan_vid_add+0x63e/0x9d0 net/8021q/vlan_core.c:308 vlan_device_event.cold.12+0x2a/0x2f net/8021q/vlan.c:381 notifier_call_chain+0x180/0x390 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735 call_netdevice_notifiers net/core/dev.c:1753 [inline] dev_open+0x173/0x1b0 net/core/dev.c:1433 team_port_add drivers/net/team/team.c:1219 [inline] team_add_slave+0xa8b/0x1c30 drivers/net/team/team.c:1948 do_set_master+0x1c9/0x220 net/core/rtnetlink.c:2248 do_setlink+0xba4/0x3e10 net/core/rtnetlink.c:2382 rtnl_setlink+0x2a9/0x400 net/core/rtnetlink.c:2636 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4665 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2455 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4683 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfd0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:642 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:652 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2126 __sys_sendmsg+0x11d/0x290 net/socket.c:2164 __do_sys_sendmsg net/socket.c:2173 [inline] __se_sys_sendmsg net/socket.c:2171 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2171 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x456b29 Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f9706bf8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f9706bf96d4 RCX: 0000000000456b29 RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000004 RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000004d3548 R14: 00000000004c8227 R15: 0000000000000000 Fixes: 87002b03baab ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls") Signed-off-by: Ido Schimmel Reported-and-tested-by: syzbot+bd051aba086537515cdb@syzkaller.appspotmail.com Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fef73013733fbb5b9635b7bb547ffac26f00ade6 Author: Giacinto Cifelli Date: Wed Oct 10 20:05:53 2018 +0200 qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface [ Upstream commit 4f7617705bfff84d756fe4401a1f4f032f374984 ] Added support for Gemalto's Cinterion ALASxx WWAN interfaces by adding QMI_FIXED_INTF with Cinterion's VID and PID. Signed-off-by: Giacinto Cifelli Acked-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 902730f1ad06b845d9787002a0414787c9a8c21e Author: Shahed Shaikh Date: Wed Sep 26 12:41:10 2018 -0700 qlcnic: fix Tx descriptor corruption on 82xx devices [ Upstream commit c333fa0c4f220f8f7ea5acd6b0ebf3bf13fd684d ] In regular NIC transmission flow, driver always configures MAC using Tx queue zero descriptor as a part of MAC learning flow. But with multi Tx queue supported NIC, regular transmission can occur on any non-zero Tx queue and from that context it uses Tx queue zero descriptor to configure MAC, at the same time TX queue zero could be used by another CPU for regular transmission which could lead to Tx queue zero descriptor corruption and cause FW abort. This patch fixes this in such a way that driver always configures learned MAC address from the same Tx queue which is used for regular transmission. Fixes: 7e2cf4feba05 ("qlcnic: change driver hardware interface mechanism") Signed-off-by: Shahed Shaikh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 91c41e3f77fb4242538796570731255063f72782 Author: Yu Zhao Date: Fri Sep 28 17:04:30 2018 -0600 net/usb: cancel pending work when unbinding smsc75xx [ Upstream commit f7b2a56e1f3dcbdb4cf09b2b63e859ffe0e09df8 ] Cancel pending work before freeing smsc75xx private data structure during binding. This fixes the following crash in the driver: BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 IP: mutex_lock+0x2b/0x3f Workqueue: events smsc75xx_deferred_multicast_write [smsc75xx] task: ffff8caa83e85700 task.stack: ffff948b80518000 RIP: 0010:mutex_lock+0x2b/0x3f Call Trace: smsc75xx_deferred_multicast_write+0x40/0x1af [smsc75xx] process_one_work+0x18d/0x2fc worker_thread+0x1a2/0x269 ? pr_cont_work+0x58/0x58 kthread+0xfa/0x10a ? pr_cont_work+0x58/0x58 ? rcu_read_unlock_sched_notrace+0x48/0x48 ret_from_fork+0x22/0x40 Signed-off-by: Yu Zhao Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a7fa1e6b8cb3af02f3e0ab0bdca267fac5bbe60b Author: Sean Tranchetti Date: Thu Sep 20 14:29:45 2018 -0600 netlabel: check for IPV4MASK in addrinfo_get [ Upstream commit f88b4c01b97e09535505cf3c327fdbce55c27f00 ] netlbl_unlabel_addrinfo_get() assumes that if it finds the NLBL_UNLABEL_A_IPV4ADDR attribute, it must also have the NLBL_UNLABEL_A_IPV4MASK attribute as well. However, this is not necessarily the case as the current checks in netlbl_unlabel_staticadd() and friends are not sufficent to enforce this. If passed a netlink message with NLBL_UNLABEL_A_IPV4ADDR, NLBL_UNLABEL_A_IPV6ADDR, and NLBL_UNLABEL_A_IPV6MASK attributes, these functions will all call netlbl_unlabel_addrinfo_get() which will then attempt dereference NULL when fetching the non-existent NLBL_UNLABEL_A_IPV4MASK attribute: Unable to handle kernel NULL pointer dereference at virtual address 0 Process unlab (pid: 31762, stack limit = 0xffffff80502d8000) Call trace: netlbl_unlabel_addrinfo_get+0x44/0xd8 netlbl_unlabel_staticremovedef+0x98/0xe0 genl_rcv_msg+0x354/0x388 netlink_rcv_skb+0xac/0x118 genl_rcv+0x34/0x48 netlink_unicast+0x158/0x1f0 netlink_sendmsg+0x32c/0x338 sock_sendmsg+0x44/0x60 ___sys_sendmsg+0x1d0/0x2a8 __sys_sendmsg+0x64/0xb4 SyS_sendmsg+0x34/0x4c el0_svc_naked+0x34/0x38 Code: 51001149 7100113f 540000a0 f9401508 (79400108) ---[ end trace f6438a488e737143 ]--- Kernel panic - not syncing: Fatal exception Signed-off-by: Sean Tranchetti Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ea14355a006652becbf500167fc02324d3eefb69 Author: Jeff Barnhill <0xeffeff@gmail.com> Date: Fri Sep 21 00:45:27 2018 +0000 net/ipv6: Display all addresses in output of /proc/net/if_inet6 [ Upstream commit 86f9bd1ff61c413a2a251fa736463295e4e24733 ] The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly handle starting/stopping the iteration. The problem is that at some point during the iteration, an overflow is detected and the process is subsequently stopped. The item being shown via seq_printf() when the overflow occurs is not actually shown, though. When start() is subsequently called to resume iterating, it returns the next item, and thus the item that was being processed when the overflow occurred never gets printed. Alter the meaning of the private data member "offset". Currently, when it is not 0 (which only happens at the very beginning), "offset" represents the next hlist item to be printed. After this change, "offset" always represents the current item. This is also consistent with the private data member "bucket", which represents the current bucket, and also the use of "pos" as defined in seq_file.txt: The pos passed to start() will always be either zero, or the most recent pos used in the previous session. Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8d59c3a6376bbc6dd3f7303968a719ffba75a4f1 Author: Sabrina Dubroca Date: Tue Oct 9 17:48:14 2018 +0200 net: ipv4: update fnhe_pmtu when first hop's MTU changes [ Upstream commit af7d6cce53694a88d6a1bb60c9a239a6a5144459 ] Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca Reviewed-by: Stefano Brivio Reviewed-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit af26ef11071ad55aa63cc152cbe95bd7ded49cfe Author: Yunsheng Lin Date: Tue Sep 25 10:21:55 2018 +0100 net: hns: fix for unmapping problem when SMMU is on [ Upstream commit 2e9361efa707e186d91b938e44f9e326725259f7 ] If SMMU is on, there is more likely that skb_shinfo(skb)->frags[i] can not send by a single BD. when this happen, the hns_nic_net_xmit_hw function map the whole data in a frags using skb_frag_dma_map, but unmap each BD' data individually when tx is done, which causes problem when SMMU is on. This patch fixes this problem by ummapping the whole data in a frags when tx is done. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li Reviewed-by: Yisen Zhuang Signed-off-by: Salil Mehta Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d7720f5c2960ae8c2150861687e28482de9e9cb3 Author: Florian Fainelli Date: Tue Oct 9 16:48:58 2018 -0700 net: dsa: bcm_sf2: Call setup during switch resume [ Upstream commit 54baca096386d862d19c10f58f34bf787c6b3cbe ] There is no reason to open code what the switch setup function does, in fact, because we just issued a switch reset, we would make all the register get their default values, including for instance, having unused port be enabled again and wasting power and leading to an inappropriate switch core clock being selected. Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4ceb29cf0ee1ce055d43029b9c86692e206424c1 Author: Wei Wang Date: Thu Oct 4 10:12:37 2018 -0700 ipv6: take rcu lock in rawv6_send_hdrinc() [ Upstream commit a688caa34beb2fd2a92f1b6d33e40cde433ba160 ] In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we directly assign the dst to skb and set passed in dst to NULL to avoid double free. However, in error case, we free skb and then do stats update with the dst pointer passed in. This causes use-after-free on the dst. Fix it by taking rcu read lock right before dst could get released to make sure dst does not get freed until the stats update is done. Note: we don't have this issue in ipv4 cause dst is not used for stats update in v4. Syzkaller reported following crash: BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline] BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921 Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088 CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 rawv6_send_hdrinc net/ipv6/raw.c:692 [inline] rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114 __sys_sendmsg+0x11d/0x280 net/socket.c:2152 __do_sys_sendmsg net/socket.c:2161 [inline] __se_sys_sendmsg net/socket.c:2159 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457099 Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099 RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004 RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000 Allocated by task 32088: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554 dst_alloc+0xbb/0x1d0 net/core/dst.c:105 ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353 ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186 ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093 fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121 ip6_route_output include/net/ip6_route.h:88 [inline] ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079 rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114 __sys_sendmsg+0x11d/0x280 net/socket.c:2152 __do_sys_sendmsg net/socket.c:2161 [inline] __se_sys_sendmsg net/socket.c:2159 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 5356: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x83/0x290 mm/slab.c:3756 dst_destroy+0x267/0x3c0 net/core/dst.c:141 dst_destroy_rcu+0x16/0x19 net/core/dst.c:154 __rcu_reclaim kernel/rcu/rcu.h:236 [inline] rcu_do_batch kernel/rcu/tree.c:2576 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline] __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline] rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864 __do_softirq+0x30b/0xad8 kernel/softirq.c:292 Fixes: 1789a640f556 ("raw: avoid two atomics in xmit") Signed-off-by: Wei Wang Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c476a441f873729e4b8b63d73b6f5b7b37f39e42 Author: Eric Dumazet Date: Sun Sep 30 11:33:39 2018 -0700 ipv4: fix use-after-free in ip_cmsg_recv_dstaddr() [ Upstream commit 64199fc0a46ba211362472f7f942f900af9492fd ] Caching ip_hdr(skb) before a call to pskb_may_pull() is buggy, do not do it. Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull") Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Reported-by: syzbot Acked-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2b7b2c4218a84e5b4143a275441459e2e6912529 Author: Paolo Abeni Date: Mon Sep 24 15:48:19 2018 +0200 ip_tunnel: be careful when accessing the inner header [ Upstream commit ccfec9e5cb2d48df5a955b7bf47f7782157d3bc2] Cong noted that we need the same checks introduced by commit 76c0ddd8c3a6 ("ip6_tunnel: be careful when accessing the inner header") even for ipv4 tunnels. Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Suggested-by: Cong Wang Signed-off-by: Paolo Abeni Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 22c1e77811aeda092e3edcba3e421420097e0341 Author: Paolo Abeni Date: Wed Sep 19 15:02:07 2018 +0200 ip6_tunnel: be careful when accessing the inner header [ Upstream commit 76c0ddd8c3a683f6e2c6e60e11dc1a1558caf4bc ] the ip6 tunnel xmit ndo assumes that the processed skb always contains an ip[v6] header, but syzbot has found a way to send frames that fall short of this assumption, leading to the following splat: BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline] BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390 CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683 ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline] ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390 __netdev_start_xmit include/linux/netdevice.h:4066 [inline] netdev_start_xmit include/linux/netdevice.h:4075 [inline] xmit_one net/core/dev.c:3026 [inline] dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042 __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590 packet_snd net/packet/af_packet.c:2944 [inline] packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmmsg+0x42d/0x800 net/socket.c:2136 SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167 SyS_sendmmsg+0x63/0x90 net/socket.c:2162 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x441819 RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819 RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003 RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510 R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234 sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085 packet_alloc_skb net/packet/af_packet.c:2803 [inline] packet_snd net/packet/af_packet.c:2894 [inline] packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmmsg+0x42d/0x800 net/socket.c:2136 SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167 SyS_sendmmsg+0x63/0x90 net/socket.c:2162 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 This change addresses the issue adding the needed check before accessing the inner header. The ipv4 side of the issue is apparently there since the ipv4 over ipv6 initial support, and the ipv6 side predates git history. Fixes: c4d3efafcc93 ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com Tested-by: Alexander Potapenko Signed-off-by: Paolo Abeni Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 22ad29373a7efa65bf97e644799a8cb9009b9cdb Author: Mahesh Bandewar Date: Mon Sep 24 14:40:11 2018 -0700 bonding: avoid possible dead-lock [ Upstream commit d4859d749aa7090ffb743d15648adb962a1baeae ] Syzkaller reported this on a slightly older kernel but it's still applicable to the current kernel - ====================================================== WARNING: possible circular locking dependency detected 4.18.0-next-20180823+ #46 Not tainted ------------------------------------------------------ syz-executor4/26841 is trying to acquire lock: 00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652 but task is already holding lock: 00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline] 00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (rtnl_mutex){+.+.}: __mutex_lock_common kernel/locking/mutex.c:925 [inline] __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77 bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline] bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320 process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296 kthread+0x35a/0x420 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}: process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296 kthread+0x35a/0x420 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 -> #0 ((wq_completion)bond_dev->name){+.+.}: lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734 register_netdevice+0x337/0x1100 net/core/dev.c:8410 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115 __sys_sendmsg+0x11d/0x290 net/socket.c:2153 __do_sys_sendmsg net/socket.c:2162 [inline] __se_sys_sendmsg net/socket.c:2160 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock((work_completion)(&(&nnw->work)->work)); lock(rtnl_mutex); lock((wq_completion)bond_dev->name); *** DEADLOCK *** 1 lock held by syz-executor4/26841: stack backtrace: CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222 check_prev_add kernel/locking/lockdep.c:1862 [inline] check_prevs_add kernel/locking/lockdep.c:1975 [inline] validate_chain kernel/locking/lockdep.c:2416 [inline] __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412 lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734 register_netdevice+0x337/0x1100 net/core/dev.c:8410 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115 __sys_sendmsg+0x11d/0x290 net/socket.c:2153 __do_sys_sendmsg net/socket.c:2162 [inline] __se_sys_sendmsg net/socket.c:2160 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457089 Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001 Signed-off-by: Mahesh Bandewar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b2be15bb02b961146177d49204de22df3dddd415 Author: Michael Chan Date: Wed Sep 26 00:41:04 2018 -0400 bnxt_en: Fix TX timeout during netpoll. [ Upstream commit 73f21c653f930f438d53eed29b5e4c65c8a0f906 ] The current netpoll implementation in the bnxt_en driver has problems that may miss TX completion events. bnxt_poll_work() in effect is only handling at most 1 TX packet before exiting. In addition, there may be in flight TX completions that ->poll() may miss even after we fix bnxt_poll_work() to handle all visible TX completions. netpoll may not call ->poll() again and HW may not generate IRQ because the driver does not ARM the IRQ when the budget (0 for netpoll) is reached. We fix it by handling all TX completions and to always ARM the IRQ when we exit ->poll() with 0 budget. Also, the logic to ACK the completion ring in case it is almost filled with TX completions need to be adjusted to take care of the 0 budget case, as discussed with Eric Dumazet Reported-by: Song Liu Reviewed-by: Song Liu Tested-by: Song Liu Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d5b33599b3feb88037271be5b0aaaa775bcf2778 Author: Mathias Nyman Date: Mon Feb 12 14:24:47 2018 +0200 xhci: Don't print a warning when setting link state for disabled ports commit 1208d8a84fdcae6b395c57911cdf907450d30e70 upstream. When disabling a USB3 port the hub driver will set the port link state to U3 to prevent "ejected" or "safely removed" devices that are still physically connected from immediately re-enumerating. If the device was really unplugged, then error messages were printed as the hub tries to set the U3 link state for a port that is no longer enabled. xhci-hcd ee000000.usb: Cannot set link state. usb usb8-port1: cannot disable (err = -32) Don't print error message in xhci-hub if hub tries to set port link state for a disabled port. Return -ENODEV instead which also silences hub driver. Signed-off-by: Mathias Nyman Tested-by: Yoshihiro Shimoda Signed-off-by: Ross Zwisler Signed-off-by: Greg Kroah-Hartman commit c97b64bd3571a184893a228738c413512f7d7d64 Author: Edgar Cherkasov Date: Thu Sep 27 11:56:03 2018 +0300 i2c: i2c-scmi: fix for i2c_smbus_write_block_data commit 08d9db00fe0e300d6df976e6c294f974988226dd upstream. The i2c-scmi driver crashes when the SMBus Write Block transaction is executed: WARNING: CPU: 9 PID: 2194 at mm/page_alloc.c:3931 __alloc_pages_slowpath+0x9db/0xec0 Call Trace: ? get_page_from_freelist+0x49d/0x11f0 ? alloc_pages_current+0x6a/0xe0 ? new_slab+0x499/0x690 __alloc_pages_nodemask+0x265/0x280 alloc_pages_current+0x6a/0xe0 kmalloc_order+0x18/0x40 kmalloc_order_trace+0x24/0xb0 ? acpi_ut_allocate_object_desc_dbg+0x62/0x10c __kmalloc+0x203/0x220 acpi_os_allocate_zeroed+0x34/0x36 acpi_ut_copy_eobject_to_iobject+0x266/0x31e acpi_evaluate_object+0x166/0x3b2 acpi_smbus_cmi_access+0x144/0x530 [i2c_scmi] i2c_smbus_xfer+0xda/0x370 i2cdev_ioctl_smbus+0x1bd/0x270 i2cdev_ioctl+0xaa/0x250 do_vfs_ioctl+0xa4/0x600 SyS_ioctl+0x79/0x90 do_syscall_64+0x73/0x130 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 ACPI Error: Evaluating _SBW: 4 (20170831/smbus_cmi-185) This problem occurs because the length of ACPI Buffer object is not defined/initialized in the code before a corresponding ACPI method is called. The obvious patch below fixes this issue. Signed-off-by: Edgar Cherkasov Acked-by: Viktor Krasnov Acked-by: Michael Brunner Signed-off-by: Wolfram Sang Signed-off-by: Greg Kroah-Hartman commit 06dc8b34bb44090734ba095a7420ae7c4c17e47e Author: Adrian Hunter Date: Tue Sep 11 14:45:03 2018 +0300 perf script python: Fix export-to-postgresql.py occasional failure commit 25e11700b54c7b6b5ebfc4361981dae12299557b upstream. Occasional export failures were found to be caused by truncating 64-bit pointers to 32-bits. Fix by explicitly setting types for all ctype arguments and results. Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180911114504.28516-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 9f5095da2d8dd82a85974fbd4318bf8a0f9c0f32 Author: Mikulas Patocka Date: Fri Aug 17 15:19:37 2018 -0400 mach64: detect the dot clock divider correctly on sparc commit 76ebebd2464c5c8a4453c98b6dbf9c95a599e810 upstream. On Sun Ultra 5, it happens that the dot clock is not set up properly for some videomodes. For example, if we set the videomode "r1024x768x60" in the firmware, Linux would incorrectly set a videomode with refresh rate 180Hz when booting (suprisingly, my LCD monitor can display it, although display quality is very low). The reason is this: Older mach64 cards set the divider in the register VCLK_POST_DIV. The register has four 2-bit fields (the field that is actually used is specified in the lowest two bits of the register CLOCK_CNTL). The 2 bits select divider "1, 2, 4, 8". On newer mach64 cards, there's another bit added - the top four bits of PLL_EXT_CNTL extend the divider selection, so we have possible dividers "1, 2, 4, 8, 3, 5, 6, 12". The Linux driver clears the top four bits of PLL_EXT_CNTL and never sets them, so it can work regardless if the card supports them. However, the sparc64 firmware may set these extended dividers during boot - and the mach64 driver detects incorrect dot clock in this case. This patch makes the driver read the additional divider bit from PLL_EXT_CNTL and calculate the initial refresh rate properly. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Acked-by: David S. Miller Reviewed-by: Ville Syrjälä Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 44ccf71e9c13ceebc915daaf7b5645e17e77f47d Author: Jann Horn Date: Fri Oct 5 15:52:03 2018 -0700 mm/vmstat.c: fix outdated vmstat_text commit 28e2c4bb99aa40f9d5f07ac130cbc4da0ea93079 upstream. 7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely") removed the VMACACHE_FULL_FLUSHES statistics, but didn't remove the corresponding entry in vmstat_text. This causes an out-of-bounds access in vmstat_show(). Luckily this only affects kernels with CONFIG_DEBUG_VM_VMACACHE=y, which is probably very rare. Link: http://lkml.kernel.org/r/20181001143138.95119-1-jannh@google.com Fixes: 7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely") Signed-off-by: Jann Horn Reviewed-by: Kees Cook Reviewed-by: Andrew Morton Acked-by: Michal Hocko Acked-by: Roman Gushchin Cc: Davidlohr Bueso Cc: Oleg Nesterov Cc: Christoph Lameter Cc: Kemi Wang Cc: Andy Lutomirski Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit a295ff4bff4c41823e7b33078d8164805cebdda4 Author: Daniel Rosenberg Date: Mon Oct 15 15:10:52 2018 -0700 ext4: Fix error code in ext4_xattr_set_entry() ext4_xattr_set_entry should return EFSCORRUPTED instead of EIO for corrupted xattr entries. Fixes b469713e0c0c ("ext4: add corruption check in ext4_xattr_set_entry()") Signed-off-by: Daniel Rosenberg Signed-off-by: Greg Kroah-Hartman commit 5b48d1a8ccaddc78dd916435ec1a00c67e83628e Author: Amber Lin Date: Wed Sep 12 21:42:18 2018 -0400 drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7 [ Upstream commit caaa4c8a6be2a275bd14f2369ee364978ff74704 ] A wrong register bit was examinated for checking SDMA status so it reports false failures. This typo only appears on gfx_v7. gfx_v8 checks the correct bit. Acked-by: Alex Deucher Signed-off-by: Amber Lin Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 08d2bf58afdeb5dad846423adf7afe3af1a8e279 Author: Vitaly Kuznetsov Date: Thu Aug 2 17:08:16 2018 +0200 x86/kvm/lapic: always disable MMIO interface in x2APIC mode [ Upstream commit d1766202779e81d0f2a94c4650a6ba31497d369d ] When VMX is used with flexpriority disabled (because of no support or if disabled with module parameter) MMIO interface to lAPIC is still available in x2APIC mode while it shouldn't be (kvm-unit-tests): PASS: apic_disable: Local apic enabled in x2APIC mode PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set FAIL: apic_disable: *0xfee00030: 50014 The issue appears because we basically do nothing while switching to x2APIC mode when APIC access page is not used. apic_mmio_{read,write} only check if lAPIC is disabled before proceeding to actual write. When APIC access is virtualized we correctly manipulate with VMX controls in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes in x2APIC mode so there's no issue. Disabling MMIO interface seems to be easy. The question is: what do we do with these reads and writes? If we add apic_x2apic_mode() check to apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will go to userspace. When lAPIC is in kernel, Qemu uses this interface to inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This somehow works with disabled lAPIC but when we're in xAPIC mode we will get a real injected MSI from every write to lAPIC. Not good. The simplest solution seems to be to just ignore writes to the region and return ~0 for all reads when we're in x2APIC mode. This is what this patch does. However, this approach is inconsistent with what currently happens when flexpriority is enabled: we allocate APIC access page and create KVM memory region so in x2APIC modes all reads and writes go to this pre-allocated page which is, btw, the same for all vCPUs. Signed-off-by: Vitaly Kuznetsov Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 3f724feca29a952acaa588a61a353621f9341751 Author: Nicolas Ferre Date: Fri Sep 14 17:48:11 2018 +0200 ARM: dts: at91: add new compatibility string for macb on sama5d3 [ Upstream commit 321cc359d899a8e988f3725d87c18a628e1cc624 ] We need this new compatibility string as we experienced different behavior for this 10/100Mbits/s macb interface on this particular SoC. Backward compatibility is preserved as we keep the alternative strings. Signed-off-by: Nicolas Ferre Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 4947996d9de57d8e6995c7976a1513185388bf68 Author: Nicolas Ferre Date: Fri Sep 14 17:48:10 2018 +0200 net: macb: disable scatter-gather for macb on sama5d3 [ Upstream commit eb4ed8e2d7fecb5f40db38e4498b9ee23cddf196 ] Create a new configuration for the sama5d3-macb new compatibility string. This configuration disables scatter-gather because we experienced lock down of the macb interface of this particular SoC under very high load. Signed-off-by: Nicolas Ferre Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit a2e81ee80f40351cc94642b29fb250c11ef32f35 Author: Jongsung Kim Date: Thu Sep 13 18:32:21 2018 +0900 stmmac: fix valid numbers of unicast filter entries [ Upstream commit edf2ef7242805e53ec2e0841db26e06d8bc7da70 ] Synopsys DWC Ethernet MAC can be configured to have 1..32, 64, or 128 unicast filter entries. (Table 7-8 MAC Address Registers from databook) Fix dwmac1000_validate_ucast_entries() to accept values between 1 and 32 in addition. Signed-off-by: Jongsung Kim Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit f7b72327db318c691fa448c155c2bd78ab76d381 Author: Yu Zhao Date: Tue Sep 11 15:14:04 2018 -0600 sound: enable interrupt after dma buffer initialization [ Upstream commit b61749a89f826eb61fc59794d9e4697bd246eb61 ] In snd_hdac_bus_init_chip(), we enable interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers. If irq has been acquired and irq handler uses the dma buffer, kernel may crash when interrupt comes in. Fix the problem by postponing enabling irq after dma buffer initialization. And warn once on null dma buffer pointer during the initialization. Reviewed-by: Takashi Iwai Signed-off-by: Yu Zhao Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit fccf159614b29580b4744b2dbd4ae65cb49e9f60 Author: Dan Carpenter Date: Sat Sep 8 11:42:27 2018 +0300 scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted() [ Upstream commit cbe3fd39d223f14b1c60c80fe9347a3dd08c2edb ] We should first do the le16_to_cpu endian conversion and then apply the FCP_CMD_LENGTH_MASK mask. Fixes: 5f35509db179 ("qla2xxx: Terminate exchange if corrupted") Signed-off-by: Dan Carpenter Acked-by: Quinn Tran Acked-by: Himanshu Madhani Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 1a5d197564ba649df02a9ec472352917cf6f1072 Author: Laura Abbott Date: Tue Sep 4 11:47:40 2018 -0700 scsi: iscsi: target: Don't use stack buffer for scatterlist [ Upstream commit 679fcae46c8b2352bba3485d521da070cfbe68e6 ] Fedora got a bug report of a crash with iSCSI: kernel BUG at include/linux/scatterlist.h:143! ... RIP: 0010:iscsit_do_crypto_hash_buf+0x154/0x180 [iscsi_target_mod] ... Call Trace: ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod] iscsit_get_rx_pdu+0x4cd/0xa90 [iscsi_target_mod] ? native_sched_clock+0x3e/0xa0 ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod] iscsi_target_rx_thread+0x81/0xf0 [iscsi_target_mod] kthread+0x120/0x140 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x3a/0x50 This is a BUG_ON for using a stack buffer with a scatterlist. There are two cases that trigger this bug. Switch to using a dynamically allocated buffer for one case and do not assign a NULL buffer in another case. Signed-off-by: Laura Abbott Reviewed-by: Mike Christie Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 8e123335850579ae88894556a43264683d563420 Author: Tony Lindgren Date: Wed Apr 25 07:29:22 2018 -0700 mfd: omap-usb-host: Fix dts probe of children [ Upstream commit 10492ee8ed9188d6d420e1f79b2b9bdbc0624e65 ] It currently only works if the parent bus uses "simple-bus". We currently try to probe children with non-existing compatible values. And we're missing .probe. I noticed this while testing devices configured to probe using ti-sysc interconnect target module driver. For that we also may want to rebind the driver, so let's remove __init and __exit. Signed-off-by: Tony Lindgren Acked-by: Roger Quadros Signed-off-by: Lee Jones Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 7c73f0df4387d44545aec11c9d798396a9bd6929 Author: Lei Yang Date: Wed Sep 5 17:57:15 2018 +0800 selftests: memory-hotplug: add required configs [ Upstream commit 4d85af102a66ee6aeefa596f273169e77fb2b48e ] add CONFIG_MEMORY_HOTREMOVE=y in config without this config, /sys/devices/system/memory/memory*/removable always return 0, I endup getting an early skip during test Signed-off-by: Lei Yang Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit b5be6fb6ad4dff080f342f2ab0d30522a80d9612 Author: Lei Yang Date: Wed Sep 5 11:14:49 2018 +0800 selftests/efivarfs: add required kernel configs [ Upstream commit 53cf59d6c0ad3edc4f4449098706a8f8986258b6 ] add config file Signed-off-by: Lei Yang Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 92110f1281e7819340756459e6939e7e65fd02e3 Author: Danny Smith Date: Thu Aug 23 10:26:20 2018 +0200 ASoC: sigmadsp: safeload should not have lower byte limit [ Upstream commit 5ea752c6efdf5aa8a57aed816d453a8f479f1b0a ] Fixed range in safeload conditional to allow safeload to up to 20 bytes, without a lower limit. Signed-off-by: Danny Smith Acked-by: Lars-Peter Clausen Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 931c5e88ed7381e064c4ed9d16d80ef302b68fa3 Author: Pierre-Louis Bossart Date: Wed Aug 22 22:49:36 2018 -0500 ASoC: wm8804: Add ACPI support [ Upstream commit 960cdd50ca9fdfeb82c2757107bcb7f93c8d7d41 ] HID made of either Wolfson/CirrusLogic PCI ID + 8804 identifier. This helps enumerate the HifiBerry Digi+ HAT boards on the Up2 platform. The scripts at https://github.com/thesofproject/acpi-scripts can be used to add the ACPI initrd overlays. Signed-off-by: Pierre-Louis Bossart Acked-by: Charles Keepax Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman