commit ecc160ece609498c946e73710e5c7c54c62b966a Author: Greg Kroah-Hartman Date: Sun Jul 22 14:28:52 2018 +0200 Linux 4.14.57 commit 779128d80cb01e6434936e13754fc25a1cc30929 Author: Tejun Heo Date: Tue Jan 9 07:21:15 2018 -0800 string: drop __must_check from strscpy() and restore strscpy() usages in cgroup commit 08a77676f9c5fc69a681ccd2cd8140e65dcb26c7 upstream. e7fd37ba1217 ("cgroup: avoid copying strings longer than the buffers") converted possibly unsafe strncpy() usages in cgroup to strscpy(). However, although the callsites are completely fine with truncated copied, because strscpy() is marked __must_check, it led to the following warnings. kernel/cgroup/cgroup.c: In function ‘cgroup_file_name’: kernel/cgroup/cgroup.c:1400:10: warning: ignoring return value of ‘strscpy’, declared with attribute warn_unused_result [-Wunused-result] strscpy(buf, cft->name, CGROUP_FILE_NAME_MAX); ^ To avoid the warnings, 50034ed49645 ("cgroup: use strlcpy() instead of strscpy() to avoid spurious warning") switched them to strlcpy(). strlcpy() is worse than strlcpy() because it unconditionally runs strlen() on the source string, and the only reason we switched to strlcpy() here was because it was lacking __must_check, which doesn't reflect any material differences between the two function. It's just that someone added __must_check to strscpy() and not to strlcpy(). These basic string copy operations are used in variety of ways, and one of not-so-uncommon use cases is safely handling truncated copies, where the caller naturally doesn't care about the return value. The __must_check doesn't match the actual use cases and forces users to opt for inferior variants which lack __must_check by happenstance or spread ugly (void) casts. Remove __must_check from strscpy() and restore strscpy() usages in cgroup. Signed-off-by: Tejun Heo Suggested-by: Linus Torvalds Cc: Ma Shimiao Cc: Arnd Bergmann Cc: Chris Metcalf [backport only the string.h portion to remove build warnings starting to show up - gregkh] Signed-off-by: Greg Kroah-Hartman commit 96fd60c8160c16dd16542efe4491f0d096d5c1b4 Author: Marc Zyngier Date: Fri Jul 20 10:53:12 2018 +0100 arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID commit 5d81f7dc9bca4f4963092433e27b508cbe524a32 upstream. Now that all our infrastructure is in place, let's expose the availability of ARCH_WORKAROUND_2 to guests. We take this opportunity to tidy up a couple of SMCCC constants. Acked-by: Christoffer Dall Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 1b749f8a241639951a9c65e64f51e7a6ac5b6ae9 Author: Marc Zyngier Date: Fri Jul 20 10:53:11 2018 +0100 arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests commit b4f18c063a13dfb33e3a63fe1844823e19c2265e upstream. In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3, add a small(-ish) sequence to handle it at EL2. Special care must be taken to track the state of the guest itself by updating the workaround flags. We also rely on patching to enable calls into the firmware. Note that since we need to execute branches, this always executes after the Spectre-v2 mitigation has been applied. Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 805357aa65bbc419451fb9556968687c7fd1d2e1 Author: Marc Zyngier Date: Fri Jul 20 10:53:10 2018 +0100 arm64: KVM: Add ARCH_WORKAROUND_2 support for guests commit 55e3748e8902ff641e334226bdcb432f9a5d78d3 upstream. In order to offer ARCH_WORKAROUND_2 support to guests, we need a bit of infrastructure. Let's add a flag indicating whether or not the guest uses SSBD mitigation. Depending on the state of this flag, allow KVM to disable ARCH_WORKAROUND_2 before entering the guest, and enable it when exiting it. Reviewed-by: Christoffer Dall Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 0592871918f0acf369cec5c88de1fa76dbf2c1a2 Author: Marc Zyngier Date: Fri Jul 20 10:53:09 2018 +0100 arm64: KVM: Add HYP per-cpu accessors commit 85478bab409171de501b719971fd25a3d5d639f9 upstream. As we're going to require to access per-cpu variables at EL2, let's craft the minimum set of accessors required to implement reading a per-cpu variable, relying on tpidr_el2 to contain the per-cpu offset. Reviewed-by: Christoffer Dall Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit b769d86ea9d4b2d77387791cc747462b19e8af22 Author: Marc Zyngier Date: Fri Jul 20 10:53:08 2018 +0100 arm64: ssbd: Add prctl interface for per-thread mitigation commit 9cdc0108baa8ef87c76ed834619886a46bd70cbe upstream. If running on a system that performs dynamic SSBD mitigation, allow userspace to request the mitigation for itself. This is implemented as a prctl call, allowing the mitigation to be enabled or disabled at will for this particular thread. Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit e7d02797288f12ca257cdab49d19ccf9b3d2cb12 Author: Marc Zyngier Date: Fri Jul 20 10:53:07 2018 +0100 arm64: ssbd: Introduce thread flag to control userspace mitigation commit 9dd9614f5476687abbff8d4b12cd08ae70d7c2ad upstream. In order to allow userspace to be mitigated on demand, let's introduce a new thread flag that prevents the mitigation from being turned off when exiting to userspace, and doesn't turn it on on entry into the kernel (with the assumption that the mitigation is always enabled in the kernel itself). This will be used by a prctl interface introduced in a later patch. Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit c5c89bb4deb8e1f0eed0968f37dfa936f6b5e4c1 Author: Marc Zyngier Date: Fri Jul 20 10:53:06 2018 +0100 arm64: ssbd: Restore mitigation status on CPU resume commit 647d0519b53f440a55df163de21c52a8205431cc upstream. On a system where firmware can dynamically change the state of the mitigation, the CPU will always come up with the mitigation enabled, including when coming back from suspend. If the user has requested "no mitigation" via a command line option, let's enforce it by calling into the firmware again to disable it. Similarily, for a resume from hibernate, the mitigation could have been disabled by the boot kernel. Let's ensure that it is set back on in that case. Acked-by: Will Deacon Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 02e26bd9ad58197533a7d5fcd62f975891a9e936 Author: Marc Zyngier Date: Fri Jul 20 10:53:05 2018 +0100 arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation commit 986372c4367f46b34a3c0f6918d7fb95cbdf39d6 upstream. In order to avoid checking arm64_ssbd_callback_required on each kernel entry/exit even if no mitigation is required, let's add yet another alternative that by default jumps over the mitigation, and that gets nop'ed out if we're doing dynamic mitigation. Think of it as a poor man's static key... Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 8d6907af4583eec62e4eb7f839da3c33d570556e Author: Marc Zyngier Date: Fri Jul 20 10:53:04 2018 +0100 arm64: ssbd: Add global mitigation state accessor commit c32e1736ca03904c03de0e4459a673be194f56fd upstream. We're about to need the mitigation state in various parts of the kernel in order to do the right thing for userspace and guests. Let's expose an accessor that will let other subsystems know about the state. Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 45808ab2f9245c2921b6e81025fde46ae6325013 Author: Marc Zyngier Date: Fri Jul 20 10:53:03 2018 +0100 arm64: Add 'ssbd' command-line option commit a43ae4dfe56a01f5b98ba0cb2f784b6a43bafcc6 upstream. On a system where the firmware implements ARCH_WORKAROUND_2, it may be useful to either permanently enable or disable the workaround for cases where the user decides that they'd rather not get a trap overhead, and keep the mitigation permanently on or off instead of switching it on exception entry/exit. In any case, default to the mitigation being enabled. Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 837c87c233c1cbf53c8576c2c1a269dce8d68d65 Author: Marc Zyngier Date: Fri Jul 20 10:53:02 2018 +0100 arm64: Add ARCH_WORKAROUND_2 probing commit a725e3dda1813ed306734823ac4c65ca04e38500 upstream. As for Spectre variant-2, we rely on SMCCC 1.1 to provide the discovery mechanism for detecting the SSBD mitigation. A new capability is also allocated for that purpose, and a config option. Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Reviewed-by: Suzuki K Poulose Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 1bffd48690119d79468355ebcf992475beefb0b5 Author: Marc Zyngier Date: Fri Jul 20 10:53:01 2018 +0100 arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2 commit 5cf9ce6e5ea50f805c6188c04ed0daaec7b6887d upstream. In a heterogeneous system, we can end up with both affected and unaffected CPUs. Let's check their status before calling into the firmware. Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 5ad09d2abb5a0e4193b81c147efc828e13caf2ca Author: Marc Zyngier Date: Fri Jul 20 10:53:00 2018 +0100 arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1 commit 8e2906245f1e3b0d027169d9f2e55ce0548cb96e upstream. In order for the kernel to protect itself, let's call the SSBD mitigation implemented by the higher exception level (either hypervisor or firmware) on each transition between userspace and kernel. We must take the PSCI conduit into account in order to target the right exception level, hence the introduction of a runtime patching callback. Reviewed-by: Mark Rutland Reviewed-by: Julien Grall Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 1de2719134b5fe1156d04ef2a31a1b8b2307d3ba Author: Marc Zyngier Date: Fri Jul 20 10:52:59 2018 +0100 arm/arm64: smccc: Add SMCCC-specific return codes commit eff0e9e1078ea7dc1d794dc50e31baef984c46d7 upstream. We've so far used the PSCI return codes for SMCCC because they were extremely similar. But with the new ARM DEN 0070A specification, "NOT_REQUIRED" (-2) is clashing with PSCI's "PSCI_RET_INVALID_PARAMS". Let's bite the bullet and add SMCCC specific return codes. Users can be repainted as and when required. Acked-by: Will Deacon Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 2cdc2e62a6ac829832c2bf4ccb1098fccc67f82c Author: Christoffer Dall Date: Fri Jul 20 10:52:58 2018 +0100 KVM: arm64: Avoid storing the vcpu pointer on the stack Commit 4464e210de9e80e38de59df052fe09ea2ff80b1b upstream. We already have the percpu area for the host cpu state, which points to the VCPU, so there's no need to store the VCPU pointer on the stack on every context switch. We can be a little more clever and just use tpidr_el2 for the percpu offset and load the VCPU pointer from the host context. This has the benefit of being able to retrieve the host context even when our stack is corrupted, and it has a potential performance benefit because we trade a store plus a load for an mrs and a load on a round trip to the guest. This does require us to calculate the percpu offset without including the offset from the kernel mapping of the percpu array to the linear mapping of the array (which is what we store in tpidr_el1), because a PC-relative generated address in EL2 is already giving us the hyp alias of the linear mapping of a kernel address. We do this in __cpu_init_hyp_mode() by using kvm_ksym_ref(). The code that accesses ESR_EL2 was previously using an alternative to use the _EL1 accessor on VHE systems, but this was actually unnecessary as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2 accessor does the same thing on both systems. Cc: Ard Biesheuvel Reviewed-by: Marc Zyngier Reviewed-by: Andrew Jones Signed-off-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit dca7815605aff032d0b7f9c4f1d98af0e529cdee Author: Marc Zyngier Date: Fri Jul 20 10:52:57 2018 +0100 KVM: arm/arm64: Do not use kern_hyp_va() with kvm_vgic_global_state Commit 44a497abd621a71c645f06d3d545ae2f46448830 upstream. kvm_vgic_global_state is part of the read-only section, and is usually accessed using a PC-relative address generation (adrp + add). It is thus useless to use kern_hyp_va() on it, and actively problematic if kern_hyp_va() becomes non-idempotent. On the other hand, there is no way that the compiler is going to guarantee that such access is always PC relative. So let's bite the bullet and provide our own accessor. Acked-by: Catalin Marinas Reviewed-by: James Morse Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit e77175fafa7dd941873dc248ab0f20aa56fe3431 Author: Marc Zyngier Date: Fri Jul 20 10:52:56 2018 +0100 arm64: alternatives: Add dynamic patching feature Commit dea5e2a4c5bcf196f879a66cebdcca07793e8ba4 upstream. We've so far relied on a patching infrastructure that only gave us a single alternative, without any way to provide a range of potential replacement instructions. For a single feature, this is an all or nothing thing. It would be interesting to have a more flexible grained way of patching the kernel though, where we could dynamically tune the code that gets injected. In order to achive this, let's introduce a new form of dynamic patching, assiciating a callback to a patching site. This callback gets source and target locations of the patching request, as well as the number of instructions to be patched. Dynamic patching is declared with the new ALTERNATIVE_CB and alternative_cb directives: asm volatile(ALTERNATIVE_CB("mov %0, #0\n", callback) : "r" (v)); or alternative_cb callback mov x0, #0 alternative_cb_end where callback is the C function computing the alternative. Reviewed-by: Christoffer Dall Reviewed-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 286950e0831b9bf7180c09b2dd06ec88922fcc12 Author: James Morse Date: Fri Jul 20 10:52:55 2018 +0100 KVM: arm64: Stop save/restoring host tpidr_el1 on VHE Commit 1f742679c33bc083722cb0b442a95d458c491b56 upstream. Now that a VHE host uses tpidr_el2 for the cpu offset we no longer need KVM to save/restore tpidr_el1. Move this from the 'common' code into the non-vhe code. While we're at it, on VHE we don't need to save the ELR or SPSR as kernel_entry in entry.S will have pushed these onto the kernel stack, and will restore them from there. Move these to the non-vhe code as we need them to get back to the host. Finally remove the always-copy-tpidr we hid in the stage2 setup code, cpufeature's enable callback will do this for VHE, we only need KVM to do it for non-vhe. Add the copy into kvm-init instead. Signed-off-by: James Morse Reviewed-by: Christoffer Dall Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 0dac9f10d9525f9617eaf383e1629ae976e2f170 Author: James Morse Date: Fri Jul 20 10:52:54 2018 +0100 arm64: alternatives: use tpidr_el2 on VHE hosts Commit 6d99b68933fbcf51f84fcbba49246ce1209ec193 upstream. Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1 on VHE hosts, and allows future code to blindly access per-cpu variables without triggering world-switch. Signed-off-by: James Morse Reviewed-by: Christoffer Dall Reviewed-by: Catalin Marinas Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 6256b86e851942f4ec145d36a0c8f3673390b9d9 Author: James Morse Date: Fri Jul 20 10:52:53 2018 +0100 KVM: arm64: Change hyp_panic()s dependency on tpidr_el2 Commit c97e166e54b662717d20ec2e36761758d2b6a7c2 upstream. Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and on VHE they can be the same register. KVM calls hyp_panic() when anything unexpected happens. This may occur while a guest owns the EL1 registers. KVM stashes the vcpu pointer in tpidr_el2, which it uses to find the host context in order to restore the host EL1 registers before parachuting into the host's panic(). The host context is a struct kvm_cpu_context allocated in the per-cpu area, and mapped to hyp. Given the per-cpu offset for this CPU, this is easy to find. Change hyp_panic() to take a pointer to the struct kvm_cpu_context. Wrap these calls with an asm function that retrieves the struct kvm_cpu_context from the host's per-cpu area. Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during kvm init. (Later patches will make this unnecessary for VHE hosts) We print out the vcpu pointer as part of the panic message. Add a back reference to the 'running vcpu' in the host cpu context to preserve this. Signed-off-by: James Morse Reviewed-by: Christoffer Dall Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 8ad56472d67cf8ce638876558d36f3c83975294a Author: James Morse Date: Fri Jul 20 10:52:52 2018 +0100 KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation Commit 36989e7fd386a9a5822c48691473863f8fbb404d upstream. kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init() used to store the host EL1 registers when KVM switches to a guest. Make it easier for ASM to generate pointers into this per-cpu memory by making it a static allocation. Signed-off-by: James Morse Acked-by: Christoffer Dall Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit ed812b882599c0ec27a9f58831fc6bfb4efdac80 Author: James Morse Date: Fri Jul 20 10:52:51 2018 +0100 KVM: arm64: Store vcpu on the stack during __guest_enter() Commit 32b03d1059667a39e089c45ee38ec9c16332430f upstream. KVM uses tpidr_el2 as its private vcpu register, which makes sense for non-vhe world switch as only KVM can access this register. This means vhe Linux has to use tpidr_el1, which KVM has to save/restore as part of the host context. If the SDEI handler code runs behind KVMs back, it mustn't access any per-cpu variables. To allow this on systems with vhe we need to make the host use tpidr_el2, saving KVM from save/restoring it. __guest_enter() stores the host_ctxt on the stack, do the same with the vcpu. Signed-off-by: James Morse Reviewed-by: Christoffer Dall Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 115df2a7c5ba248360e4f92d78102c34f45977ee Author: Tetsuo Handa Date: Wed Jul 18 18:57:27 2018 +0900 net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL. commit 3bc53be9db21040b5d2de4d455f023c8c494aa68 upstream. syzbot is reporting stalls at nfc_llcp_send_ui_frame() [1]. This is because nfc_llcp_send_ui_frame() is retrying the loop without any delay when nonblocking nfc_alloc_send_skb() returned NULL. Since there is no need to use MSG_DONTWAIT if we retry until sock_alloc_send_pskb() succeeds, let's use blocking call. Also, in case an unexpected error occurred, let's break the loop if blocking nfc_alloc_send_skb() failed. [1] https://syzkaller.appspot.com/bug?id=4a131cc571c3733e0eff6bc673f4e36ae48f19c6 Signed-off-by: Tetsuo Handa Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a4b57440d971f4ee139a923a0b953f91c3769d70 Author: Santosh Shilimkar Date: Thu Jun 14 11:52:34 2018 -0700 rds: avoid unenecessary cong_update in loop transport commit f1693c63ab133d16994cc50f773982b5905af264 upstream. Loop transport which is self loopback, remote port congestion update isn't relevant. Infact the xmit path already ignores it. Receive path needs to do the same. Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com Reviewed-by: Sowmini Varadhan Signed-off-by: Santosh Shilimkar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1bbe05e27af1d67b0c9c113d1f62373e933205ec Author: Jan Kara Date: Mon Jun 18 15:46:58 2018 +0200 bdi: Fix another oops in wb_workfn() commit 3ee7e8697d5860b173132606d80a9cd35e7113ee upstream. syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was WB_shutting_down after wb->bdi->dev became NULL. This indicates that unregister_bdi() failed to call wb_shutdown() on one of wb objects. The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus drops bdi's reference to wb structures before going through the list of wbs again and calling wb_shutdown() on each of them. This way the loop iterating through all wbs can easily miss a wb if that wb has already passed through cgwb_remove_from_bdi_list() called from wb_shutdown() from cgwb_release_workfn() and as a result fully shutdown bdi although wb_workfn() for this wb structure is still running. In fact there are also other ways cgwb_bdi_unregister() can race with cgwb_release_workfn() leading e.g. to use-after-free issues: CPU1 CPU2 cgwb_bdi_unregister() cgwb_kill(*slot); cgwb_release() queue_work(cgwb_release_wq, &wb->release_work); cgwb_release_workfn() wb = list_first_entry(&bdi->wb_list, ...) spin_unlock_irq(&cgwb_lock); wb_shutdown(wb); ... kfree_rcu(wb, rcu); wb_shutdown(wb); -> oops use-after-free We solve these issues by synchronizing writeback structure shutdown from cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That way we also no longer need synchronization using WB_shutting_down as the mutex provides it for CONFIG_CGROUP_WRITEBACK case and without CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from bdi_unregister(). Reported-by: syzbot Acked-by: Tejun Heo Signed-off-by: Jan Kara Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 28c74ff85efd192aeca9005499ca50c24d795f61 Author: Florian Westphal Date: Mon Jul 9 13:43:38 2018 +0200 netfilter: ipv6: nf_defrag: drop skb dst before queueing commit 84379c9afe011020e797e3f50a662b08a6355dcf upstream. Eric Dumazet reports: Here is a reproducer of an annoying bug detected by syzkaller on our production kernel [..] ./b78305423 enable_conntrack Then : sleep 60 dmesg | tail -10 [ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2 Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook. skb gets queued until frag timer expiry -- 1 minute. Normally nf_conntrack_reasm gets called during prerouting, so skb has no dst yet which might explain why this wasn't spotted earlier. Reported-by: Eric Dumazet Reported-by: John Sperbeck Signed-off-by: Florian Westphal Tested-by: Eric Dumazet Reported-by: syzbot Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit b124e97f3ef528e81f28c783b6ddce5ee0faf119 Author: Willem de Bruijn Date: Wed Jul 11 12:00:44 2018 -0400 nsh: set mac len based on inner packet commit bab2c80e5a6c855657482eac9e97f5f3eedb509a upstream. When pulling the NSH header in nsh_gso_segment, set the mac length based on the encapsulated packet type. skb_reset_mac_len computes an offset to the network header, which here still points to the outer packet: > skb_reset_network_header(skb); > [...] > __skb_pull(skb, nsh_len); > skb_reset_mac_header(skb); // now mac hdr starts nsh_len == 8B after net hdr > skb_reset_mac_len(skb); // mac len = net hdr - mac hdr == (u16) -8 == 65528 > [..] > skb_mac_gso_segment(skb, ..) Link: http://lkml.kernel.org/r/CAF=yD-KeAcTSOn4AxirAxL8m7QAS8GBBe1w09eziYwvPbbUeYA@mail.gmail.com Reported-by: syzbot+7b9ed9872dab8c32305d@syzkaller.appspotmail.com Fixes: c411ed854584 ("nsh: add GSO support") Signed-off-by: Willem de Bruijn Acked-by: Jiri Benc Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 00235ab80007696f1274eca9c0529b8f7c85def0 Author: Tomas Bortoli Date: Fri Jul 13 16:58:59 2018 -0700 autofs: fix slab out of bounds read in getname_kernel() commit 02f51d45937f7bc7f4dee21e9f85b2d5eac37104 upstream. The autofs subsystem does not check that the "path" parameter is present for all cases where it is required when it is passed in via the "param" struct. In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD ioctl command. To solve it, modify validate_dev_ioctl(function to check that a path has been provided for ioctl commands that require it. Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto.themaw.net Signed-off-by: Tomas Bortoli Signed-off-by: Ian Kent Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 30a7a7b04f8b4e38b1af9acda2a4dd533e260ed2 Author: Dave Watson Date: Thu Jul 12 08:03:43 2018 -0700 tls: Stricter error checking in zerocopy sendmsg path commit 32da12216e467dea70a09cd7094c30779ce0f9db upstream. In the zerocopy sendmsg() path, there are error checks to revert the zerocopy if we get any error code. syzkaller has discovered that tls_push_record can return -ECONNRESET, which is fatal, and happens after the point at which it is safe to revert the iter, as we've already passed the memory to do_tcp_sendpages. Previously this code could return -ENOMEM and we would want to revert the iter, but AFAIK this no longer returns ENOMEM after a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"), so we fail for all error codes. Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com Signed-off-by: Dave Watson Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d9bb71d76c07d287f3904c463d1234e6b7b8049d Author: Eric Biggers Date: Wed Jul 11 10:46:29 2018 -0700 KEYS: DNS: fix parsing multiple options commit c604cb767049b78b3075497b80ebb8fd530ea2cc upstream. My recent fix for dns_resolver_preparse() printing very long strings was incomplete, as shown by syzbot which still managed to hit the WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key: precision 50001 too large WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0 The bug this time isn't just a printing bug, but also a logical error when multiple options ("#"-separated strings) are given in the key payload. Specifically, when separating an option string into name and value, if there is no value then the name is incorrectly considered to end at the end of the key payload, rather than the end of the current option. This bypasses validation of the option length, and also means that specifying multiple options is broken -- which presumably has gone unnoticed as there is currently only one valid option anyway. A similar problem also applied to option values, as the kstrtoul() when parsing the "dnserror" option will read past the end of the current option and into the next option. Fix these bugs by correctly computing the length of the option name and by copying the option value, null-terminated, into a temporary buffer. Reproducer for the WARN_ONCE() that syzbot hit: perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s Reproducer for "dnserror" option being parsed incorrectly (expected behavior is to fail when seeing the unknown option "foo", actual behavior was to read the dnserror value as "1#foo" and fail there): perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s Reported-by: syzbot Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cba5008502f2238b716334d7cb6d847560ce01b4 Author: Eric Biggers Date: Fri Jul 13 16:59:27 2018 -0700 reiserfs: fix buffer overflow with long warning messages commit fe10e398e860955bac4d28ec031b701d358465e4 upstream. ReiserFS prepares log messages into a 1024-byte buffer with no bounds checks. Long messages, such as the "unknown mount option" warning when userspace passes a crafted mount options string, overflow this buffer. This causes KASAN to report a global-out-of-bounds write. Fix it by truncating messages to the buffer size. Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com Signed-off-by: Eric Biggers Reviewed-by: Andrew Morton Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 766a7ad6639b4b2026d19cb217ada2ed8a2cfe57 Author: Florian Westphal Date: Wed Jun 6 12:14:56 2018 +0200 netfilter: ebtables: reject non-bridge targets commit 11ff7288beb2b7da889a014aff0a7b80bf8efcf3 upstream. the ebtables evaluation loop expects targets to return positive values (jumps), or negative values (absolute verdicts). This is completely different from what xtables does. In xtables, targets are expected to return the standard netfilter verdicts, i.e. NF_DROP, NF_ACCEPT, etc. ebtables will consider these as jumps. Therefore reject any target found due to unspec fallback. v2: also reject watchers. ebtables ignores their return value, so a target that assumes skb ownership (and returns NF_STOLEN) causes use-after-free. The only watchers in the 'ebtables' front-end are log and nflog; both have AF_BRIDGE specific wrappers on kernel side. Reported-by: syzbot+2b43f681169a2a0d306a@syzkaller.appspotmail.com Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit b5199c61e95c58fdfd9478bcbf368f1575d61da1 Author: Dexuan Cui Date: Mon Jul 9 13:16:07 2018 -0500 PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg() commit 35a88a18d7ea58600e11590405bc93b08e16e7f5 upstream. Commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") uses local_bh_disable()/enable(), because hv_pci_onchannelcallback() can also run in tasklet context as the channel event callback, so bottom halves should be disabled to prevent a race condition. With CONFIG_PROVE_LOCKING=y in the recent mainline, or old kernels that don't have commit f71b74bca637 ("irq/softirqs: Use lockdep to assert IRQs are disabled/enabled"), when the upper layer IRQ code calls hv_compose_msi_msg() with local IRQs disabled, we'll see a warning at the beginning of __local_bh_enable_ip(): IRQs not enabled as expected WARNING: CPU: 0 PID: 408 at kernel/softirq.c:162 __local_bh_enable_ip The warning exposes an issue in de0aa7b2f97d: local_bh_enable() can potentially call do_softirq(), which is not supposed to run when local IRQs are disabled. Let's fix this by using local_irq_save()/restore() instead. Note: hv_pci_onchannelcallback() is not a hot path because it's only called when the PCI device is hot added and removed, which is infrequent. Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") Signed-off-by: Dexuan Cui Signed-off-by: Lorenzo Pieralisi Signed-off-by: Bjorn Helgaas Reviewed-by: Haiyang Zhang Cc: stable@vger.kernel.org Cc: Stephen Hemminger Cc: K. Y. Srinivasan Signed-off-by: Greg Kroah-Hartman commit aa6be396714c79c44d5fa6b4ae09c8090fef8727 Author: Alan Jenkins Date: Thu Apr 12 19:11:58 2018 +0100 block: do not use interruptible wait anywhere commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 upstream. When blk_queue_enter() waits for a queue to unfreeze, or unset the PREEMPT_ONLY flag, do not allow it to be interrupted by a signal. The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI device is resumed asynchronously, i.e. after un-freezing userspace tasks. So that commit exposed the bug as a regression in v4.15. A mysterious SIGBUS (or -EIO) sometimes happened during the time the device was being resumed. Most frequently, there was no kernel log message, and we saw Xorg or Xwayland killed by SIGBUS.[1] [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979 Without this fix, I get an IO error in this test: # dd if=/dev/sda of=/dev/null iflag=direct & \ while killall -SIGUSR1 dd; do sleep 0.1; done & \ echo mem > /sys/power/state ; \ sleep 5; killall dd # stop after 5 seconds The interruptible wait was added to blk_queue_enter in commit 3ef28e83ab15 ("block: generic request_queue reference counting"). Before then, the interruptible wait was only in blk-mq, but I don't think it could ever have been correct. Reviewed-by: Bart Van Assche Cc: stable@vger.kernel.org Signed-off-by: Alan Jenkins Signed-off-by: Jens Axboe Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit f1059632a4fcb0a73b1c315082cdb623a3b3ef52 Author: Masahiro Yamada Date: Sat Jun 23 01:06:34 2018 +0900 mtd: rawnand: denali_dt: set clk_x_rate to 200 MHz unconditionally commit 3f6e6986045d47f87bd982910821b7ab9758487e upstream. Since commit 1bb88666775e ("mtd: nand: denali: handle timing parameters by setup_data_interface()"), denali_dt.c gets the clock rate from the clock driver. The driver expects the frequency of the bus interface clock, whereas the clock driver of SOCFPGA provides the core clock. Thus, the setup_data_interface() hook calculates timing parameters based on a wrong frequency. To make it work without relying on the clock driver, hard-code the clock frequency, 200MHz. This is fine for existing DT of UniPhier, and also fixes the issue of SOCFPGA because both platforms use 200 MHz for the bus interface clock. Fixes: 1bb88666775e ("mtd: nand: denali: handle timing parameters by setup_data_interface()") Cc: linux-stable #4.14+ Reported-by: Philipp Rosenberger Suggested-by: Boris Brezillon Signed-off-by: Masahiro Yamada Tested-by: Richard Weinberger Signed-off-by: Boris Brezillon Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit c4bfed85bae8eba6cce22f9b9cce530720527411 Author: Stephan Mueller Date: Sat Jul 7 20:41:47 2018 +0200 crypto: af_alg - Initialize sg_num_bytes in error code path commit 2546da99212f22034aecf279da9c47cbfac6c981 upstream. The RX SGL in processing is already registered with the RX SGL tracking list to support proper cleanup. The cleanup code path uses the sg_num_bytes variable which must therefore be always initialized, even in the error code path. Signed-off-by: Stephan Mueller Reported-by: syzbot+9c251bdd09f83b92ba95@syzkaller.appspotmail.com #syz test: https://github.com/google/kmsan.git master CC: #4.14 Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management") Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit 16b3ae12337eb1695de0031fe4e030fb7fd6c2d9 Author: Peter Zijlstra Date: Mon Apr 30 12:00:11 2018 +0200 clocksource: Initialize cs->wd_list commit 5b9e886a4af97574ca3ce1147f35545da0e7afc7 upstream. A number of places relies on list_empty(&cs->wd_list), however the list_head does not get initialized. Do so upon registration, such that thereafter it is possible to rely on list_empty() correctly reflecting the list membership status. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Tested-by: Diego Viola Reviewed-by: Rafael J. Wysocki Cc: stable@vger.kernel.org Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: rui.zhang@intel.com Link: https://lkml.kernel.org/r/20180430100344.472662715@infradead.org Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit a406abeb7416501355668222164919836da67034 Author: Sean Young Date: Tue Mar 6 08:57:57 2018 -0500 media: rc: oops in ir_timer_keyup after device unplug commit 8d4068810d9926250dd2435719a080b889eb44c3 upstream. If there is IR in the raw kfifo when ir_raw_event_unregister() is called, then kthread_stop() causes ir_raw_event_thread to be scheduled, decode some scancodes and re-arm timer_keyup. The timer_keyup then fires when the rc device is long gone. Cc: stable@vger.kernel.org Signed-off-by: Sean Young Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 67f7c68a9085957a291e91ee5a177f2b208aeda6 Author: Mathias Nyman Date: Mon May 14 11:57:23 2018 +0300 xhci: Fix USB3 NULL pointer dereference at logical disconnect. commit 2278446e2b7cd33ad894b32e7eb63afc7db6c86e upstream. Hub driver will try to disable a USB3 device twice at logical disconnect, racing with xhci_free_dev() callback from the first port disable. This can be triggered with "udisksctl power-off --block-device " or by writing "1" to the "remove" sysfs file for a USB3 device in 4.17-rc4. USB3 devices don't have a similar disabled link state as USB2 devices, and use a U3 suspended link state instead. In this state the port is still enabled and connected. hub_port_connect() first disconnects the device, then later it notices that device is still enabled (due to U3 states) it will try to disable the port again (set to U3). The xhci_free_dev() called during device disable is async, so checking for existing xhci->devs[i] when setting link state to U3 the second time was successful, even if device was being freed. The regression was caused by, and whole thing revealed by, Commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned. and causes a NULL pointer dereference the second time we try to set U3. Fix this by checking xhci->devs[i]->udev exists before setting link state. The original patch went to stable so this fix needs to be applied there as well. Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") Cc: Reported-by: Jordan Glover Tested-by: Jordan Glover Signed-off-by: Mathias Nyman Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 2be27d444f61c5542df5c00892817124582428a4 Author: Stefan Wahren Date: Sun Jul 15 21:53:20 2018 +0200 net: lan78xx: Fix race in tx pending skb size calculation commit dea39aca1d7aef1e2b95b07edeacf04cc8863a2e upstream. The skb size calculation in lan78xx_tx_bh is in race with the start_xmit, which could lead to rare kernel oopses. So protect the whole skb walk with a spin lock. As a benefit we can unlink the skb directly. This patch was tested on Raspberry Pi 3B+ Link: https://github.com/raspberrypi/linux/issues/2608 Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Cc: stable Signed-off-by: Floris Bos Signed-off-by: Stefan Wahren Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 12c0949a07458103b539eaba2330763873ef4cb5 Author: Ping-Ke Shih Date: Thu Jun 28 10:02:27 2018 +0800 rtlwifi: rtl8821ae: fix firmware is not ready to run commit 9a98302de19991d51e067b88750585203b2a3ab6 upstream. Without this patch, firmware will not run properly on rtl8821ae, and it causes bad user experience. For example, bad connection performance with low rate, higher power consumption, and so on. rtl8821ae uses two kinds of firmwares for normal and WoWlan cases, and each firmware has firmware data buffer and size individually. Original code always overwrite size of normal firmware rtlpriv->rtlhal.fwsize, and this mismatch causes firmware checksum error, then firmware can't start. In this situation, driver gives message "Firmware is not ready to run!". Fixes: fe89707f0afa ("rtlwifi: rtl8821ae: Simplify loading of WOWLAN firmware") Signed-off-by: Ping-Ke Shih Cc: Stable # 4.0+ Reviewed-by: Larry Finger Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman commit ee8d2e719c1e5ade96524d4480cffc17f8c756a8 Author: Ping-Ke Shih Date: Fri Jun 22 13:31:57 2018 +0800 rtlwifi: Fix kernel Oops "Fw download fail!!" commit 12dfa2f68ab659636e092db13b5d17cf9aac82af upstream. When connecting to AP, mac80211 asks driver to enter and leave PS quickly, but driver deinit doesn't wait for delayed work complete when entering PS, then driver reinit procedure and delay work are running simultaneously. This will cause unpredictable kernel oops or crash like rtl8723be: error H2C cmd because of Fw download fail!!! WARNING: CPU: 3 PID: 159 at drivers/net/wireless/realtek/rtlwifi/ rtl8723be/fw.c:227 rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be] CPU: 3 PID: 159 Comm: kworker/3:2 Tainted: G O 4.16.13-2-ARCH #1 Hardware name: ASUSTeK COMPUTER INC. X556UF/X556UF, BIOS X556UF.406 10/21/2016 Workqueue: rtl8723be_pci rtl_c2hcmd_wq_callback [rtlwifi] RIP: 0010:rtl8723be_fill_h2c_cmd+0x182/0x510 [rtl8723be] RSP: 0018:ffffa6ab01e1bd70 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffa26069071520 RCX: 0000000000000001 RDX: 0000000080000001 RSI: ffffffff8be70e9c RDI: 00000000ffffffff RBP: 0000000000000000 R08: 0000000000000048 R09: 0000000000000348 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffffa26069071520 R14: 0000000000000000 R15: ffffa2607d205f70 FS: 0000000000000000(0000) GS:ffffa26081d80000(0000) knlGS:000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000443b39d3000 CR3: 000000037700a005 CR4: 00000000003606e0 Call Trace: ? halbtc_send_bt_mp_operation.constprop.17+0xd5/0xe0 [btcoexist] ? ex_btc8723b1ant_bt_info_notify+0x3b8/0x820 [btcoexist] ? rtl_c2hcmd_launcher+0xab/0x110 [rtlwifi] ? process_one_work+0x1d1/0x3b0 ? worker_thread+0x2b/0x3d0 ? process_one_work+0x3b0/0x3b0 ? kthread+0x112/0x130 ? kthread_create_on_node+0x60/0x60 ? ret_from_fork+0x35/0x40 Code: 00 76 b4 e9 e2 fe ff ff 4c 89 ee 4c 89 e7 e8 56 22 86 ca e9 5e ... This patch ensures all delayed works done before entering PS to satisfy our expectation, so use cancel_delayed_work_sync() instead. An exception is delayed work ips_nic_off_wq because running task may be itself, so add a parameter ips_wq to deinit function to handle this case. This issue is reported and fixed in below threads: https://github.com/lwfinger/rtlwifi_new/issues/367 https://github.com/lwfinger/rtlwifi_new/issues/366 Tested-by: Evgeny Kapun # 8723DE Tested-by: Shivam Kakkar # 8723BE on 4.18-rc1 Signed-off-by: Ping-Ke Shih Fixes: cceb0a597320 ("rtlwifi: Add work queue for c2h cmd.") Cc: Stable # 4.11+ Reviewed-by: Larry Finger Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman commit 3caea5150c15c4258a8d9ee0c5758d53d4753009 Author: Gustavo A. R. Silva Date: Mon Jul 16 20:59:58 2018 -0500 net: cxgb3_main: fix potential Spectre v1 commit 676bcfece19f83621e905aa55b5ed2d45cc4f2d3 upstream. t.qset_idx can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:2286 cxgb_extension_ioctl() warn: potential spectre issue 'adapter->msix_info' Fix this by sanitizing t.qset_idx before using it to index adapter->msix_info Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d8530e891edde17506727af174d66d3f0b4e6b34 Author: Claudio Imbrenda Date: Wed Jun 20 15:51:51 2018 +0200 VSOCK: fix loopback on big-endian systems [ Upstream commit e5ab564c9ebee77794842ca7d7476147b83d6a27 ] The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be used, and in fact in virtio_transport_common.c only 64 bit accessors are used. Using 32 bit accessors for 64 bit values breaks big endian systems. This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt. Fixes: b9116823189e85ccf384 ("VSOCK: add loopback to virtio_transport") Signed-off-by: Claudio Imbrenda Reviewed-by: Stefan Hajnoczi Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7eba6537c3d17ca582d4b3ab38af4463020563d9 Author: Jason Wang Date: Thu Jun 21 13:11:31 2018 +0800 vhost_net: validate sock before trying to put its fd [ Upstream commit b8f1f65882f07913157c44673af7ec0b308d03eb ] Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when we meet errors during ubuf allocation, the code does not check for NULL before calling sockfd_put(), this will lead NULL dereferencing. Fixing by checking sock pointer before. Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Reported-by: Dan Carpenter Signed-off-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2dc4696ee6d9471ce110157fb695f75b8fd912e1 Author: Ilpo Järvinen Date: Fri Jun 29 13:07:53 2018 +0300 tcp: prevent bogus FRTO undos with non-SACK flows [ Upstream commit 1236f22fbae15df3736ab4a984c64c0c6ee6254c ] If SACK is not enabled and the first cumulative ACK after the RTO retransmission covers more than the retransmitted skb, a spurious FRTO undo will trigger (assuming FRTO is enabled for that RTO). The reason is that any non-retransmitted segment acknowledged will set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is no indication that it would have been delivered for real (the scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK case so the check for that bit won't help like it does with SACK). Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo in tcp_process_loss. We need to use more strict condition for non-SACK case and check that none of the cumulatively ACKed segments were retransmitted to prove that progress is due to original transmissions. Only then keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in non-SACK case. (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS to better indicate its purpose but to keep this change minimal, it will be done in another patch). Besides burstiness and congestion control violations, this problem can result in RTO loop: When the loss recovery is prematurely undoed, only new data will be transmitted (if available) and the next retransmission can occur only after a new RTO which in case of multiple losses (that are not for consecutive packets) requires one RTO per loss to recover. Signed-off-by: Ilpo Järvinen Tested-by: Neal Cardwell Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3373d6d056d7b9d1de397498389891b8b95dfd8a Author: Yuchung Cheng Date: Wed Jun 27 16:04:48 2018 -0700 tcp: fix Fast Open key endianness [ Upstream commit c860e997e9170a6d68f9d1e6e2cf61f572191aaf ] Fast Open key could be stored in different endian based on the CPU. Previously hosts in different endianness in a server farm using the same key config (sysctl value) would produce different cookies. This patch fixes it by always storing it as little endian to keep same API for LE hosts. Reported-by: Daniele Iamartino Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4f5f7bce308eebcfc8cf63134d4147ccfa8ccb32 Author: Doron Roberts-Kedes Date: Tue Jun 26 18:33:33 2018 -0700 strparser: Remove early eaten to fix full tcp receive buffer stall [ Upstream commit 977c7114ebda2e746a114840d3a875e0cdb826fb ] On receving an incomplete message, the existing code stores the remaining length of the cloned skb in the early_eaten field instead of incrementing the value returned by __strp_recv. This defers invocation of sock_rfree for the current skb until the next invocation of __strp_recv, which returns early_eaten if early_eaten is non-zero. This behavior causes a stall when the current message occupies the very tail end of a massive skb, and strp_peek/need_bytes indicates that the remainder of the current message has yet to arrive on the socket. The TCP receive buffer is totally full, causing the TCP window to go to zero, so the remainder of the message will never arrive. Incrementing the value returned by __strp_recv by the amount otherwise stored in early_eaten prevents stalls of this nature. Signed-off-by: Doron Roberts-Kedes Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 32761addd300814d4d8e76a9db1587b1b74e1cf8 Author: Bhadram Varka Date: Sun Jun 17 20:02:05 2018 +0530 stmmac: fix DMA channel hang in half-duplex mode [ Upstream commit b6cfffa7ad923c73f317ea50fd4ebcb3b4b6669c ] HW does not support Half-duplex mode in multi-queue scenario. Fix it by not advertising the Half-Duplex mode if multi-queue enabled. Signed-off-by: Bhadram Varka Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5e90946baa576df0c28b74cbde16db236f2afd79 Author: Jiri Slaby Date: Mon Jun 25 09:26:27 2018 +0200 r8152: napi hangup fix after disconnect [ Upstream commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5 ] When unplugging an r8152 adapter while the interface is UP, the NIC becomes unusable. usb->disconnect (aka rtl8152_disconnect) deletes napi. Then, rtl8152_disconnect calls unregister_netdev and that invokes netdev->ndo_stop (aka rtl8152_close). rtl8152_close tries to napi_disable, but the napi is already deleted by disconnect above. So the first while loop in napi_disable never finishes. This results in complete deadlock of the network layer as there is rtnl_mutex held by unregister_netdev. So avoid the call to napi_disable in rtl8152_close when the device is already gone. The other calls to usb_kill_urb, cancel_delayed_work_sync, netif_stop_queue etc. seem to be fine. The urb and netdev is not destroyed yet. Signed-off-by: Jiri Slaby Cc: linux-usb@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d8c1603d0bb47caafbaeaf76100842e8e0c46de8 Author: Aleksander Morgado Date: Sat Jun 23 23:22:52 2018 +0200 qmi_wwan: add support for the Dell Wireless 5821e module [ Upstream commit e7e197edd09c25774b4f12cab19f9d5462f240f4 ] This module exposes two USB configurations: a QMI+AT capable setup on USB config #1 and a MBIM capable setup on USB config #2. By default the kernel will choose the MBIM capable configuration as long as the cdc_mbim driver is available. This patch adds support for the QMI port in the secondary configuration. Signed-off-by: Aleksander Morgado Acked-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dd537828bf733a81c2748a4ba90876b24d0720c5 Author: Sudarsana Reddy Kalluru Date: Sun Jul 1 20:03:05 2018 -0700 qed: Limit msix vectors in kdump kernel to the minimum required count. [ Upstream commit bb7858ba1102f82470a917e041fd23e6385c31be ] Memory size is limited in the kdump kernel environment. Allocation of more msix-vectors (or queues) consumes few tens of MBs of memory, which might lead to the kdump kernel failure. This patch adds changes to limit the number of MSI-X vectors in kdump kernel to minimum required value (i.e., 2 per engine). Fixes: fe56b9e6a ("qed: Add module with basic common support") Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 023a2043bc8aacbf4418d41ee849b73c83e1cd3c Author: Sudarsana Reddy Kalluru Date: Sun Jul 1 20:03:07 2018 -0700 qed: Fix use of incorrect size in memcpy call. [ Upstream commit cc9b27cdf7bd3c86df73439758ac1564bc8f5bbe ] Use the correct size value while copying chassis/port id values. Fixes: 6ad8c632e ("qed: Add support for query/config dcbx.") Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4c2849931b239432020a7d77d955817b9b5e049b Author: Sudarsana Reddy Kalluru Date: Sun Jul 1 20:03:06 2018 -0700 qed: Fix setting of incorrect eswitch mode. [ Upstream commit 538f8d00ba8bb417c4d9e76c61dee59d812d8287 ] By default, driver sets the eswitch mode incorrectly as VEB (virtual Ethernet bridging). Need to set VEB eswitch mode only when sriov is enabled, and it should be to set NONE by default. The patch incorporates this change. Fixes: 0fefbfbaa ("qed*: Management firmware - notifications and defaults") Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d725fde81ffc40e6e481c0705985471ee74a6a75 Author: Sudarsana Reddy Kalluru Date: Sun Jul 1 20:03:08 2018 -0700 qede: Adverstise software timestamp caps when PHC is not available. [ Upstream commit 82a4e71b1565dea8387f54503e806cf374e779ec ] When ptp clock is not available for a PF (e.g., higher PFs in NPAR mode), get-tsinfo() callback should return the software timestamp capabilities instead of returning the error. Fixes: 4c55215c ("qede: Add driver support for PTP") Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 35e324ebeee0fed7d7d30b6ed9873a3f06f86e94 Author: David Ahern Date: Mon Jun 18 12:30:37 2018 -0700 net/tcp: Fix socket lookups with SO_BINDTODEVICE [ Upstream commit 8c43bd1706885ba1acfa88da02bc60a2ec16f68c ] Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups need to fail if dev_match is not true. Currently, a packet to a given port can match a socket bound to device when it should not. In the VRF case, this causes the lookup to hit a VRF socket and not a global socket resulting in a response trying to go through the VRF when it should not. Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups") Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups") Reported-by: Lou Berger Diagnosed-by: Renato Westphal Tested-by: Renato Westphal Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b3c66b54d8fef1d032580d3f7660279cdd277576 Author: Eric Dumazet Date: Tue Jun 19 19:18:50 2018 -0700 net: sungem: fix rx checksum support [ Upstream commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f ] After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"), sungem owners reported the infamous "eth0: hw csum failure" message. CHECKSUM_COMPLETE has in fact never worked for this driver, but this was masked by the fact that upper stacks had to strip the FCS, and therefore skb->ip_summed was set back to CHECKSUM_NONE before my recent change. Driver configures a number of bytes to skip when the chip computes the checksum, and for some reason only half of the Ethernet header was skipped. Then a second problem is that we should strip the FCS by default, unless the driver is updated to eventually support NETIF_F_RXFCS in the future. Finally, a driver should check if NETIF_F_RXCSUM feature is enabled or not, so that the admin can turn off rx checksum if wanted. Many thanks to Andreas Schwab and Mathieu Malaterre for their help in debugging this issue. Signed-off-by: Eric Dumazet Reported-by: Meelis Roos Reported-by: Mathieu Malaterre Reported-by: Andreas Schwab Tested-by: Andreas Schwab Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b36f997add36c35c893c3dff7e1e01caa9541a13 Author: Konstantin Khlebnikov Date: Fri Jun 15 13:27:31 2018 +0300 net_sched: blackhole: tell upper qdisc about dropped packets [ Upstream commit 7e85dc8cb35abf16455f1511f0670b57c1a84608 ] When blackhole is used on top of classful qdisc like hfsc it breaks qlen and backlog counters because packets are disappear without notice. In HFSC non-zero qlen while all classes are inactive triggers warning: WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc] and schedules watchdog work endlessly. This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS, this flag tells upper layer: this packet is gone and isn't queued. Signed-off-by: Konstantin Khlebnikov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5e6b4b9b28b72acd98729ad098e720a0606de2fd Author: Eric Dumazet Date: Thu Jun 21 14:16:02 2018 -0700 net/packet: fix use-after-free [ Upstream commit 945d015ee0c3095d2290e845565a23dedfd8027c ] We should put copy_skb in receive_queue only after a successful call to virtio_net_hdr_from_skb(). syzbot report : BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline] BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline] BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553 CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __skb_unlink include/linux/skbuff.h:1843 [inline] __skb_dequeue include/linux/skbuff.h:1863 [inline] skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852 packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331 packet_release+0x630/0xd90 net/packet/af_packet.c:2991 __sock_release+0xd7/0x260 net/socket.c:603 sock_close+0x19/0x20 net/socket.c:1186 __fput+0x35b/0x8b0 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1ec/0x2a0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x1b08/0x2750 kernel/exit.c:865 do_group_exit+0x177/0x440 kernel/exit.c:968 __do_sys_exit_group kernel/exit.c:979 [inline] __se_sys_exit_group kernel/exit.c:977 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4448e9 Code: Bad RIP value. RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9 RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001 RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000 R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0 R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 skb_clone+0x1f5/0x500 net/core/skbuff.c:1282 tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 kfree_skbmem+0x154/0x230 net/core/skbuff.c:582 __kfree_skb net/core/skbuff.c:642 [inline] kfree_skb+0x1a5/0x580 net/core/skbuff.c:659 tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8801b044ecc0 which belongs to the cache skbuff_head_cache of size 232 The buggy address is located 0 bytes inside of 232-byte region [ffff8801b044ecc0, ffff8801b044eda8) The buggy address belongs to the page: page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0 raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ddbbd3e057438d4998904e39396871317a429b6b Author: Antoine Tenart Date: Fri Jun 22 10:15:39 2018 +0200 net: mvneta: fix the Rx desc DMA address in the Rx path [ Upstream commit 271f7ff5aa5a73488b7a9d8b84b5205fb5b2f7cc ] When using s/w buffer management, buffers are allocated and DMA mapped. When doing so on an arm64 platform, an offset correction is applied on the DMA address, before storing it in an Rx descriptor. The issue is this DMA address is then used later in the Rx path without removing the offset correction. Thus the DMA address is wrong, which can led to various issues. This patch fixes this by removing the offset correction from the DMA address retrieved from the Rx descriptor before using it in the Rx path. Fixes: 8d5047cf9ca2 ("net: mvneta: Convert to be 64 bits compatible") Signed-off-by: Antoine Tenart Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7ae129dd6778aa9b573b17083b13674d4336fa22 Author: Shay Agroskin Date: Tue May 22 14:14:02 2018 +0300 net/mlx5: Fix wrong size allocation for QoS ETC TC regitster [ Upstream commit d14fcb8d877caf1b8d6bd65d444bf62b21f2070c ] The driver allocates wrong size (due to wrong struct name) when issuing a query/set request to NIC's register. Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate") Signed-off-by: Shay Agroskin Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 46ff2bc7aeb1b3d2fc36b20ddda0411889165cf1 Author: Eli Cohen Date: Wed Jun 13 10:27:34 2018 +0300 net/mlx5: Fix required capability for manipulating MPFS [ Upstream commit f811980444ec59ad62f9e041adbb576a821132c7 ] Manipulating of the MPFS requires eswitch manager capabilities. Fixes: eeb66cdb6826 ('net/mlx5: Separate between E-Switch and MPFS') Signed-off-by: Eli Cohen Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 8b7b5f76693c865fd85715e0c047719766eea2be Author: Alex Vesker Date: Fri May 25 20:25:59 2018 +0300 net/mlx5: Fix incorrect raw command length parsing [ Upstream commit 603b7bcff824740500ddfa001d7a7168b0b38542 ] The NULL character was not set correctly for the string containing the command length, this caused failures reading the output of the command due to a random length. The fix is to initialize the output length string. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 075b5038124851d8fc79c68cb74680b20fbd2736 Author: Alex Vesker Date: Tue Jun 12 16:14:31 2018 +0300 net/mlx5: Fix command interface race in polling mode [ Upstream commit d412c31dae053bf30a1bc15582a9990df297a660 ] The command interface can work in two modes: Events and Polling. In the general case, each time we invoke a command, a work is queued to handle it. When working in events, the interrupt handler completes the command execution. On the other hand, when working in polling mode, the work itself completes it. Due to a bug in the work handler, a command could have been completed by the interrupt handler, while the work handler hasn't finished yet, causing the it to complete once again if the command interface mode was changed from Events to polling after the interrupt handler was called. mlx5_unload_one() mlx5_stop_eqs() // Destroy the EQ before cmd EQ ...cmd_work_handler() write_doorbell() --> EVENT_TYPE_CMD mlx5_cmd_comp_handler() // First free free_ent(cmd, ent->idx) complete(&ent->done) <-- mlx5_stop_eqs //cmd was complete // move to polling before destroying the last cmd EQ mlx5_cmd_use_polling() cmd->mode = POLL; --> cmd_work_handler (continues) if (cmd->mode == POLL) mlx5_cmd_comp_handler() // Double free The solution is to store the cmd->mode before writing the doorbell. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit c3994f4f8bdaf660ad6740347d0de7acdeb4680a Author: Or Gerlitz Date: Thu May 31 11:16:18 2018 +0300 net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager [ Upstream commit 0efc8562491b7d36f6bbc4fbc8f3348cb6641e9c ] In smartnic env, the host (PF) driver might not be an e-switch manager, hence the FW will err on driver attempts to deal with setting/unsetting the eswitch and as a result the overall setup of sriov will fail. Fix that by avoiding the operation if e-switch management is not allowed for this driver instance. While here, move to use the correct name for the esw manager capability name. Fixes: 81848731ff40 ('net/mlx5: E-Switch, Add SR-IOV (FDB) support') Signed-off-by: Or Gerlitz Reported-by: Guy Kushnir Reviewed-by: Eli Cohen Tested-by: Eli Cohen Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit b216867c02ac1146f3b66768845814150732f1e2 Author: Or Gerlitz Date: Mon Jun 4 19:46:53 2018 +0300 net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager [ Upstream commit 8ffd569aaa818f2624ca821d9a246342fa8b8c50 ] The check for cpu hit statistics was not returning immediate false for any non vport rep netdev and hence we crashed (say on mlx5 probed VFs) if user-space tool was calling into any possible netdev in the system. Fix that by doing a proper check before dereferencing. Fixes: 1d447a39142e ('net/mlx5e: Extendable vport representor netdev private data') Signed-off-by: Or Gerlitz Reported-by: Eli Cohen Reviewed-by: Eli Cohen Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 1d8dda4444fa0e8799a40d74e3502aac841ea002 Author: Or Gerlitz Date: Thu May 31 11:32:56 2018 +0300 net/mlx5e: Avoid dealing with vport representors if not being e-switch manager [ Upstream commit 733d3e5497070d05971352ca5087bac83c197c3d ] In smartnic env, the host (PF) driver might not be an e-switch manager, hence the switchdev mode representors are running on the embedded cpu (EC) and not at the host. As such, we should avoid dealing with vport representors if not being esw manager. While here, make sure to disallow eswitch switchdev related setups through devlink if we are not esw managers. Fixes: cb67b832921c ('net/mlx5e: Introduce SRIOV VF representors') Signed-off-by: Or Gerlitz Reviewed-by: Eli Cohen Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit f389c17b8dc54b474d33728e5882dcc276a467fa Author: Harini Katakam Date: Wed Jun 20 17:04:20 2018 +0530 net: macb: Fix ptp time adjustment for large negative delta [ Upstream commit 64d7839af8c8f67daaf9bf387135052c55d85f90 ] When delta passed to gem_ptp_adjtime is negative, the sign is maintained in the ns_to_timespec64 conversion. Hence timespec_add should be used directly. timespec_sub will just subtract the negative value thus increasing the time difference. Signed-off-by: Harini Katakam Acked-by: Nicolas Ferre Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b364a914c4993f46c0d849dab20d67ba0d82809d Author: Sabrina Dubroca Date: Sat Jun 30 17:38:55 2018 +0200 net: fix use-after-free in GRO with ESP [ Upstream commit 603d4cf8fe095b1ee78f423d514427be507fb513 ] Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore. Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols. This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE. Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: Sabrina Dubroca Reviewed-by: Stefano Brivio Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fb6b14663d56afabf86a3a2078c37a8d28fc718a Author: Eric Dumazet Date: Fri Jun 22 06:44:15 2018 -0700 net: dccp: switch rx_tstamp_last_feedback to monotonic clock [ Upstream commit 0ce4e70ff00662ad7490e545ba0cd8c1fa179fca ] To compute delays, better not use time of the day which can be changed by admins or malicious programs. Also change ccid3_first_li() to use s64 type for delta variable to avoid potential overflows. Signed-off-by: Eric Dumazet Cc: Gerrit Renker Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a3225a836e56487b32f21ff1a8099c731256d10a Author: Eric Dumazet Date: Fri Jun 22 06:44:14 2018 -0700 net: dccp: avoid crash in ccid3_hc_rx_send_feedback() [ Upstream commit 74174fe5634ffbf645a7ca5a261571f700b2f332 ] On fast hosts or malicious bots, we trigger a DCCP_BUG() which seems excessive. syzbot reported : BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback() CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline] ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793 ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline] dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180 dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378 dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654 sk_backlog_rcv include/net/sock.h:914 [inline] __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517 dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875 ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215 NF_HOOK include/linux/netfilter.h:287 [inline] ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396 NF_HOOK include/linux/netfilter.h:287 [inline] ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492 __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 process_backlog+0x219/0x760 net/core/dev.c:5373 napi_poll net/core/dev.c:5771 [inline] net_rx_action+0x7da/0x1980 net/core/dev.c:5837 __do_softirq+0x2e8/0xb17 kernel/softirq.c:284 run_ksoftirqd+0x86/0x100 kernel/softirq.c:645 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: Gerrit Renker Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a2e53d69f68587d5272d45e2f4674746977bd5b2 Author: Jesper Dangaard Brouer Date: Tue Jun 26 17:39:48 2018 +0200 ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing [ Upstream commit ad088ec480768850db019a5cc543685e868a513d ] The driver was combining the XDP_TX tail flush and XDP_REDIRECT map flushing (xdp_do_flush_map). This is suboptimal, these two flush operations should be kept separate. Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map") Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f5a42d63f0d417654a8a4d4448015e1c642ee338 Author: Xin Long Date: Thu Jun 21 12:56:04 2018 +0800 ipvlan: fix IFLA_MTU ignored on NEWLINK [ Upstream commit 30877961b1cdd6fdca783c2e8c4f0f47e95dc58c ] Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted the mtu from the master device when creating a ipvlan device, but it would also override the mtu value set in rtnl_create_link. It causes IFLA_MTU param not to take effect. So this patch is to not adjust the mtu if IFLA_MTU param is set when creating a ipvlan device. Fixes: 296d48568042 ("ipvlan: inherit MTU from master device") Reported-by: Jianlin Shi Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d10c0baaae3f84185db3b3076a87d4343c9d62e6 Author: Eric Biggers Date: Sat Jun 30 15:26:56 2018 -0700 ipv6: sr: fix passing wrong flags to crypto_alloc_shash() [ Upstream commit fc9c2029e37c3ae9efc28bf47045e0b87e09660c ] The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags, not 'gfp_t'. So don't pass GFP_KERNEL to it. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Signed-off-by: Eric Biggers Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e34e92d8b689179a6715518346a687bc1e87911a Author: Stephen Hemminger Date: Fri Jun 29 14:07:16 2018 -0700 hv_netvsc: split sub-channel setup into async and sync [ Upstream commit 3ffe64f1a641b80a82d9ef4efa7a05ce69049871 ] When doing device hotplug the sub channel must be async to avoid deadlock issues because device is discovered in softirq context. When doing changes to MTU and number of channels, the setup must be synchronous to avoid races such as when MTU and device settings are done in a single ip command. Reported-by: Thomas Walker Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") Fixes: 732e49850c5e ("netvsc: fix race on sub channel creation") Signed-off-by: Stephen Hemminger Signed-off-by: Haiyang Zhang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 43c9207d029648dce52a45cd07dfb832d8a7957a Author: Gustavo A. R. Silva Date: Fri Jun 29 13:28:07 2018 -0500 atm: zatm: Fix potential Spectre v1 [ Upstream commit ced9e191501e52b95e1b57b8e0db00943869eed0 ] pool can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue 'zatm_dev->pool_info' (local cap) Fix this by sanitizing pool before using it to index zatm_dev->pool_info Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f93d65939a4a80f31e50af88bbd5fcae33266009 Author: David Woodhouse Date: Sat Jun 16 11:55:44 2018 +0100 atm: Preserve value of skb->truesize when accounting to vcc [ Upstream commit 9bbe60a67be5a1c6f79b3c9be5003481a50529ff ] ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on which they are to be sent. But it doesn't take ownership of those packets from the sock (if any) which originally owned them. They should remain owned by their actual sender until they've left the box. There's a hack in pskb_expand_head() to avoid adjusting skb->truesize for certain skbs, precisely to avoid messing up sk_wmem_alloc accounting. Ideally that hack would cover the ATM use case too, but it doesn't — skbs which aren't owned by any sock, for example PPP control frames, still get their truesize adjusted when the low-level ATM driver adds headroom. This has always been an issue, it seems. The truesize of a packet increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't for normal traffic, only for control frames. So I think we just got away with it, and we probably needed to send 2GiB of LCP echo frames before the misaccounting would ever have caused a problem and caused atm_may_send() to start refusing packets. Commit 14afee4b609 ("net: convert sock.sk_wmem_alloc from atomic_t to refcount_t") did exactly what it was intended to do, and turned this mostly-theoretical problem into a real one, causing PPPoATM to fail immediately as sk_wmem_alloc underflows and atm_may_send() *immediately* starts refusing to allow new packets. The least intrusive solution to this problem is to stash the value of skb->truesize that was accounted to the VCC, in a new member of the ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that value instead of the then-current value of skb->truesize. Fixes: 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()") Signed-off-by: David Woodhouse Tested-by: Kevin Darbyshire-Bryant Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c62e2f087af11564ecbd1c0d08bbd84ac92afc07 Author: Sabrina Dubroca Date: Fri Jun 29 17:51:26 2018 +0200 alx: take rtnl before calling __alx_open from resume [ Upstream commit bc800e8b39bad60ccdb83be828da63af71ab87b3 ] The __alx_open function can be called from ndo_open, which is called under RTNL, or from alx_resume, which isn't. Since commit d768319cd427, we're calling the netif_set_real_num_{tx,rx}_queues functions, which need to be called under RTNL. This is similar to commit 0c2cc02e571a ("igb: Move the calls to set the Tx and Rx queues into igb_open"). Fixes: d768319cd427 ("alx: enable multiple tx queues") Signed-off-by: Sabrina Dubroca Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 03bb9187754e2c82626dafcd3e12a137cbb952ce Author: Christian Lamparter Date: Fri Aug 25 15:47:24 2017 +0200 crypto: crypto4xx - fix crypto4xx_build_pdr, crypto4xx_build_sdr leak commit 5d59ad6eea82ef8df92b4109615a0dde9d8093e9 upstream. If one of the later memory allocations in rypto4xx_build_pdr() fails: dev->pdr (and/or) dev->pdr_uinfo wouldn't be freed. crypto4xx_build_sdr() has the same issue with dev->sdr. Signed-off-by: Christian Lamparter Signed-off-by: Herbert Xu Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit 996a6a393b3f82f306ec65f25756a31f51ca3967 Author: Christian Lamparter Date: Fri Aug 25 15:47:14 2017 +0200 crypto: crypto4xx - remove bad list_del commit a728a196d253530f17da5c86dc7dfbe58c5f7094 upstream. alg entries are only added to the list, after the registration was successful. If the registration failed, it was never added to the list in the first place. Signed-off-by: Christian Lamparter Signed-off-by: Herbert Xu Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit dc3782a3e9c61872df8c8ed395b60d3b7ad782b8 Author: Jaehoon Chung Date: Mon Jan 22 11:28:54 2018 +0900 PCI: exynos: Fix a potential init_clk_resources NULL pointer dereference commit b5d6bc90c9129279d363ccbc02ad11e7b657c0b4 upstream. In order to avoid triggering a NULL pointer dereference in exynos_pcie_probe() a check must be put in place to detect if the init_clk_resources hook is initialized before calling it. Add the respective function pointer check in exynos_pcie_probe(). Signed-off-by: Jaehoon Chung [lorenzo.pieralisi@arm.com: rewrote the commit log] Signed-off-by: Lorenzo Pieralisi Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit b1c3ce0cfff275ee2e88fe505a8f44712303f895 Author: Jonas Gorski Date: Sun Oct 1 13:02:16 2017 +0200 bcm63xx_enet: do not write to random DMA channel on BCM6345 commit d6213c1f2ad54a964b77471690264ed685718928 upstream. The DMA controller regs actually point to DMA channel 0, so the write to ENETDMA_CFG_REG will actually modify a random DMA channel. Since DMA controller registers do not exist on BCM6345, guard the write with the usual check for dma_has_sram. Signed-off-by: Jonas Gorski Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit b913a05ab75e806db1ef6d54fb206a40dd4d5f1b Author: Jonas Gorski Date: Sun Oct 1 13:02:15 2017 +0200 bcm63xx_enet: correct clock usage commit 9c86b846ce02f7e35d7234cf090b80553eba5389 upstream. Check the return code of prepare_enable and change one last instance of enable only to prepare_enable. Also properly disable and release the clock in error paths and on remove for enetsw. Signed-off-by: Jonas Gorski Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit 1ccab2bf726e1cd9292deecf4d72d732527035d6 Author: alex chen Date: Wed Nov 15 17:31:44 2017 -0800 ocfs2: ip_alloc_sem should be taken in ocfs2_get_block() commit 3e4c56d41eef5595035872a2ec5a483f42e8917f upstream. ip_alloc_sem should be taken in ocfs2_get_block() when reading file in DIRECT mode to prevent concurrent access to extent tree with ocfs2_dio_end_io_write(), which may cause BUGON in the following situation: read file 'A' end_io of writing file 'A' vfs_read __vfs_read ocfs2_file_read_iter generic_file_read_iter ocfs2_direct_IO __blockdev_direct_IO do_blockdev_direct_IO do_direct_IO get_more_blocks ocfs2_get_block ocfs2_extent_map_get_blocks ocfs2_get_clusters ocfs2_get_clusters_nocache() ocfs2_search_extent_list return the index of record which contains the v_cluster, that is v_cluster > rec[i]->e_cpos. ocfs2_dio_end_io ocfs2_dio_end_io_write down_write(&oi->ip_alloc_sem); ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_split_extent ... --> modify the rec[i]->e_cpos, resulting in v_cluster < rec[i]->e_cpos. BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos)) [alex.chen@huawei.com: v3] Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Alex Chen Reviewed-by: Jun Piao Reviewed-by: Joseph Qi Reviewed-by: Gang He Acked-by: Changwei Ge Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Salvatore Bonaccorso Signed-off-by: Greg Kroah-Hartman commit c59a8f13f36b51f2100111121b39c6d15eca124d Author: alex chen Date: Wed Nov 15 17:31:48 2017 -0800 ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent commit 853bc26a7ea39e354b9f8889ae7ad1492ffa28d2 upstream. The subsystem.su_mutex is required while accessing the item->ci_parent, otherwise, NULL pointer dereference to the item->ci_parent will be triggered in the following situation: add node delete node sys_write vfs_write configfs_write_file o2nm_node_store o2nm_node_local_write do_rmdir vfs_rmdir configfs_rmdir mutex_lock(&subsys->su_mutex); unlink_obj item->ci_group = NULL; item->ci_parent = NULL; to_o2nm_cluster_from_node node->nd_item.ci_parent->ci_parent BUG since of NULL pointer dereference to nd_item.ci_parent Moreover, the o2nm_cluster also should be protected by the subsystem.su_mutex. [alex.chen@huawei.com: v2] Link: http://lkml.kernel.org/r/59EEAA69.9080703@huawei.com Link: http://lkml.kernel.org/r/59E9B36A.10700@huawei.com Signed-off-by: Alex Chen Reviewed-by: Jun Piao Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Salvatore Bonaccorso Signed-off-by: Greg Kroah-Hartman commit f5778c2d657e04782553ccc285e396f3062deb73 Author: Chuck Lever Date: Mon Mar 19 14:23:16 2018 -0400 xprtrdma: Fix corner cases when handling device removal commit 25524288631fc5b7d33259fca1e0dc38146be5d6 upstream. Michal Kalderon has found some corner cases around device unload with active NFS mounts that I didn't have the imagination to test when xprtrdma device removal was added last year. - The ULP device removal handler is responsible for deallocating the PD. That wasn't clear to me initially, and my own testing suggested it was not necessary, but that is incorrect. - The transport destruction path can no longer assume that there is a valid ID. - When destroying a transport, ensure that ib_free_cq() is not invoked on a CQ that was already released. Reported-by: Michal Kalderon Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from ...") Signed-off-by: Chuck Lever Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Anna Schumaker Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 1083a7e8130ca54cef63a5b1d6c86878d4b9e61d Author: Prashanth Prakash Date: Fri Apr 27 11:35:27 2018 -0600 cpufreq / CPPC: Set platform specific transition_delay_us commit d4f3388afd488ed15368fa7413b8bd6d1f98bb1d upstream. Add support to specify platform specific transition_delay_us instead of using the transition delay derived from PCC. With commit 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency) we are setting transition_delay_us directly and not applying the LATENCY_MULTIPLIER. Because of that, on Qualcomm Centriq we can end up with a very high rate of frequency change requests when using the schedutil governor (default rate_limit_us=10 compared to an earlier value of 10000). The PCC subspace describes the rate at which the platform can accept commands on the CPPC's PCC channel. This includes read and write command on the PCC channel that can be used for reasons other than frequency transitions. Moreover the same PCC subspace can be used by multiple freq domains and deriving transition_delay_us from it as we do now can be sub-optimal. Moreover if a platform does not use PCC for desired_perf register then there is no way to compute the transition latency or the delay_us. CPPC does not have a standard defined mechanism to get the transition rate or the latency at the moment. Given the above limitations, it is simpler to have a platform specific transition_delay_us and rely on PCC derived value only if a platform specific value is not available. Signed-off-by: Prashanth Prakash Cc: 4.14+ # 4.14+ Fixes: 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency) Signed-off-by: Rafael J. Wysocki Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 61a9f6b7fe0ca9706b49a23cecf5f9a9c802b6ce Author: Filipe Manana Date: Wed May 9 16:01:46 2018 +0100 Btrfs: fix duplicate extents after fsync of file with prealloc extents commit 31d11b83b96faaee4bb514d375a09489117c3e8d upstream. In commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay"), on fsync, we started to always log all prealloc extents beyond an inode's i_size in order to avoid losing them after a power failure. However under some cases this can lead to the log replay code to create duplicate extent items, with different lengths, in the extent tree. That happens because, as of that commit, we can now log extent items based on extent maps that are not on the "modified" list of extent maps of the inode's extent map tree. Logging extent items based on extent maps is used during the fast fsync path to save time and for this to work reliably it requires that the extent maps are not merged with other adjacent extent maps - having the extent maps in the list of modified extents gives such guarantee. Consider the following example, captured during a long run of fsstress, which illustrates this problem. We have inode 271, in the filesystem tree (root 5), for which all of the following operations and discussion apply to. A buffered write starts at offset 312391 with a length of 933471 bytes (end offset at 1245862). At this point we have, for this inode, the following extent maps with the their field values: em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613, block_len 0, orig_block_len 0 em B, start 40960, orig_start 40960, len 376832, block_start 1106399232, block_len 376832, orig_block_len 376832 em C, start 417792, orig_start 417792, len 782336, block_start 18446744073709551613, block_len 0, orig_block_len 0 em D, start 1200128, orig_start 1200128, len 835584, block_start 1106776064, block_len 835584, orig_block_len 835584 em E, start 2035712, orig_start 2035712, len 245760, block_start 1107611648, block_len 245760, orig_block_len 245760 Extent map A corresponds to a hole and extent maps D and E correspond to preallocated extents. Extent map D ends where extent map E begins (1106776064 + 835584 = 1107611648), but these extent maps were not merged because they are in the inode's list of modified extent maps. An fsync against this inode is made, which triggers the fast path (BTRFS_INODE_NEEDS_FULL_SYNC is not set). This fsync triggers writeback of the data previously written using buffered IO, and when the respective ordered extent finishes, btrfs_drop_extents() is called against the (aligned) range 311296..1249279. This causes a split of extent map D at btrfs_drop_extent_cache(), replacing extent map D with a new extent map D', also added to the list of modified extents, with the following values: em D', start 1249280, orig_start of 1200128, block_start 1106825216 (= 1106776064 + 1249280 - 1200128), orig_block_len 835584, block_len 786432 (835584 - (1249280 - 1200128)) Then, during the fast fsync, btrfs_log_changed_extents() is called and extent maps D' and E are removed from the list of modified extents. The flag EXTENT_FLAG_LOGGING is also set on them. After the extents are logged clear_em_logging() is called on each of them, and that makes extent map E to be merged with extent map D' (try_merge_map()), resulting in D' being deleted and E adjusted to: em E, start 1249280, orig_start 1200128, len 1032192, block_start 1106825216, block_len 1032192, orig_block_len 245760 A direct IO write at offset 1847296 and length of 360448 bytes (end offset at 2207744) starts, and at that moment the following extent maps exist for our inode: em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613, block_len 0, orig_block_len 0 em B, start 40960, orig_start 40960, len 270336, block_start 1106399232, block_len 270336, orig_block_len 376832 em C, start 311296, orig_start 311296, len 937984, block_start 1112842240, block_len 937984, orig_block_len 937984 em E (prealloc), start 1249280, orig_start 1200128, len 1032192, block_start 1106825216, block_len 1032192, orig_block_len 245760 The dio write results in drop_extent_cache() being called twice. The first time for a range that starts at offset 1847296 and ends at offset 2035711 (length of 188416), which results in a double split of extent map E, replacing it with two new extent maps: em F, start 1249280, orig_start 1200128, block_start 1106825216, block_len 598016, orig_block_len 598016 em G, start 2035712, orig_start 1200128, block_start 1107611648, block_len 245760, orig_block_len 1032192 It also creates a new extent map that represents a part of the requested IO (through create_io_em()): em H, start 1847296, len 188416, block_start 1107423232, block_len 188416 The second call to drop_extent_cache() has a range with a start offset of 2035712 and end offset of 2207743 (length of 172032). This leads to replacing extent map G with a new extent map I with the following values: em I, start 2207744, orig_start 1200128, block_start 1107783680, block_len 73728, orig_block_len 1032192 It also creates a new extent map that represents the second part of the requested IO (through create_io_em()): em J, start 2035712, len 172032, block_start 1107611648, block_len 172032 The dio write set the inode's i_size to 2207744 bytes. After the dio write the inode has the following extent maps: em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613, block_len 0, orig_block_len 0 em B, start 40960, orig_start 40960, len 270336, block_start 1106399232, block_len 270336, orig_block_len 376832 em C, start 311296, orig_start 311296, len 937984, block_start 1112842240, block_len 937984, orig_block_len 937984 em F, start 1249280, orig_start 1200128, len 598016, block_start 1106825216, block_len 598016, orig_block_len 598016 em H, start 1847296, orig_start 1200128, len 188416, block_start 1107423232, block_len 188416, orig_block_len 835584 em J, start 2035712, orig_start 2035712, len 172032, block_start 1107611648, block_len 172032, orig_block_len 245760 em I, start 2207744, orig_start 1200128, len 73728, block_start 1107783680, block_len 73728, orig_block_len 1032192 Now do some change to the file, like adding a xattr for example and then fsync it again. This triggers a fast fsync path, and as of commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay"), we use the extent map I to log a file extent item because it's a prealloc extent and it starts at an offset matching the inode's i_size. However when we log it, we create a file extent item with a value for the disk byte location that is wrong, as can be seen from the following output of "btrfs inspect-internal dump-tree": item 1 key (271 EXTENT_DATA 2207744) itemoff 3782 itemsize 53 generation 22 type 2 (prealloc) prealloc data disk byte 1106776064 nr 1032192 prealloc data offset 1007616 nr 73728 Here the disk byte value corresponds to calculation based on some fields from the extent map I: 1106776064 = block_start (1107783680) - 1007616 (extent_offset) extent_offset = 2207744 (start) - 1200128 (orig_start) = 1007616 The disk byte value of 1106776064 clashes with disk byte values of the file extent items at offsets 1249280 and 1847296 in the fs tree: item 6 key (271 EXTENT_DATA 1249280) itemoff 3568 itemsize 53 generation 20 type 2 (prealloc) prealloc data disk byte 1106776064 nr 835584 prealloc data offset 49152 nr 598016 item 7 key (271 EXTENT_DATA 1847296) itemoff 3515 itemsize 53 generation 20 type 1 (regular) extent data disk byte 1106776064 nr 835584 extent data offset 647168 nr 188416 ram 835584 extent compression 0 (none) item 8 key (271 EXTENT_DATA 2035712) itemoff 3462 itemsize 53 generation 20 type 1 (regular) extent data disk byte 1107611648 nr 245760 extent data offset 0 nr 172032 ram 245760 extent compression 0 (none) item 9 key (271 EXTENT_DATA 2207744) itemoff 3409 itemsize 53 generation 20 type 2 (prealloc) prealloc data disk byte 1107611648 nr 245760 prealloc data offset 172032 nr 73728 Instead of the disk byte value of 1106776064, the value of 1107611648 should have been logged. Also the data offset value should have been 172032 and not 1007616. After a log replay we end up getting two extent items in the extent tree with different lengths, one of 835584, which is correct and existed before the log replay, and another one of 1032192 which is wrong and is based on the logged file extent item: item 12 key (1106776064 EXTENT_ITEM 835584) itemoff 3406 itemsize 53 refs 2 gen 15 flags DATA extent data backref root 5 objectid 271 offset 1200128 count 2 item 13 key (1106776064 EXTENT_ITEM 1032192) itemoff 3353 itemsize 53 refs 1 gen 22 flags DATA extent data backref root 5 objectid 271 offset 1200128 count 1 Obviously this leads to many problems and a filesystem check reports many errors: (...) checking extents Extent back ref already exists for 1106776064 parent 0 root 5 owner 271 offset 1200128 num_refs 1 extent item 1106776064 has multiple extent items ref mismatch on [1106776064 835584] extent item 2, found 3 Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 2 wanted 1 back 0x55b1d0ad7680 Backref 1106776064 root 5 owner 271 offset 1200128 num_refs 0 not found in extent tree Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 1 wanted 0 back 0x55b1d0ad4e70 Backref bytes do not match extent backref, bytenr=1106776064, ref bytes=835584, backref bytes=1032192 backpointer mismatch on [1106776064 835584] checking free space cache block group 1103101952 has wrong amount of free space failed to load free space cache for block group 1103101952 checking fs roots (...) So fix this by logging the prealloc extents beyond the inode's i_size based on searches in the subvolume tree instead of the extent maps. Fixes: 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit edefb935700c130ff52a0a0da7edab7a34d7572c Author: Nick Desaulniers Date: Thu Jun 21 09:23:24 2018 -0700 x86/paravirt: Make native_save_fl() extern inline commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7 upstream. native_save_fl() is marked static inline, but by using it as a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined. paravirt's use of native_save_fl() also requires that no GPRs other than %rax are clobbered. Compilers have different heuristics which they use to emit stack guard code, the emittance of which can break paravirt's callee saved assumption by clobbering %rcx. Marking a function definition extern inline means that if this version cannot be inlined, then the out-of-line version will be preferred. By having the out-of-line version be implemented in assembly, it cannot be instrumented with a stack protector, which might violate custom calling conventions that code like paravirt rely on. The semantics of extern inline has changed since gnu89. This means that folks using GCC versions >= 5.1 may see symbol redefinition errors at link time for subdirs that override KBUILD_CFLAGS (making the C standard used implicit) regardless of this patch. This has been cleaned up earlier in the patch set, but is left as a note in the commit message for future travelers. Reports: https://lkml.org/lkml/2018/5/7/534 https://github.com/ClangBuiltLinux/linux/issues/16 Discussion: https://bugs.llvm.org/show_bug.cgi?id=37512 https://lkml.org/lkml/2018/5/24/1371 Thanks to the many folks that participated in the discussion. Debugged-by: Alistair Strachan Debugged-by: Matthias Kaehlcke Suggested-by: Arnd Bergmann Suggested-by: H. Peter Anvin Suggested-by: Tom Stellar Reported-by: Sedat Dilek Tested-by: Sedat Dilek Signed-off-by: Nick Desaulniers Acked-by: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: joe@perches.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: thomas.lendacky@amd.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulniers@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 92e50158fc0ace8e7a0c53845588cdc246ce5ca1 Author: H. Peter Anvin Date: Thu Jun 21 09:23:23 2018 -0700 x86/asm: Add _ASM_ARG* constants for argument registers to commit 0e2e160033283e20f688d8bad5b89460cc5bfcc4 upstream. i386 and x86-64 uses different registers for arguments; make them available so we don't have to #ifdef in the actual code. Native size and specified size (q, l, w, b) versions are provided. Signed-off-by: H. Peter Anvin Signed-off-by: Nick Desaulniers Reviewed-by: Sedat Dilek Acked-by: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: joe@perches.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: thomas.lendacky@amd.com Cc: tstellar@redhat.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-3-ndesaulniers@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 779145a6f6ec8976846aeefd60171627e3d40328 Author: Nick Desaulniers Date: Thu Jun 21 09:23:22 2018 -0700 compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations commit d03db2bc26f0e4a6849ad649a09c9c73fccdc656 upstream. Functions marked extern inline do not emit an externally visible function when the gnu89 C standard is used. Some KBUILD Makefiles overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without an explicit C standard specified, the default is gnu11. Since c99, the semantics of extern inline have changed such that an externally visible function is always emitted. This can lead to multiple definition errors of extern inline functions at link time of compilation units whose build files have removed an explicit C standard compiler flag for users of GCC 5.1+ or Clang. Suggested-by: Arnd Bergmann Suggested-by: H. Peter Anvin Suggested-by: Joe Perches Signed-off-by: Nick Desaulniers Acked-by: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: sedat.dilek@gmail.com Cc: thomas.lendacky@amd.com Cc: tstellar@redhat.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-2-ndesaulniers@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman