commit 9f86f302ec0e37e84617481c587e11c47a397e3f Author: Greg Kroah-Hartman Date: Wed Jul 5 14:40:44 2017 +0200 Linux 4.9.36 commit a29fd27ca26832fe03341a7fec75ea3b4b86fb51 Author: Wanpeng Li Date: Mon Jun 5 05:19:09 2017 -0700 KVM: nVMX: Fix exception injection commit d4912215d1031e4fb3d1038d2e1857218dba0d0a upstream. WARNING: CPU: 3 PID: 2840 at arch/x86/kvm/vmx.c:10966 nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel] CPU: 3 PID: 2840 Comm: qemu-system-x86 Tainted: G OE 4.12.0-rc3+ #23 RIP: 0010:nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel] Call Trace: ? kvm_check_async_pf_completion+0xef/0x120 [kvm] ? rcu_read_lock_sched_held+0x79/0x80 vmx_queue_exception+0x104/0x160 [kvm_intel] ? vmx_queue_exception+0x104/0x160 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x1171/0x1ce0 [kvm] ? kvm_arch_vcpu_load+0x47/0x240 [kvm] ? kvm_arch_vcpu_load+0x62/0x240 [kvm] kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? __fget+0xf3/0x210 do_vfs_ioctl+0xa4/0x700 ? __fget+0x114/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x81/0x220 entry_SYSCALL64_slow_path+0x25/0x25 This is triggered occasionally by running both win7 and win2016 in L2, in addition, EPT is disabled on both L1 and L2. It can't be reproduced easily. Commit 0b6ac343fc (KVM: nVMX: Correct handling of exception injection) mentioned that "KVM wants to inject page-faults which it got to the guest. This function assumes it is called with the exit reason in vmcs02 being a #PF exception". Commit e011c663 (KVM: nVMX: Check all exceptions for intercept during delivery to L2) allows to check all exceptions for intercept during delivery to L2. However, there is no guarantee the exit reason is exception currently, when there is an external interrupt occurred on host, maybe a time interrupt for host which should not be injected to guest, and somewhere queues an exception, then the function nested_vmx_check_exception() will be called and the vmexit emulation codes will try to emulate the "Acknowledge interrupt on exit" behavior, the warning is triggered. Reusing the exit reason from the L2->L0 vmexit is wrong in this case, the reason must always be EXCEPTION_NMI when injecting an exception into L1 as a nested vmexit. Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Wanpeng Li Fixes: e011c663b9c7 ("KVM: nVMX: Check all exceptions for intercept during delivery to L2") Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit d1d3756f07da10505699d1d3a1227b5201da3ab8 Author: Radim Krčmář Date: Thu May 18 19:37:30 2017 +0200 KVM: x86: zero base3 of unusable segments commit f0367ee1d64d27fa08be2407df5c125442e885e3 upstream. Static checker noticed that base3 could be used uninitialized if the segment was not present (useable). Random stack values probably would not pass VMCS entry checks. Reported-by: Dan Carpenter Fixes: 1aa366163b8b ("KVM: x86 emulator: consolidate segment accessors") Reviewed-by: Paolo Bonzini Reviewed-by: David Hildenbrand Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit f3c3ec96e5fb40b453693421577d446b5b22fc52 Author: Radim Krčmář Date: Thu May 18 19:37:31 2017 +0200 KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh() commit 34b0dadbdf698f9b277a31b2747b625b9a75ea1f upstream. Static analysis noticed that pmu->nr_arch_gp_counters can be 32 (INTEL_PMC_MAX_GENERIC) and therefore cannot be used to shift 'int'. I didn't add BUILD_BUG_ON for it as we have a better checker. Reported-by: Dan Carpenter Fixes: 25462f7f5295 ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch") Reviewed-by: Paolo Bonzini Reviewed-by: David Hildenbrand Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit 1eeb7942633225baad2f8465dd93a4fb72b4ec7f Author: Ladi Prosek Date: Tue Apr 25 16:42:44 2017 +0200 KVM: x86: fix emulation of RSM and IRET instructions commit 6ed071f051e12cf7baa1b69d3becb8f232fdfb7b upstream. On AMD, the effect of set_nmi_mask called by emulate_iret_real and em_rsm on hflags is reverted later on in x86_emulate_instruction where hflags are overwritten with ctxt->emul_flags (the kvm_set_hflags call). This manifests as a hang when rebooting Windows VMs with QEMU, OVMF, and >1 vcpu. Instead of trying to merge ctxt->emul_flags into vcpu->arch.hflags after an instruction is emulated, this commit deletes emul_flags altogether and makes the emulator access vcpu->arch.hflags using two new accessors. This way all changes, on the emulator side as well as in functions called from the emulator and accessing vcpu state with emul_to_vcpu, are preserved. More details on the bug and its manifestation with Windows and OVMF: It's a KVM bug in the interaction between SMI/SMM and NMI, specific to AMD. I believe that the SMM part explains why we started seeing this only with OVMF. KVM masks and unmasks NMI when entering and leaving SMM. When KVM emulates the RSM instruction in em_rsm, the set_nmi_mask call doesn't stick because later on in x86_emulate_instruction we overwrite arch.hflags with ctxt->emul_flags, effectively reverting the effect of the set_nmi_mask call. The AMD-specific hflag of interest here is HF_NMI_MASK. When rebooting the system, Windows sends an NMI IPI to all but the current cpu to shut them down. Only after all of them are parked in HLT will the initiating cpu finish the restart. If NMI is masked, other cpus never get the memo and the initiating cpu spins forever, waiting for hal!HalpInterruptProcessorsStarted to drop. That's the symptom we observe. Fixes: a584539b24b8 ("KVM: x86: pass the whole hflags field to emulator and back") Signed-off-by: Ladi Prosek Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 982d8d92f25613e88f3a34a8a57da484f68d4c1d Author: Mark Salter Date: Fri Mar 24 09:53:56 2017 -0400 arm64: fix NULL dereference in have_cpu_die() commit 335d2c2d192266358c5dfa64953a4c162f46e464 upstream. Commit 5c492c3f5255 ("arm64: smp: Add function to determine if cpus are stuck in the kernel") added a helper function to determine if die() is supported in cpu_ops. This function assumes a cpu will have a valid cpu_ops entry, but that may not be the case for cpu0 is spin-table or parking protocol is used to boot secondary cpus. In that case, there is a NULL dereference if have_cpu_die() is called by cpu0. So add a check for a valid cpu_ops before dereferencing it. Fixes: 5c492c3f5255 ("arm64: smp: Add function to determine if cpus are stuck in the kernel") Signed-off-by: Mark Salter Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit a4bfcab30928b1ef1a19b379f8d08efe10853a42 Author: Kamal Dasu Date: Fri Mar 3 16:16:53 2017 -0500 mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program commit 9d2ee0a60b8bd9bef2a0082c533736d6a7b39873 upstream. On brcmnand controller v6.x and v7.x, the #WP pin is controlled through the NAND_WP bit in CS_SELECT register. The driver currently assumes that toggling the #WP pin is instantaneously enabling/disabling write-protection, but it actually takes some time to propagate the new state to the internal NAND chip logic. This behavior is sometime causing data corruptions when an erase/program operation is executed before write-protection has really been disabled. Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller") Signed-off-by: Kamal Dasu Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman commit de5862335ed7c465b0900774fbd869bf91a23c58 Author: Jaedon Shin Date: Fri Mar 3 10:55:03 2017 +0900 i2c: brcmstb: Fix START and STOP conditions commit 2de3ec4f1d4ba6ee380478055104eb918bd50cce upstream. The BSC data buffers to send and receive data are each of size 32 bytes or 8 bytes 'xfersz' depending on SoC. The problem observed for all the combined message transfer was if length of data transfer was a multiple of 'xfersz' a repeated START was being transmitted by BSC driver. Fixed this by appropriately setting START/STOP conditions for such transfers. Fixes: dd1aa2524bc5 ("i2c: brcmstb: Add Broadcom settop SoC i2c controller driver") Signed-off-by: Jaedon Shin Acked-by: Kamal Dasu Signed-off-by: Wolfram Sang Signed-off-by: Greg Kroah-Hartman commit 8ee785016d5a05afa9ddd872ae7befa11798bfbf Author: Rafał Miłecki Date: Wed Jan 4 12:09:41 2017 +0100 brcmfmac: avoid writing channel out of allocated array commit 77c0d0cd10e793989d1e8b835a9a09694182cb39 upstream. Our code was assigning number of channels to the index variable by default. If firmware reported channel we didn't predict this would result in using that initial index value and writing out of array. This never happened so far (we got a complete list of supported channels) but it means possible memory corruption so we should handle it anyway. This patch simply detects unexpected channel and ignores it. As we don't try to create new entry now, it's also safe to drop hw_value and center_freq assignment. For known channels we have these set anyway. I decided to fix this issue by assigning NULL or a target channel to the channel variable. This was one of possible ways, I prefefred this one as it also avoids using channel[index] over and over. Fixes: 58de92d2f95e ("brcmfmac: use static superset of channels for wiphy bands") Signed-off-by: Rafał Miłecki Acked-by: Arend van Spriel Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman commit 65fc82cea84f38ce918553b557f3a24c8d8c9649 Author: Arnd Bergmann Date: Fri Mar 24 23:02:48 2017 +0100 infiniband: hns: avoid gcc-7.0.1 warning for uninitialized data commit 5b0ff9a00755d4d9c209033a77f1ed8f3186fe5c upstream. hns_roce_v1_cq_set_ci() calls roce_set_bit() on an uninitialized field, which will then change only a few of its bits, causing a warning with the latest gcc: infiniband/hw/hns/hns_roce_hw_v1.c: In function 'hns_roce_v1_cq_set_ci': infiniband/hw/hns/hns_roce_hw_v1.c:1854:23: error: 'doorbell[1]' is used uninitialized in this function [-Werror=uninitialized] roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1); The code is actually correct since we always set all bits of the port_vlan field, but gcc correctly points out that the first access does contain uninitialized data. This initializes the field to zero first before setting the individual bits. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Arnd Bergmann Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit 3e51ccbadd15aa4a0e0a64535ec0566749361938 Author: Josh Poimboeuf Date: Thu Mar 2 16:57:23 2017 -0600 objtool: Fix another GCC jump table detection issue commit 5c51f4ae84df0f9df33ac08aa5be50061a8b4242 upstream. Arnd Bergmann reported a (false positive) objtool warning: drivers/infiniband/sw/rxe/rxe_resp.o: warning: objtool: rxe_responder()+0xfe: sibling call from callable instruction with changed frame pointer The issue is in find_switch_table(). It tries to find a switch statement's jump table by walking backwards from an indirect jump instruction, looking for a relocation to the .rodata section. In this case it stopped walking prematurely: the first .rodata relocation it encountered was for a variable (resp_state_name) instead of a jump table, so it just assumed there wasn't a jump table. The fix is to ignore any .rodata relocation which refers to an ELF object symbol. This works because the jump tables are anonymous and have no symbols associated with them. Reported-by: Arnd Bergmann Tested-by: Arnd Bergmann Signed-off-by: Josh Poimboeuf Cc: Denys Vlasenko Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: 3732710ff6f2 ("objtool: Improve rare switch jump table pattern detection") Link: http://lkml.kernel.org/r/20170302225723.3ndbsnl4hkqbne7a@treble Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 92e66676523a9f921dfaa383e37d3a4e2edf15df Author: Sudeep Holla Date: Fri Jan 6 12:34:30 2017 +0000 clk: scpi: don't add cpufreq device if the scpi dvfs node is disabled commit 67bcc2c5f1da8c5bb58e72354274ea5c59a3950a upstream. Currently we add the virtual cpufreq device unconditionally even when the SCPI DVFS clock provider node is disabled. This will cause cpufreq driver to throw errors when it gets initailised on boot/modprobe and also when the CPUs are hot-plugged back in. This patch fixes the issue by adding the virtual cpufreq device only if the SCPI DVFS clock provider is available and registered. Fixes: 9490f01e2471 ("clk: scpi: add support for cpufreq virtual device") Reported-by: Michał Zegan Cc: Neil Armstrong Signed-off-by: Sudeep Holla Tested-by: Michał Zegan Signed-off-by: Stephen Boyd Signed-off-by: Greg Kroah-Hartman commit 8a6f400a374c2366ae2e0a3e528a2c9791b1dcd1 Author: Dan Carpenter Date: Tue Feb 7 16:19:06 2017 +0300 cpufreq: s3c2416: double free on driver init error path commit a69261e4470d680185a15f748d9cdafb37c57a33 upstream. The "goto err_armclk;" error path already does a clk_put(s3c_freq->hclk); so this is a double free. Fixes: 34ee55075265 ([CPUFREQ] Add S3C2416/S3C2450 cpufreq driver) Signed-off-by: Dan Carpenter Reviewed-by: Krzysztof Kozlowski Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit 1781a29b31faee2cae9e7f353d8ab99ceb619c15 Author: Suravee Suthikulpanit Date: Mon Jun 26 04:28:04 2017 -0500 iommu/amd: Fix interrupt remapping when disable guest_mode commit 84a21dbdef0b96d773599c33c2afbb002198d303 upstream. Pass-through devices to VM guest can get updated IRQ affinity information via irq_set_affinity() when not running in guest mode. Currently, AMD IOMMU driver in GA mode ignores the updated information if the pass-through device is setup to use vAPIC regardless of guest_mode. This could cause invalid interrupt remapping. Also, the guest_mode bit should be set and cleared only when SVM updates posted-interrupt interrupt remapping information. Signed-off-by: Suravee Suthikulpanit Cc: Joerg Roedel Fixes: d98de49a53e48 ('iommu/amd: Enable vAPIC interrupt remapping mode by default') Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit 0e55856b8f2918f3a6b8caf3c72867ee88f816dd Author: Pan Bian Date: Sun Apr 23 18:23:21 2017 +0800 iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid() commit 73dbd4a4230216b6a5540a362edceae0c9b4876b upstream. In function amd_iommu_bind_pasid(), the control flow jumps to label out_free when pasid_state->mm and mm is NULL. And mmput(mm) is called. In function mmput(mm), mm is referenced without validation. This will result in a NULL dereference bug. This patch fixes the bug. Signed-off-by: Pan Bian Fixes: f0aac63b873b ('iommu/amd: Don't hold a reference to mm_struct') Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit f0c31c674abdf563d2ad5d9ecfcad8d237f939f0 Author: Robin Murphy Date: Thu Mar 16 17:00:17 2017 +0000 iommu/dma: Don't reserve PCI I/O windows commit 938f1bbe35e3a7cb07e1fa7c512e2ef8bb866bdf upstream. Even if a host controller's CPU-side MMIO windows into PCI I/O space do happen to leak into PCI memory space such that it might treat them as peer addresses, trying to reserve the corresponding I/O space addresses doesn't do anything to help solve that problem. Stop doing a silly thing. Fixes: fade1ec055dc ("iommu/dma: Avoid PCI host bridge windows") Reviewed-by: Eric Auger Signed-off-by: Robin Murphy Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit d7fcb303d1ee4416a6e4772735cfacc36e86bff7 Author: Robin Murphy Date: Mon Jan 16 12:58:07 2017 +0000 iommu: Handle default domain attach failure commit 797a8b4d768c58caac58ee3e8cb36a164d1b7751 upstream. We wouldn't normally expect ops->attach_dev() to fail, but on IOMMUs with limited hardware resources, or generally misconfigured systems, it is certainly possible. We report failure correctly from the external iommu_attach_device() interface, but do not do so in iommu_group_add() when attaching to the default domain. The result of failure there is that the device, group and domain all get left in a broken, part-configured state which leads to weird errors and misbehaviour down the line when IOMMU API calls sort-of-but-don't-quite work. Check the return value of __iommu_attach_device() on the default domain, and refactor the error handling paths to cope with its failure and clean up correctly in such cases. Fixes: e39cb8a3aa98 ("iommu: Make sure a device is always attached to a domain") Reported-by: Punit Agrawal Signed-off-by: Robin Murphy Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit c19bfc6765d44847a3880333474e2c992d63802f Author: David Dillow Date: Mon Jan 30 19:11:11 2017 -0800 iommu/vt-d: Don't over-free page table directories commit f7116e115acdd74bc75a4daf6492b11d43505125 upstream. dma_pte_free_level() recurses down the IOMMU page tables and frees directory pages that are entirely contained in the given PFN range. Unfortunately, it incorrectly calculates the starting address covered by the PTE under consideration, which can lead to it clearing an entry that is still in use. This occurs if we have a scatterlist with an entry that has a length greater than 1026 MB and is aligned to 2 MB for both the IOMMU and physical addresses. For example, if __domain_mapping() is asked to map a two-entry scatterlist with 2 MB and 1028 MB segments to PFN 0xffff80000, it will ask if dma_pte_free_pagetable() is asked to PFNs from 0xffff80200 to 0xffffc05ff, it will also incorrectly clear the PFNs from 0xffff80000 to 0xffff801ff because of this issue. The current code will set level_pfn to 0xffff80200, and 0xffff80200-0xffffc01ff fits inside the range being cleared. Properly setting the level_pfn for the current level under consideration catches that this PTE is outside of the range being cleared. This patch also changes the value passed into dma_pte_free_level() when it recurses. This only affects the first PTE of the range being cleared, and is handled by the existing code that ensures we start our cursor no lower than start_pfn. This was found when using dma_map_sg() to map large chunks of contiguous memory, which immediatedly led to faults on the first access of the erroneously-deleted mappings. Fixes: 3269ee0bd668 ("intel-iommu: Fix leaks in pagetable freeing") Reviewed-by: Benjamin Serebrin Signed-off-by: David Dillow Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit d5c5e8ba5d9d7b3378cf08274c86c8a340110b05 Author: Junxiao Bi Date: Wed May 3 14:51:41 2017 -0700 ocfs2: o2hb: revert hb threshold to keep compatible commit 33496c3c3d7b88dcbe5e55aa01288b05646c6aca upstream. Configfs is the interface for ocfs2-tools to set configure to kernel and $configfs_dir/cluster/$clustername/heartbeat/dead_threshold is the one used to configure heartbeat dead threshold. Kernel has a default value of it but user can set O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb to override it. Commit 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods") changed heartbeat dead threshold name while ocfs2-tools did not, so ocfs2-tools won't set this configurable and the default value is always used. So revert it. Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods") Link: http://lkml.kernel.org/r/1490665245-15374-1-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi Acked-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 8af88a950b4207f589b210657edc7c94b86b48e8 Author: Andy Lutomirski Date: Sat Apr 22 00:01:22 2017 -0700 x86/mm: Fix flush_tlb_page() on Xen commit dbd68d8e84c606673ebbcf15862f8c155fa92326 upstream. flush_tlb_page() passes a bogus range to flush_tlb_others() and expects the latter to fix it up. native_flush_tlb_others() has the fixup but Xen's version doesn't. Move the fixup to flush_tlb_others(). AFAICS the only real effect is that, without this fix, Xen would flush everything instead of just the one page on remote vCPUs in when flush_tlb_page() was called. Signed-off-by: Andy Lutomirski Reviewed-by: Boris Ostrovsky Cc: Andrew Morton Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Michal Hocko Cc: Nadav Amit Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Fixes: e7b52ffd45a6 ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range") Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 3667dafd6c04b46a827398b62fa97b9cf73d32f5 Author: Joerg Roedel Date: Thu Apr 6 16:19:22 2017 +0200 x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space commit 5ed386ec09a5d75bcf073967e55e895c2607a5c3 upstream. When this function fails it just sends a SIGSEGV signal to user-space using force_sig(). This signal is missing essential information about the cause, e.g. the trap_nr or an error code. Fix this by propagating the error to the only caller of mpx_handle_bd_fault(), do_bounds(), which sends the correct SIGSEGV signal to the process. Signed-off-by: Joerg Roedel Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: fe3d197f84319 ('x86, mpx: On-demand kernel allocation of bounds tables') Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit b287ade87c9192b4ae6fe525eaa66fd25455bfb1 Author: Baoquan He Date: Tue Jun 27 20:39:06 2017 +0800 x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug commit 8eabf42ae5237e6b699aeac687b5b629e3537c8d upstream. Kernel text KASLR is separated into physical address and virtual address randomization. And for virtual address randomization, we only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE. So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR, but not the original kernel loading address 'output'. The bug will cause kernel boot failure if kernel is loaded at a different position than the address, 16M, which is decided at compiled time. Kexec/kdump is such practical case. To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial value. Tested-by: Dave Young Signed-off-by: Baoquan He Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately") Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 15541e64163c0c5a2d2e3e8d1b73057888170f62 Author: Arnaldo Carvalho de Melo Date: Mon Apr 24 11:58:54 2017 -0300 tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel commit e883d09c9eb2ffddfd057c17e6a0cef446ec8c9b upstream. Just a minor fix done in: Fixes: 26a37ab319a2 ("x86/mce: Fix copy/paste error in exception table entries") Cc: Tony Luck Link: http://lkml.kernel.org/n/tip-ni9jzdd5yxlail6pq8cuexw2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit a2c222bef08f1ada42f85f12114f482a0682ea56 Author: Doug Berger Date: Thu Jun 29 18:41:36 2017 +0100 ARM: 8685/1: ensure memblock-limit is pmd-aligned commit 9e25ebfe56ece7541cd10a20d715cbdd148a2e06 upstream. The pmd containing memblock_limit is cleared by prepare_page_table() which creates the opportunity for early_alloc() to allocate unmapped memory if memblock_limit is not pmd aligned causing a boot-time hang. Commit 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM") attempted to resolve this problem, but there is a path through the adjust_lowmem_bounds() routine where if all memory regions start and end on pmd-aligned addresses the memblock_limit will be set to arm_lowmem_limit. Since arm_lowmem_limit can be affected by the vmalloc early parameter, the value of arm_lowmem_limit may not be pmd-aligned. This commit corrects this oversight such that memblock_limit is always rounded down to pmd-alignment. Fixes: 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM") Signed-off-by: Doug Berger Suggested-by: Mark Rutland Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit 7661b19687b2399783de2c00cf88981c93bc8383 Author: Lorenzo Pieralisi Date: Fri May 26 17:40:02 2017 +0100 ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation commit cb7cf772d83d2d4e6995c5bb9e0fb59aea8f7080 upstream. The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes muster from an ACPI specification standpoint. Current macro detects the MADT GICC entry length through ACPI firmware version (it changed from 76 to 80 bytes in the transition from ACPI 5.1 to ACPI 6.0 specification) but always uses (erroneously) the ACPICA (latest) struct (ie struct acpi_madt_generic_interrupt - that is 80-bytes long) length to check if the current GICC entry memory record exceeds the MADT table end in memory as defined by the MADT table header itself, which may result in false negatives depending on the ACPI firmware version and how the MADT entries are laid out in memory (ie on ACPI 5.1 firmware MADT GICC entries are 76 bytes long, so by adding 80 to a GICC entry start address in memory the resulting address may well be past the actual MADT end, triggering a false negative). Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks and update them to always use the firmware version specific MADT GICC entry length in order to carry out boundary checks. Fixes: b6cfb277378e ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro") Reported-by: Julien Grall Acked-by: Will Deacon Acked-by: Marc Zyngier Signed-off-by: Lorenzo Pieralisi Cc: Julien Grall Cc: Hanjun Guo Cc: Al Stone Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman commit 4efe34b500a740016e5eabb8114ceeb395af771e Author: Adam Ford Date: Mon Mar 6 12:56:55 2017 -0600 ARM: dts: OMAP3: Fix MFG ID EEPROM commit 06e1a5cc570703796ff1bd3a712e8e3b15c6bb0d upstream. The manufacturing information is stored in the EEPROM. This chip is an AT24C64 not not (nor has it ever been) 24C02. This patch will correctly address the EEPROM to read the entire contents and not just 256 bytes (of 0xff). Fixes: 5e3447a29a38 ("ARM: dts: LogicPD Torpedo: Add AT24 EEPROM Support") Signed-off-by: Adam Ford Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit 07bb2c7e7ea369f03a8893e445639324726680a5 Author: Dave Gerlach Date: Thu Mar 30 14:58:18 2017 -0500 ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer commit 04abaf07f6d5cdf22b7a478a86e706dfeeeef960 upstream. Starting from commit 5de85b9d57ab ("PM / runtime: Re-init runtime PM states at probe error and driver unbind") pm_runtime core now changes device runtime_status back to after RPM_SUSPENDED after a probe defer. Certain OMAP devices make use of "ti,no-idle-on-init" flag which causes omap_device_enable to be called during the BUS_NOTIFY_ADD_DEVICE event during probe, along with pm_runtime_set_active. This call to pm_runtime_set_active typically will prevent a call to pm_runtime_get in a driver probe function from re-enabling the omap_device. However, in the case of a probe defer that happens before the driver probe function is able to run, such as a missing pinctrl states defer, pm_runtime_reinit will set the device as RPM_SUSPENDED and then once driver probe is actually able to run, pm_runtime_get will see the device as suspended and call through to the omap_device layer, attempting to enable the already enabled omap_device and causing errors like this: omap-gpmc 50000000.gpmc: omap_device: omap_device_enable() called from invalid state 1 omap-gpmc 50000000.gpmc: use pm_runtime_put_sync_suspend() in driver? We can avoid this error by making sure the pm_runtime status of a device matches the omap_device state before a probe attempt. By extending the omap_device bus notifier to act on the BUS_NOTIFY_BIND_DRIVER event we can check if a device is enabled in omap_device but with a pm_runtime status of RPM_SUSPENDED and once again mark the device as RPM_ACTIVE to avoid a second incorrect call to omap_device_enable. Fixes: 5de85b9d57ab ("PM / runtime: Re-init runtime PM states at probe error and driver unbind") Tested-by: Franklin S Cooper Jr. Signed-off-by: Dave Gerlach Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit e57aa416ca4ce2af2570f3b776d738c04d9a8e3e Author: Andrew F. Davis Date: Fri Feb 10 11:55:47 2017 -0600 regulator: tps65086: Fix DT node referencing in of_parse_cb commit 6308f1787fb85bc98b7241a08a9f7f33b47f8b61 upstream. When we check for additional DT properties in the current node we use the device_node passed in with the configuration data, this will not point to the correct DT node, use the one passed in for this purpose. Fixes: d2a2e729a666 ("regulator: tps65086: Add regulator driver for the TPS65086 PMIC") Reported-by: Steven Kipisz Signed-off-by: Andrew F. Davis Tested-by: Steven Kipisz Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 88baad2e715967f237396bea47c496830d82a9c2 Author: Andrew F. Davis Date: Fri Feb 10 11:55:46 2017 -0600 regulator: tps65086: Fix expected switch DT node names commit 1c47f7c316de38c30b481e1886cc6352c9efdcc1 upstream. The three load switches are called SWA1, SWB1, and SWB2. The node names describing properties for these are expected to be the same, but due to a typo they are not. Fix this here. Fixes: d2a2e729a666 ("regulator: tps65086: Add regulator driver for the TPS65086 PMIC") Reported-by: Steven Kipisz Signed-off-by: Andrew F. Davis Tested-by: Steven Kipisz Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 9846c67974d6af64f665707bb4f68ae458684faa Author: Johan Hovold Date: Mon Jan 30 17:47:05 2017 +0100 spi: fix device-node leaks commit 8324147f38019865b29d03baf28412d2ec0bd828 upstream. Make sure to release the device-node reference taken in of_register_spi_device() on errors and when deregistering the device. Fixes: 284b01897340 ("spi: Add OF binding support for SPI busses") Signed-off-by: Johan Hovold Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit c52829f60f5f6e228a70162717df199e874898a8 Author: Daniel Kurtz Date: Fri Jan 27 00:21:53 2017 +0800 spi: When no dma_chan map buffers with spi_master's parent commit 88b0aa544af58ce3be125a1845a227264ec9ab89 upstream. Back before commit 1dccb598df54 ("arm64: simplify dma_get_ops"), for arm64, devices for which dma_ops were not explicitly set were automatically configured to use swiotlb_dma_ops, since this was hard-coded as the global "dma_ops" in arm64_dma_init(). Now that global "dma_ops" has been removed, all devices much have their dma_ops explicitly set by a call to arch_setup_dma_ops(), otherwise the device is assigned dummy_dma_ops, and thus calls to map_sg for such a device will fail (return 0). Mediatek SPI uses DMA but does not use a dma channel. Support for this was added by commit c37f45b5f1cd ("spi: support spi without dma channel to use can_dma()"), which uses the master_spi dev to DMA map buffers. The master_spi device is not a platform device, rather it is created in spi_alloc_device(), and therefore its dma_ops are never set. Therefore, when the mediatek SPI driver when it does DMA (for large SPI transactions > 32 bytes), SPI will use spi_map_buf()->dma_map_sg() to map the buffer for use in DMA. But dma_map_sg()->dma_map_sg_attrs() returns 0, because ops->map_sg is dummy_dma_ops->__dummy_map_sg, and hence spi_map_buf() returns -ENOMEM (-12). Fix this by using the real spi_master's parent device which should be a real physical device with DMA properties. Signed-off-by: Daniel Kurtz Fixes: c37f45b5f1cd ("spi: support spi without dma channel to use can_dma()") Cc: Leilk Liu Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 478273e11521915b7a0fd977b4d43587997ec7b2 Author: Matt Fleming Date: Fri Feb 17 12:07:30 2017 +0000 sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting commit 6e5f32f7a43f45ee55c401c0b9585eb01f9629a8 upstream. If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to the pending sample window time on exit, setting the next update not one window into the future, but two. This situation on exiting NO_HZ is described by: this_rq->calc_load_update < jiffies < calc_load_update In this scenario, what we should be doing is: this_rq->calc_load_update = calc_load_update [ next window ] But what we actually do is: this_rq->calc_load_update = calc_load_update + LOAD_FREQ [ next+1 window ] This has the effect of delaying load average updates for potentially up to ~9seconds. This can result in huge spikes in the load average values due to per-cpu uninterruptible task counts being out of sync when accumulated across all CPUs. It's safe to update the per-cpu active count if we wake between sample windows because any load that we left in 'calc_load_idle' will have been zero'd when the idle load was folded in calc_global_load(). This issue is easy to reproduce before, commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") just by forking short-lived process pipelines built from ps(1) and grep(1) in a loop. I'm unable to reproduce the spikes after that commit, but the bug still seems to be present from code review. Signed-off-by: Matt Fleming Signed-off-by: Peter Zijlstra (Intel) Cc: Frederic Weisbecker Cc: Linus Torvalds Cc: Mike Galbraith Cc: Mike Galbraith Cc: Morten Rasmussen Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vincent Guittot Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again") Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit eea0261db8efda7a5b3732c0d9a76e9b06bf040d Author: Eric Anholt Date: Thu Apr 27 18:02:32 2017 -0700 watchdog: bcm281xx: Fix use of uninitialized spinlock. commit fedf266f9955d9a019643cde199a2fd9a0259f6f upstream. The bcm_kona_wdt_set_resolution_reg() call takes the spinlock, so initialize it earlier. Fixes a warning at boot with lock debugging enabled. Fixes: 6adb730dc208 ("watchdog: bcm281xx: Watchdog Driver") Signed-off-by: Eric Anholt Reviewed-by: Florian Fainelli Reviewed-by: Guenter Roeck Signed-off-by: Guenter Roeck Signed-off-by: Wim Van Sebroeck Signed-off-by: Greg Kroah-Hartman commit 4211442b2088554f1c99a72b0476f967c0509a0e Author: Florian Westphal Date: Fri Feb 17 08:39:28 2017 +0100 netfilter: use skb_to_full_sk in ip_route_me_harder commit 29e09229d9f26129a39462fae0ddabc4d9533989 upstream. inet_sk(skb->sk) is illegal in case skb is attached to request socket. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Reported by: Daniel J Blueman Signed-off-by: Florian Westphal Tested-by: Daniel J Blueman Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit ac2730234cc1454b901656ed7f59ca1b519cdaf1 Author: Dan Carpenter Date: Wed Jun 14 13:34:05 2017 +0300 xfrm: Oops on error in pfkey_msg2xfrm_state() commit 1e3d0c2c70cd3edb5deed186c5f5c75f2b84a633 upstream. There are some missing error codes here so we accidentally return NULL instead of an error pointer. It results in a NULL pointer dereference. Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.") Signed-off-by: Dan Carpenter Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit c460f2beb6f081fa22eb7291db49c13c266ffd86 Author: Dan Carpenter Date: Wed Jun 14 13:35:37 2017 +0300 xfrm: NULL dereference on allocation failure commit e747f64336fc15e1c823344942923195b800aa1e upstream. The default error code in pfkey_msg2xfrm_state() is -ENOBUFS. We added a new call to security_xfrm_state_alloc() which sets "err" to zero so there several places where we can return ERR_PTR(0) if kmalloc() fails. The caller is expecting error pointers so it leads to a NULL dereference. Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.") Signed-off-by: Dan Carpenter Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit 1e1666257cb69022e7a6fe61b1cf041a852ce1bc Author: Sabrina Dubroca Date: Wed May 3 16:43:19 2017 +0200 xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY commit 9b3eb54106cf6acd03f07cf0ab01c13676a226c2 upstream. When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for that dst. Unfortunately, the code that allocates and fills this copy doesn't care about what type of flowi (flowi, flowi4, flowi6) gets passed. In multiple code paths (from raw_sendmsg, from TCP when replying to a FIN, in vxlan, geneve, and gre), the flowi that gets passed to xfrm is actually an on-stack flowi4, so we end up reading stuff from the stack past the end of the flowi4 struct. Since xfrm_dst->origin isn't used anywhere following commit ca116922afa8 ("xfrm: Eliminate "fl" and "pol" args to xfrm_bundle_ok()."), just get rid of it. xfrm_dst->partner isn't used either, so get rid of that too. Fixes: 9d6ec938019c ("ipv4: Use flowi4 in public route lookup interfaces.") Signed-off-by: Sabrina Dubroca Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit 647f605276c0b5e3019fcf8ad302d217d87adedc Author: Ard Biesheuvel Date: Fri Jun 23 15:08:41 2017 -0700 mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings commit 029c54b09599573015a5c18dbe59cbdf42742237 upstream. Existing code that uses vmalloc_to_page() may assume that any address for which is_vmalloc_addr() returns true may be passed into vmalloc_to_page() to retrieve the associated struct page. This is not un unreasonable assumption to make, but on architectures that have CONFIG_HAVE_ARCH_HUGE_VMAP=y, it no longer holds, and we need to ensure that vmalloc_to_page() does not go off into the weeds trying to dereference huge PUDs or PMDs as table entries. Given that vmalloc() and vmap() themselves never create huge mappings or deal with compound pages at all, there is no correct answer in this case, so return NULL instead, and issue a warning. When reading /proc/kcore on arm64, you will hit an oops as soon as you hit the huge mappings used for the various segments that make up the mapping of vmlinux. With this patch applied, you will no longer hit the oops, but the kcore contents willl be incorrect (these regions will be zeroed out) We are fixing this for kcore specifically, so it avoids vread() for those regions. At least one other problematic user exists, i.e., /dev/kmem, but that is currently broken on arm64 for other reasons. Link: http://lkml.kernel.org/r/20170609082226.26152-1-ard.biesheuvel@linaro.org Signed-off-by: Ard Biesheuvel Acked-by: Mark Rutland Reviewed-by: Laura Abbott Cc: Michal Hocko Cc: zhong jiang Cc: Dave Hansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [ardb: non-trivial backport to v4.9] Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman commit f9f73c58feefa8a5dda019df9c549c6e355e15be Author: Eugeniu Rosca Date: Tue Jun 6 00:08:10 2017 +0200 ravb: Fix use-after-free on `ifconfig eth0 down` [ Upstream commit 79514ef670e9e575a1fe36922268c439d0f0ca8a ] Commit a47b70ea86bd ("ravb: unmap descriptors when freeing rings") has introduced the issue seen in [1] reproduced on H3ULCB board. Fix this by relocating the RX skb ringbuffer free operation, so that swiotlb page unmapping can be done first. Freeing of aligned TX buffers is not relevant to the issue seen in [1]. Still, reposition TX free calls as well, to have all kfree() operations performed consistently _after_ dma_unmap_*()/dma_free_*(). [1] Console screenshot with the problem reproduced: salvator-x login: root root@salvator-x:~# ifconfig eth0 up Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: \ attached PHY driver [Micrel KSZ9031 Gigabit PHY] \ (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=235) IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready root@salvator-x:~# root@salvator-x:~# ifconfig eth0 down ================================================================== BUG: KASAN: use-after-free in swiotlb_tbl_unmap_single+0xc4/0x35c Write of size 1538 at addr ffff8006d884f780 by task ifconfig/1649 CPU: 0 PID: 1649 Comm: ifconfig Not tainted 4.12.0-rc4-00004-g112eb07287d1 #32 Hardware name: Renesas H3ULCB board based on r8a7795 (DT) Call trace: [] dump_backtrace+0x0/0x3a4 [] show_stack+0x14/0x1c [] dump_stack+0xf8/0x150 [] print_address_description+0x7c/0x330 [] kasan_report+0x2e0/0x2f4 [] check_memory_region+0x20/0x14c [] memcpy+0x48/0x68 [] swiotlb_tbl_unmap_single+0xc4/0x35c [] unmap_single+0x90/0xa4 [] swiotlb_unmap_page+0xc/0x14 [] __swiotlb_unmap_page+0xcc/0xe4 [] ravb_ring_free+0x514/0x870 [] ravb_close+0x288/0x36c [] __dev_close_many+0x14c/0x174 [] __dev_close+0xc8/0x144 [] __dev_change_flags+0xd8/0x194 [] dev_change_flags+0x60/0xb0 [] devinet_ioctl+0x484/0x9d4 [] inet_ioctl+0x190/0x194 [] sock_do_ioctl+0x78/0xa8 [] sock_ioctl+0x110/0x3c4 [] vfs_ioctl+0x90/0xa0 [] do_vfs_ioctl+0x148/0xc38 [] SyS_ioctl+0x44/0x74 [] el0_svc_naked+0x24/0x28 The buggy address belongs to the page: page:ffff7e001b6213c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x4000000000000000() raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff raw: 0000000000000000 ffff7e001b6213e0 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8006d884f680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8006d884f700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff8006d884f780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8006d884f800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8006d884f880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Disabling lock debugging due to kernel taint root@salvator-x:~# Fixes: a47b70ea86bd ("ravb: unmap descriptors when freeing rings") Signed-off-by: Eugeniu Rosca Acked-by: Sergei Shtylyov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit adfe95fe5b4290693a57f1682fcf3c4f61951086 Author: Peter Dawson Date: Fri May 26 06:35:18 2017 +1000 ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets [ Upstream commit 0e9a709560dbcfbace8bf4019dc5298619235891 ] This fix addresses two problems in the way the DSCP field is formulated on the encapsulating header of IPv6 tunnels. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661 1) The IPv6 tunneling code was manipulating the DSCP field of the encapsulating packet using the 32b flowlabel. Since the flowlabel is only the lower 20b it was incorrect to assume that the upper 12b containing the DSCP and ECN fields would remain intact when formulating the encapsulating header. This fix handles the 'inherit' and 'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable. 2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was incorrect and resulted in the DSCP value always being set to 0. Commit 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class is non-0") caused the regression by masking out the flowlabel which exposed the incorrect handling of the DSCP portion of the flowlabel in ip6_tunnel and ip6_gre. Fixes: 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class is non-0") Signed-off-by: Peter Dawson Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 168bd51ec5efbb92eb9bcdefb1327ef22e4898a9 Author: Xin Long Date: Tue Feb 7 20:56:08 2017 +0800 sctp: check af before verify address in sctp_addr_id2transport [ Upstream commit 912964eacb111551db73429719eb5fadcab0ff8a ] Commit 6f29a1306131 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc") invoked sctp_verify_addr to verify the addr. But it didn't check af variable beforehand, once users pass an address with family = 0 through sockopt, sctp_get_af_specific will return NULL and NULL pointer dereference will be caused by af->sockaddr_len. This patch is to fix it by returning NULL if af variable is NULL. Fixes: 6f29a1306131 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc") Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 399566f8a4fb1ea442046942640e37d9ea9fa0d6 Author: Jack Morgenstein Date: Mon Jan 16 18:31:39 2017 +0200 net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV [ Upstream commit 9577b174cd0323d287c994ef0891db71666d0765 ] When running SRIOV, warnings for SRQ LIMIT events flood the Hypervisor's message log when (correct, normally operating) apps use SRQ LIMIT events as a trigger to post WQEs to SRQs. Add more information to the existing debug printout for SRQ_LIMIT, and output the warning messages only for the SRQ CATAS ERROR event. Fixes: acba2420f9d2 ("mlx4_core: Add wrapper functions and comm channel and slave event support to EQs") Fixes: e0debf9cb50d ("mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level") Signed-off-by: Jack Morgenstein Signed-off-by: Tariq Toukan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit b6f75b986a7f7b79953b94f9778de295a253c624 Author: Masami Hiramatsu Date: Wed Jan 11 15:01:57 2017 +0900 perf probe: Fix to probe on gcc generated functions in modules [ Upstream commit 613f050d68a8ed3c0b18b9568698908ef7bbc1f7 ] Fix to probe on gcc generated functions on modules. Since probing on a module is based on its symbol name, it should be adjusted on actual symbols. E.g. without this fix, perf probe shows probe definition on non-exist symbol as below. $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -F in_range* in_range.isra.12 $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range p:probe/in_range nf_nat:in_range+0 With this fix, perf probe correctly shows a probe on gcc-generated symbol. $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range p:probe/in_range nf_nat:in_range.isra.12+0 This also fixes same problem on online module as below. $ perf probe -m i915 -D assert_plane p:probe/assert_plane i915:assert_plane.constprop.134+0 Signed-off-by: Masami Hiramatsu Tested-by: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/148411450673.9978.14905987549651656075.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 9f8ffe4e09520e209f41d01c73a29598414123b1 Author: Parthasarathy Bhuvaragan Date: Fri Jan 13 15:46:25 2017 +0100 tipc: allocate user memory with GFP_KERNEL flag [ Upstream commit 57d5f64d83ab5b5a5118b1597386dd76eaf4340d ] Until now, we allocate memory always with GFP_ATOMIC flag. When the system is under memory pressure and a user tries to send, the send fails due to low memory. However, the user application can wait for free memory if we allocate it using GFP_KERNEL flag. In this commit, we use allocate memory with GFP_KERNEL for all user allocation. Reported-by: Rune Torgersen Acked-by: Jon Maloy Signed-off-by: Parthasarathy Bhuvaragan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 18b200e0c8ee07e7e3f2b1bd7a5552b58457452f Author: Karicheri, Muralidharan Date: Fri Jan 13 09:32:34 2017 -0500 net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types [ Upstream commit 34c55cf2fc75f8bf6ba87df321038c064cf2d426 ] Currently dp83867 driver returns error if phy interface type PHY_INTERFACE_MODE_RGMII_RXID is used to set the rx only internal delay. Similarly issue happens for PHY_INTERFACE_MODE_RGMII_TXID. Fix this by checking also the interface type if a particular delay value is missing in the phy dt bindings. Also update the DT document accordingly. Signed-off-by: Murali Karicheri Signed-off-by: Sekhar Nori Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit e1eac347d971b59f3b7de732d488ef00e087e2f8 Author: Masami Hiramatsu Date: Wed Jan 11 14:59:38 2017 +0900 perf probe: Fix to show correct locations for events on modules [ Upstream commit d2d4edbebe07ddb77980656abe7b9bc7a9e0cdf7 ] Fix to show correct locations for events on modules by relocating given address instead of retrying after failure. This happens when the module text size is big enough, bigger than sh_addr, because the original code retries with given address + sh_addr if it failed to find CU DIE at the given address. Any address smaller than sh_addr always fails and it retries with the correct address, but addresses bigger than sh_addr will get a CU DIE which is on the given address (not adjusted by sh_addr). In my environment(x86-64), the sh_addr of ".text" section is 0x10030. Since i915 is a huge kernel module, we can see this issue as below. $ grep "[Tt] .*\[i915\]" /proc/kallsyms | sort | head -n1 ffffffffc0270000 t i915_switcheroo_can_switch [i915] ffffffffc0270000 + 0x10030 = ffffffffc0280030, so we'll check symbols cross this boundary. $ grep "[Tt] .*\[i915\]" /proc/kallsyms | grep -B1 ^ffffffffc028\ | head -n 2 ffffffffc027ff80 t haswell_init_clock_gating [i915] ffffffffc0280110 t valleyview_init_clock_gating [i915] So setup probes on both function and see what happen. $ sudo ./perf probe -m i915 -a haswell_init_clock_gating \ -a valleyview_init_clock_gating Added new events: probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915) You can now use it in all perf tools, such as: perf record -e probe:valleyview_init_clock_gating -aR sleep 1 $ sudo ./perf probe -l probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) probe:valleyview_init_clock_gating (on i915_vga_set_decode:4@gpu/drm/i915/i915_drv.c in i915) As you can see, haswell_init_clock_gating is correctly shown, but valleyview_init_clock_gating is not. With this patch, both events are shown correctly. $ sudo ./perf probe -l probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) Committer notes: In my case: # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating Added new events: probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915) You can now use it in all perf tools, such as: perf record -e probe:valleyview_init_clock_gating -aR sleep 1 # perf probe -l probe:haswell_init_clock_gating (on i915_getparam+432@gpu/drm/i915/i915_drv.c in i915) probe:valleyview_init_clock_gating (on __i915_printk+240@gpu/drm/i915/i915_drv.c in i915) # # readelf -SW /lib/modules/4.9.0+/build/vmlinux | egrep -w '.text|Name' [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 1] .text PROGBITS ffffffff81000000 200000 822fd3 00 AX 0 0 4096 # So both are b0rked, now with the fix: # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating Added new events: probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915) You can now use it in all perf tools, such as: perf record -e probe:valleyview_init_clock_gating -aR sleep 1 # perf probe -l probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) # Both looks correct. Signed-off-by: Masami Hiramatsu Tested-by: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/148411436777.9978.1440275861947194930.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit cc439964fab1a58f5f7d9041845228bdd6ddfa6c Author: Ivan Vecera Date: Fri Jan 13 22:38:29 2017 +0100 be2net: fix MAC addr setting on privileged BE3 VFs [ Upstream commit 34393529163af7163ef8459808e3cf2af7db7f16 ] During interface opening MAC address stored in netdev->dev_addr is programmed in the HW with exception of BE3 VFs where the initial MAC is programmed by parent PF. This is OK when MAC address is not changed when an interfaces is down. In this case the requested MAC is stored to netdev->dev_addr and later is stored into HW during opening. But this is not done for all BE3 VFs so the NIC HW does not know anything about this change and all traffic is filtered. This is the case of bonding if fail_over_mac == 0 where the MACs of the slaves are changed while they are down. The be2net behavior is too restrictive because if a BE3 VF has the FILTMGMT privilege then it is able to modify its MAC without any restriction. To solve the described problem the driver should take care about these privileged BE3 VFs so the MAC is programmed during opening. And by contrast unpriviled BE3 VFs should not be allowed to change its MAC in any case. Cc: Sathya Perla Cc: Ajit Khaparde Cc: Sriharsha Basavapatna Cc: Somnath Kotur Signed-off-by: Ivan Vecera Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 02434def6fd0df57a5c4b1309b7d16f985234a7d Author: Ivan Vecera Date: Fri Jan 13 22:38:28 2017 +0100 be2net: don't delete MAC on close on unprivileged BE3 VFs [ Upstream commit 6d928ae590c8d58cfd5cca997d54394de139cbb7 ] BE3 VFs without FILTMGMT privilege are not allowed to modify its MAC, VLAN table and UC/MC lists. So don't try to delete MAC on such VFs. Cc: Sathya Perla Cc: Ajit Khaparde Cc: Sriharsha Basavapatna Cc: Somnath Kotur Signed-off-by: Ivan Vecera Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit fa1dbf505aefe87cb3adbe279c3eaac087d5790d Author: Ivan Vecera Date: Fri Jan 13 22:38:27 2017 +0100 be2net: fix status check in be_cmd_pmac_add() [ Upstream commit fe68d8bfe59c561664aa87d827aa4b320eb08895 ] Return value from be_mcc_notify_wait() contains a base completion status together with an additional status. The base_status() macro need to be used to access base status. Fixes: e3a7ae2 be2net: Changing MAC Address of a VF was broken Cc: Sathya Perla Cc: Ajit Khaparde Cc: Sriharsha Basavapatna Cc: Somnath Kotur Signed-off-by: Ivan Vecera Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 5f54c4e1e2afd0a437e24c0b9689728c1afc1591 Author: Amelie Delaunay Date: Thu Jan 12 16:09:44 2017 +0100 usb: dwc2: gadget: Fix GUSBCFG.USBTRDTIM value [ Upstream commit ca02954ada711b08e5b0d84590a631fd63ed39f9 ] USBTrdTim must be programmed to 0x5 when phy has a UTMI+ 16-bit wide interface or 0x9 when it has a 8-bit wide interface. GUSBCFG reset value (Value After Reset: 0x1400) sets USBTrdTim to 0x5. In case of 8-bit UTMI+, without clearing GUSBCFG.USBTRDTIM mask, USBTrdTim results in 0xD (0x5 | 0x9). That's why we need to clear GUSBCFG.USBTRDTIM mask before setting USBTrdTim value, to ensure USBTrdTim is correctly set in case of 8-bit UTMI+. Signed-off-by: Amelie Delaunay Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 0e9867b7113c56b367f2e753cd411cf7cef0d2ec Author: Heiko Carstens Date: Wed Dec 28 11:33:48 2016 +0100 s390/ctl_reg: make __ctl_load a full memory barrier [ Upstream commit e991c24d68b8c0ba297eeb7af80b1e398e98c33f ] We have quite a lot of code that depends on the order of the __ctl_load inline assemby and subsequent memory accesses, like e.g. disabling lowcore protection and the writing to lowcore. Since the __ctl_load macro does not have memory barrier semantics, nor any other dependencies the compiler is, theoretically, free to shuffle code around. Or in other words: storing to lowcore could happen before lowcore protection is disabled. In order to avoid this class of potential bugs simply add a full memory barrier to the __ctl_load macro. Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 9d00195bc0afa0252b9cdb157eb4ed1e13631bc6 Author: Nikita Yushchenko Date: Wed Jan 11 21:56:31 2017 +0300 swiotlb: ensure that page-sized mappings are page-aligned [ Upstream commit 602d9858f07c72eab64f5f00e2fae55f9902cfbe ] Some drivers do depend on page mappings to be page aligned. Swiotlb already enforces such alignment for mappings greater than page, extend that to page-sized mappings as well. Without this fix, nvme hits BUG() in nvme_setup_prps(), because that routine assumes page-aligned mappings. Signed-off-by: Nikita Yushchenko Reviewed-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 68a5dc38573586ad47befe5b91c62d7c2cb8141d Author: Dave Kleikamp Date: Wed Jan 11 13:25:00 2017 -0600 coredump: Ensure proper size of sparse core files [ Upstream commit 4d22c75d4c7b5c5f4bd31054f09103ee490878fd ] If the last section of a core file ends with an unmapped or zero page, the size of the file does not correspond with the last dump_skip() call. gdb complains that the file is truncated and can be confusing to users. After all of the vma sections are written, make sure that the file size is no smaller than the current file position. This problem can be demonstrated with gdb's bigcore testcase on the sparc architecture. Signed-off-by: Dave Kleikamp Cc: Alexander Viro Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Al Viro Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit d21816c24591060a0af9fd258f85a1e5c04fba0f Author: Shaohua Li Date: Tue Dec 13 12:09:56 2016 -0800 aio: fix lock dep warning [ Upstream commit a12f1ae61c489076a9aeb90bddca7722bf330df3 ] lockdep reports a warnning. file_start_write/file_end_write only acquire/release the lock for regular files. So checking the files in aio side too. [ 453.532141] ------------[ cut here ]------------ [ 453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670 [ 453.533011] DEBUG_LOCKS_WARN_ON(depth <= 0) [ 453.533011] Modules linked in: [ 453.533011] CPU: 1 PID: 1298 Comm: fio Not tainted 4.9.0+ #964 [ 453.533011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014 [ 453.533011] ffff8803a24b7a70 ffffffff8196cffb ffff8803a24b7ae8 0000000000000000 [ 453.533011] ffff8803a24b7ab8 ffffffff81091ee1 ffff8803a5dba700 00000dba00000008 [ 453.533011] ffffed0074496f59 ffff8803a5dbaf54 ffff8803ae0f8488 fffffffffffffdef [ 453.533011] Call Trace: [ 453.533011] [] dump_stack+0x67/0x9c [ 453.533011] [] __warn+0x111/0x130 [ 453.533011] [] warn_slowpath_fmt+0x97/0xb0 [ 453.533011] [] ? __warn+0x130/0x130 [ 453.533011] [] ? blk_finish_plug+0x29/0x60 [ 453.533011] [] lock_release+0x434/0x670 [ 453.533011] [] ? import_single_range+0xd4/0x110 [ 453.533011] [] ? rw_verify_area+0x65/0x140 [ 453.533011] [] ? aio_write+0x1f6/0x280 [ 453.533011] [] aio_write+0x229/0x280 [ 453.533011] [] ? aio_complete+0x640/0x640 [ 453.533011] [] ? debug_check_no_locks_freed+0x1a0/0x1a0 [ 453.533011] [] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30 [ 453.533011] [] ? debug_lockdep_rcu_enabled+0x35/0x40 [ 453.533011] [] ? __might_fault+0x7e/0xf0 [ 453.533011] [] do_io_submit+0x94c/0xb10 [ 453.533011] [] ? do_io_submit+0x23e/0xb10 [ 453.533011] [] ? SyS_io_destroy+0x270/0x270 [ 453.533011] [] ? mark_held_locks+0x23/0xc0 [ 453.533011] [] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 453.533011] [] SyS_io_submit+0x10/0x20 [ 453.533011] [] entry_SYSCALL_64_fastpath+0x18/0xad [ 453.533011] [] ? trace_hardirqs_off_caller+0xc0/0x110 [ 453.533011] ---[ end trace b2fbe664d1cc0082 ]--- Cc: Dmitry Monakhov Cc: Jan Kara Cc: Christoph Hellwig Cc: Al Viro Reviewed-by: Christoph Hellwig Signed-off-by: Shaohua Li Signed-off-by: Al Viro Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 82835fb33ce54820206c14580eb1a149c473c50c Author: Jiri Olsa Date: Tue Jan 3 15:24:54 2017 +0100 perf/x86: Reject non sampling events with precise_ip [ Upstream commit 18e7a45af91acdde99d3aa1372cc40e1f8142f7b ] As Peter suggested [1] rejecting non sampling PEBS events, because they dont make any sense and could cause bugs in the NMI handler [2]. [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org Signed-off-by: Jiri Olsa Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: Vince Weaver Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 1c68633329d230dc350bc8c521689be4703f6016 Author: Peter Zijlstra Date: Fri Dec 9 14:59:00 2016 +0100 perf/core: Fix sys_perf_event_open() vs. hotplug [ Upstream commit 63cae12bce9861cec309798d34701cf3da20bc71 ] There is problem with installing an event in a task that is 'stuck' on an offline CPU. Blocked tasks are not dis-assosciated from offlined CPUs, after all, a blocked task doesn't run and doesn't require a CPU etc.. Only on wakeup do we ammend the situation and place the task on a available CPU. If we hit such a task with perf_install_in_context() we'll loop until either that task wakes up or the CPU comes back online, if the task waking depends on the event being installed, we're stuck. While looking into this issue, I also spotted another problem, if we hit a task with perf_install_in_context() that is in the middle of being migrated, that is we observe the old CPU before sending the IPI, but run the IPI (on the old CPU) while the task is already running on the new CPU, things also go sideways. Rework things to rely on task_curr() -- outside of rq->lock -- which is rather tricky. Imagine the following scenario where we're trying to install the first event into our task 't': CPU0 CPU1 CPU2 (current == t) t->perf_event_ctxp[] = ctx; smp_mb(); cpu = task_cpu(t); switch(t, n); migrate(t, 2); switch(p, t); ctx = t->perf_event_ctxp[]; // must not be NULL smp_function_call(cpu, ..); generic_exec_single() func(); spin_lock(ctx->lock); if (task_curr(t)) // false add_event_to_ctx(); spin_unlock(ctx->lock); perf_event_context_sched_in(); spin_lock(ctx->lock); // sees event So its CPU0's store of t->perf_event_ctxp[] that must not go 'missing'. Because if CPU2's load of that variable were to observe NULL, it would not try to schedule the ctx and we'd have a task running without its counter, which would be 'bad'. As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it first and not see the event yet, then CPU0 must observe task_curr() and retry. If the install happens first, then we must see the event on sched-in and all is well. I think we can translate the first part (until the 'must not be NULL') of the scenario to a litmus test like: C C-peterz { } P0(int *x, int *y) { int r1; WRITE_ONCE(*x, 1); smp_mb(); r1 = READ_ONCE(*y); } P1(int *y, int *z) { WRITE_ONCE(*y, 1); smp_store_release(z, 1); } P2(int *x, int *z) { int r1; int r2; r1 = smp_load_acquire(z); smp_mb(); r2 = READ_ONCE(*x); } exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0) Where: x is perf_event_ctxp[], y is our tasks's CPU, and z is our task being placed on the rq of CPU2. The P0 smp_mb() is the one added by this patch, ordering the store to perf_event_ctxp[] from find_get_context() and the load of task_cpu() in task_function_call(). The smp_store_release/smp_load_acquire model the RCpc locking of the rq->lock and the smp_mb() of P2 is the context switch switching from whatever CPU2 was running to our task 't'. This litmus test evaluates into: Test C-peterz Allowed States 7 0:r1=0; 2:r1=0; 2:r2=0; 0:r1=0; 2:r1=0; 2:r2=1; 0:r1=0; 2:r1=1; 2:r2=1; 0:r1=1; 2:r1=0; 2:r2=0; 0:r1=1; 2:r1=0; 2:r2=1; 0:r1=1; 2:r1=1; 2:r2=0; 0:r1=1; 2:r1=1; 2:r2=1; No Witnesses Positive: 0 Negative: 7 Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0) Observation C-peterz Never 0 7 Hash=e427f41d9146b2a5445101d3e2fcaa34 And the strong and weak model agree. Reported-by: Mark Rutland Tested-by: Mark Rutland Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Sebastian Andrzej Siewior Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: Will Deacon Cc: jeremy.linton@arm.com Link: http://lkml.kernel.org/r/20161209135900.GU3174@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 48131dd0f2b19dd297147c23dc634432fecee638 Author: Tobias Klauser Date: Thu Jan 12 16:53:11 2017 +0100 x86/mpx: Use compatible types in comparison to fix sparse error [ Upstream commit 453828625731d0ba7218242ef6ec88f59408f368 ] info->si_addr is of type void __user *, so it should be compared against something from the same address space. This fixes the following sparse error: arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Tobias Klauser Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 283994074501393b67590220ec8015f60ee670a8 Author: Len Brown Date: Fri Jan 13 01:11:18 2017 -0500 x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() [ Upstream commit 695085b4bc7603551db0b3da897b8bf9893ca218 ] The Intel Denverton microserver uses a 25 MHz TSC crystal, so we can derive its exact [*] TSC frequency using CPUID and some arithmetic, eg.: TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000) [*] 'exact' is only as good as the crystal, which should be +/- 20ppm Signed-off-by: Len Brown Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 6baa8c92dab9a43f0b363f1b7d7bd269d5efcf8d Author: Felix Fietkau Date: Fri Jan 13 11:28:25 2017 +0100 mac80211: initialize SMPS field in HT capabilities [ Upstream commit 43071d8fb3b7f589d72663c496a6880fb097533c ] ibss and mesh modes copy the ht capabilites from the band without overriding the SMPS state. Unfortunately the default value 0 for the SMPS field means static SMPS instead of disabled. This results in HT ibss and mesh setups using only single-stream rates, even though SMPS is not supposed to be active. Initialize SMPS to disabled for all bands on ieee80211_hw_register to ensure that the value is sane where it is not overriden with the real SMPS state. Reported-by: Elektra Wagenrad Signed-off-by: Felix Fietkau [move VHT TODO comment to a better place] Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 8eaaf66d41adf7b9b31486f03d93de3a1013e28d Author: Stefan Hajnoczi Date: Thu Jan 5 10:05:46 2017 +0000 pmem: return EIO on read_pmem() failure [ Upstream commit d47d1d27fd6206c18806440f6ebddf51a806be4f ] The read_pmem() function uses memcpy_mcsafe() on x86 where an EFAULT error code indicates a failed read. Block I/O should use EIO to indicate failure. Other pmem code paths (like bad blocks) already use EIO so let's be consistent. This fixes compatibility with consumers like btrfs that try to parse the specific error code rather than treat all errors the same. Reviewed-by: Jeff Moyer Signed-off-by: Stefan Hajnoczi Signed-off-by: Dan Williams Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 25319ae8e8a72a3fcdac7c964d267ca3c4e7c0a0 Author: Rex Zhu Date: Tue Jan 10 15:47:50 2017 +0800 drm/amd/powerplay: refine vce dpm update code on Cz. [ Upstream commit ab8db87b8256e13a62f10af1d32f5fc233c398cc ] Program HardMin based on the vce_arbiter.ecclk if ecclk is 0, disable ECLK DPM 0. Otherwise VCE could hang if switching SCLK from DPM 0 to 6/7 Signed-off-by: Rex Zhu Acked-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit f275ac7fc5d2b6013980864f14d1ced016211349 Author: Rex Zhu Date: Tue Jan 10 19:26:49 2017 +0800 drm/amd/powerplay: fix vce cg logic error on CZ/St. [ Upstream commit 3731d12dce83d47b357753ffc450ce03f1b49688 ] can fix Bug 191281: vce ib test failed. when vce idle, set vce clock gate, so the clock in vce domain will be disabled. when need to encode, disable vce clock gate, enable the clocks to vce engine. Signed-off-by: Rex Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 77e82094a3c9d3ca8308a48a4b11037c6234a262 Author: Alex Deucher Date: Tue Dec 20 16:35:50 2016 -0500 drm/radeon/si: load special ucode for certain MC configs [ Upstream commit ef736d394e85b1bf1fd65ba5e5257b85f6c82325 ] Special MC ucode is required for these memory configurations. Acked-by: Edward O'Callaghan Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 4ae8dc6acb710419c8766c290b7fb5eac2f1ed68 Author: Vadim Lomovtsev Date: Thu Jan 12 07:28:06 2017 -0800 net: thunderx: acpi: fix LMAC initialization [ Upstream commit 7aa4865506a26c607e00bd9794a85785b55ebca7 ] While probing BGX we requesting appropriate QLM for it's configuration and get LMAC count by that request. Then, while reading configured MAC values from SSDT table we need to save them in proper mapping: BGX[i]->lmac[j].mac = to later provide for initialization stuff. In order to fill such mapping properly we need to add lmac index to be used while acpi initialization since at this moment bgx->lmac_count already contains actual value. Signed-off-by: Vadim Lomovtsev Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit f88f06e1831878ecdd5fa78090a45ea8ff77f38f Author: Ard Biesheuvel Date: Wed Jan 11 14:54:53 2017 +0000 arm64: assembler: make adr_l work in modules under KASLR [ Upstream commit 41c066f2c4d436c535616fe182331766c57838f0 ] When CONFIG_RANDOMIZE_MODULE_REGION_FULL=y, the offset between loaded modules and the core kernel may exceed 4 GB, putting symbols exported by the core kernel out of the reach of the ordinary adrp/add instruction pairs used to generate relative symbol references. So make the adr_l macro emit a movz/movk sequence instead when executing in module context. While at it, remove the pointless special case for the stack pointer. Acked-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Ard Biesheuvel Signed-off-by: Catalin Marinas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit aabb797b4c1204b2e8518538b2616e476f2bac92 Author: Kevin Hilman Date: Wed Jan 11 18:18:40 2017 -0800 spi: davinci: use dma_mapping_error() [ Upstream commit c5a2a394835f473ae23931eda5066d3771d7b2f8 ] The correct error checking for dma_map_single() is to use dma_mapping_error(). Signed-off-by: Kevin Hilman Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit c32462d0b5232712f8a2a1d6cedb731115ba6f7b Author: Roberto Sassu Date: Wed Jan 11 11:06:42 2017 +0100 scsi: lpfc: avoid double free of resource identifiers [ Upstream commit cd60be4916ae689387d04b86b6fc15931e4c95ae ] Set variables initialized in lpfc_sli4_alloc_resource_identifiers() to NULL if an error occurred. Otherwise, lpfc_sli4_driver_resource_unset() attempts to free the memory again. Signed-off-by: Roberto Sassu Signed-off-by: Johannes Thumshirn Acked-by: James Smart Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 582c1ca0ea1d13a9e2912c5a7530f0728b3c3d1c Author: Brendan McGrath Date: Sat Jan 7 08:01:38 2017 +1100 HID: i2c-hid: Add sleep between POWER ON and RESET [ Upstream commit a89af4abdf9b353cdd6f61afc0eaaac403304873 ] Support for the Asus Touchpad was recently added. It turns out this device can fail initialisation (and become unusable) when the RESET command is sent too soon after the POWER ON command. Unfortunately the i2c-hid specification does not specify the need for a delay between these two commands. But it was discovered the Windows driver has a 1ms delay. As a result, this patch modifies the i2c-hid module to add a sleep inbetween the POWER ON and RESET commands which lasts between 1ms and 5ms. See https://github.com/vlasenko/hid-asus-dkms/issues/24 for further details. Signed-off-by: Brendan McGrath Reviewed-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit c78b8de5c05c73ff451b7c5a085766b421920ccd Author: Colin King Date: Wed Jan 11 11:43:10 2017 +0000 perf/x86/intel: Use ULL constant to prevent undefined shift behaviour [ Upstream commit ad5013d5699d30ded0cdbbc68b93b2aa28222c6e ] When x86_pmu.num_counters is 32 the shift of the integer constant 1 is exceeding 32bit and therefor undefined behaviour. Fix this by shifting 1ULL instead of 1. Reported-by: CoverityScan CID#1192105 ("Bad bit shift operation") Signed-off-by: Colin Ian King Cc: Andi Kleen Cc: Peter Zijlstra Cc: Kan Liang Cc: Stephane Eranian Cc: Alexander Shishkin Link: http://lkml.kernel.org/r/20170111114310.17928-1-colin.king@canonical.com Signed-off-by: Thomas Gleixner Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 6130fac994818eb0fbc9dfc95056292e71fb3791 Author: Johannes Berg Date: Thu Oct 20 08:52:50 2016 +0200 mac80211: recalculate min channel width on VHT opmode changes [ Upstream commit d2941df8fbd9708035d66d889ada4d3d160170ce ] When an associated station changes its VHT operating mode this can/will affect the bandwidth it's using, and consequently we must recalculate the minimum bandwidth we need to use. Failure to do so can lead to one of two scenarios: 1) we use a too high bandwidth, this is benign 2) we use a too narrow bandwidth, causing rate control and actual PHY configuration to be out of sync, which can in turn cause problems/crashes Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit d48cb21fd50bf6bea379ad04dc2baced20cf5275 Author: Russell King Date: Tue Jan 10 23:13:45 2017 +0000 net: phy: marvell: fix Marvell 88E1512 used in SGMII mode [ Upstream commit a13c06525ab9ff442924e67df9393a5efa914c56 ] When an Marvell 88E1512 PHY is connected to a nic in SGMII mode, the fiber page is used for the SGMII host-side connection. The PHY driver notices that SUPPORTED_FIBRE is set, so it tries reading the fiber page for the link status, and ends up reading the MAC-side status instead of the outgoing (copper) link. This leads to incorrect results reported via ethtool. If the PHY is connected via SGMII to the host, ignore the fiber page. However, continue to allow the existing power management code to suspend and resume the fiber page. Fixes: 6cfb3bcc0641 ("Marvell phy: check link status in case of fiber link.") Signed-off-by: Russell King Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 849f2d0665e049c21dbac8c0fa566a8ac04fead5 Author: Andy Shevchenko Date: Mon Jan 2 14:07:22 2017 +0200 pinctrl: intel: Set pin direction properly [ Upstream commit 17fab473693e8357a9aa6fee4fbed6c13a34bd81 ] There are two bits in the PADCFG0 register to configure direction, one per TX/RX buffers. For now we wrongly assume that the GPIO is always requested before it is being used, which is not true when the GPIO is used through irqchip. In this case the GPIO is never requested and we never enable RX buffer for it. Fix this by setting both bits accordingly. Reported-by: Jarkko Nikula Signed-off-by: Andy Shevchenko Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 3a6edbc95ba0df871e1eb72a411c0fa06644785e Author: Prarit Bhargava Date: Thu Jan 5 10:09:25 2017 -0500 perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code [ Upstream commit 6d6daa20945f3f598e56e18d1f926c08754f5801 ] hswep_uncore_cpu_init() uses a hardcoded physical package id 0 for the boot cpu. This works as long as the boot CPU is actually on the physical package 0, which is normaly the case after power on / reboot. But it fails with a NULL pointer dereference when a kdump kernel is started on a secondary socket which has a different physical package id because the locigal package translation for physical package 0 does not exist. Use the logical package id of the boot cpu instead of hard coded 0. [ tglx: Rewrote changelog once more ] Fixes: cf6d445f6897 ("perf/x86/uncore: Track packages, not per CPU data") Signed-off-by: Prarit Bhargava Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Harish Chegondi Cc: Jiri Olsa Cc: Kan Liang Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1483628965-2890-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit b8c5e7b1241362a131a2364fd166f8c8fdd9b363 Author: Lucas Stach Date: Mon Dec 12 16:15:17 2016 +0100 drm/etnaviv: trick drm_mm into giving out a low IOVA [ Upstream commit 3546fb0cdac25a79c89d87020566fab52b92867d ] After rollover of the IOVA space, we want to get a low IOVA address, otherwise the the games we play by remembering the last IOVA are pointless. When we search for a free hole with DRM_MM_SEARCH_DEFAULT, drm_mm will pop the next entry from the free holes stack, which will likely be a high IOVA. By using DRM_MM_SEARCH_BELOW we can trick drm_mm into reversing the search and provide us with a low IOVA. Signed-off-by: Lucas Stach Reviewed-by: Wladimir van der Laan Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 2bc8fcd633d8e7d59a242eb4d86fbebb8cf7ff61 Author: John Crispin Date: Wed Jan 25 09:20:54 2017 +0100 Documentation: devicetree: change the mediatek ethernet compatible string [ Upstream commit 61976fff20f92aceecc3670f6168bfc57a79e047 ] When the binding was defined, I was not aware that mt2701 was an earlier version of the SoC. For sake of consistency, the ethernet driver should use mt2701 inside the compat string as this is the earliest SoC with the ethernet core. The ethernet driver is currently of no real use until we finish and upstream the DSA driver. There are no users of this binding yet. It should be safe to fix this now before it is too late and we need to provide backward compatibility for the mt7623-eth compat string. Reported-by: Sean Wang Signed-off-by: John Crispin Reviewed-by: Matthias Brugger Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit c5c8743642aee21300b99540643549054edbf17f Author: Jiri Slaby Date: Tue Jan 24 15:18:29 2017 -0800 kernel/panic.c: add missing \n [ Upstream commit ff7a28a074ccbea999dadbb58c46212cf90984c6 ] When a system panics, the "Rebooting in X seconds.." message is never printed because it lacks a new line. Fix it. Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz Signed-off-by: Jiri Slaby Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 00f468f51dd5182390b4e859dced75f22e89034e Author: Thomas Huth Date: Tue Jan 24 07:28:41 2017 +0100 ibmveth: Add a proper check for the availability of the checksum features [ Upstream commit 23d28a859fb847fd7fcfbd31acb3b160abb5d6ae ] When using the ibmveth driver in a KVM/QEMU based VM, it currently always prints out a scary error message like this when it is started: ibmveth 71000003 (unregistered net_device): unable to change checksum offload settings. 1 rc=-2 ret_attr=71000003 This happens because the driver always tries to enable the checksum offloading without checking for the availability of this feature first. QEMU does not support checksum offloading for the spapr-vlan device, thus we always get the error message here. According to the LoPAPR specification, the "ibm,illan-options" property of the corresponding device tree node should be checked first to see whether the H_ILLAN_ATTRIUBTES hypercall and thus the checksum offloading feature is available. Thus let's do this in the ibmveth driver, too, so that the error message is really only limited to cases where something goes wrong, and does not occur if the feature is just missing. Signed-off-by: Thomas Huth Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 32bd4d2ed9d8355edc2263947286c8039c6bf171 Author: Balakrishnan Raman Date: Mon Jan 23 20:44:33 2017 -0800 vxlan: do not age static remote mac entries [ Upstream commit efb5f68f32995c146944a9d4257c3cf8eae2c4a1 ] Mac aging is applicable only for dynamically learnt remote mac entries. Check for user configured static remote mac entries and skip aging. Signed-off-by: Balakrishnan Raman Signed-off-by: Roopa Prabhu Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit b07bf2364605dc7d78401b7eb02a533b0b6ddc05 Author: Eric Dumazet Date: Mon Jan 23 16:43:05 2017 -0800 ip6_tunnel: must reload ipv6h in ip6ip6_tnl_xmit() [ Upstream commit 21b995a9cb093fff33ec91d7cb3822b882a90a1e ] Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull(), we must reload any pointer that was related to skb->head (or skb->data), or risk use after free. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Eric Dumazet Cc: Dmitry Kozlov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 7fdc81f6e1a9b3f520e40cfc4ebccc94858da62d Author: Michael S. Tsirkin Date: Mon Jan 23 21:37:52 2017 +0200 virtio_net: fix PAGE_SIZE > 64k [ Upstream commit d0fa28f00052391b5df328f502fbbdd4444938b7 ] I don't have any guests with PAGE_SIZE > 64k but the code seems to be clearly broken in that case as PAGE_SIZE / MERGEABLE_BUFFER_ALIGN will need more than 8 bit and so the code in mergeable_ctx_to_buf_address does not give us the actual true size. Cc: John Fastabend Signed-off-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit a6c3e01bf32e82494fb634801982e31f257f25cc Author: Ido Schimmel Date: Mon Jan 23 11:11:42 2017 +0100 mlxsw: spectrum_router: Correctly reallocate adjacency entries [ Upstream commit a59b7e0246774e28193126fe7fdbbd0ae9c67dcc ] mlxsw_sp_nexthop_group_mac_update() is called in one of two cases: 1) When the MAC of a nexthop needs to be updated 2) When the size of a nexthop group has changed In the second case the adjacency entries for the nexthop group need to be reallocated from the adjacency table. In this case we must write to the entries the MAC addresses of all the nexthops that should be offloaded and not only those whose MAC changed. Otherwise, these entries would be filled with garbage data, resulting in packet loss. Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing") Signed-off-by: Ido Schimmel Signed-off-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit ff3b1dd026bb1f9df6f345ec91b9a754d363306f Author: Greg Kurz Date: Tue Jan 24 17:50:26 2017 +0100 vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null [ Upstream commit bd00fdf198e2da475a2f4265a83686ab42d998a8 ] The recently added mediated VFIO driver doesn't know about powerpc iommu. It thus doesn't register a struct iommu_table_group in the iommu group upon device creation. The iommu_data pointer hence remains null. This causes a kernel oops when userspace tries to set the iommu type of a container associated with a mediated device to VFIO_SPAPR_TCE_v2_IOMMU. [ 82.585440] mtty mtty: MDEV: Registered [ 87.655522] iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 10 [ 87.655527] vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 10 [ 116.297184] Unable to handle kernel paging request for data at address 0x00000030 [ 116.297389] Faulting instruction address: 0xd000000007870524 [ 116.297465] Oops: Kernel access of bad area, sig: 11 [#1] [ 116.297611] SMP NR_CPUS=2048 [ 116.297611] NUMA [ 116.297627] PowerNV ... [ 116.297954] CPU: 33 PID: 7067 Comm: qemu-system-ppc Not tainted 4.10.0-rc5-mdev-test #8 [ 116.297993] task: c000000e7718b680 task.stack: c000000e77214000 [ 116.298025] NIP: d000000007870524 LR: d000000007870518 CTR: 0000000000000000 [ 116.298064] REGS: c000000e77217990 TRAP: 0300 Not tainted (4.10.0-rc5-mdev-test) [ 116.298103] MSR: 9000000000009033 [ 116.298107] CR: 84004444 XER: 00000000 [ 116.298154] CFAR: c00000000000888c DAR: 0000000000000030 DSISR: 40000000 SOFTE: 1 GPR00: d000000007870518 c000000e77217c10 d00000000787b0ed c000000eed2103c0 GPR04: 0000000000000000 0000000000000000 c000000eed2103e0 0000000f24320000 GPR08: 0000000000000104 0000000000000001 0000000000000000 d0000000078729b0 GPR12: c00000000025b7e0 c00000000fe08400 0000000000000001 000001002d31d100 GPR16: 000001002c22c850 00003ffff315c750 0000000043145680 0000000043141bc0 GPR20: ffffffffffffffed fffffffffffff000 0000000020003b65 d000000007706018 GPR24: c000000f16cf0d98 d000000007706000 c000000003f42980 c000000003f42980 GPR28: c000000f1575ac00 c000000003f429c8 0000000000000000 c000000eed2103c0 [ 116.298504] NIP [d000000007870524] tce_iommu_attach_group+0x10c/0x360 [vfio_iommu_spapr_tce] [ 116.298555] LR [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] [ 116.298601] Call Trace: [ 116.298610] [c000000e77217c10] [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] (unreliable) [ 116.298671] [c000000e77217cb0] [d0000000077033a0] vfio_fops_unl_ioctl+0x278/0x3e0 [vfio] [ 116.298713] [c000000e77217d40] [c0000000002a3ebc] do_vfs_ioctl+0xcc/0x8b0 [ 116.298745] [c000000e77217de0] [c0000000002a4700] SyS_ioctl+0x60/0xc0 [ 116.298782] [c000000e77217e30] [c00000000000b220] system_call+0x38/0xfc [ 116.298812] Instruction dump: [ 116.298828] 7d3f4b78 409effc8 3d220000 e9298020 3c800140 38a00018 608480c0 e8690028 [ 116.298869] 4800249d e8410018 7c7f1b79 41820230 2fa90000 419e0114 e9090020 [ 116.298914] ---[ end trace 1e10b0ced08b9120 ]--- This patch fixes the oops. Reported-by: Vaibhav Jain Signed-off-by: Greg Kurz Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 8895ef4e5357fa54e614c5654eb4416623c2feb6 Author: Ding Pixel Date: Wed Jan 18 17:26:38 2017 +0800 drm/amdgpu: check ring being ready before using [ Upstream commit c5f21c9f878b8dcd54d0b9739c025ca73cb4c091 ] Return success when the ring is properly initialized, otherwise return failure. Tonga SRIOV VF doesn't have UVD and VCE engines, the initialization of these IPs is bypassed. The system crashes if application submit IB to their rings which are not ready to use. It could be a common issue if IP having ring buffer is disabled for some reason on specific ASIC, so it should check the ring being ready to use. Bug: amdgpu_test crashes system on Tonga VF. Signed-off-by: Ding Pixel Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit e5a2ba9af818cf214f2a0a1e431fb2b1102883c0 Author: Florian Fainelli Date: Fri Jan 20 16:05:05 2017 -0800 net: dsa: Check return value of phy_connect_direct() [ Upstream commit 4078b76cac68e50ccf1f76a74e7d3d5788aec3fe ] We need to check the return value of phy_connect_direct() in dsa_slave_phy_connect() otherwise we may be continuing the initialization of a slave network device with a PHY that already attached somewhere else and which will soon be in error because the PHY device is in error. The conditions for such an error to occur are that we have a port of our switch that is not disabled, and has the same port number as a PHY address (say both 5) that can be probed using the DSA slave MII bus. We end-up having this slave network device find a PHY at the same address as our port number, and we try to attach to it. A slave network (e.g: port 0) has already attached to our PHY device, and we try to re-attach it with a different network device, but since we ignore the error we would end-up initializating incorrect device references by the time the slave network interface is opened. The code has been (re)organized several times, making it hard to provide an exact Fixes tag, this is a bugfix nonetheless. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit c6f284899e01f9ea095d0e5d7aa2f3814915def1 Author: Lendacky, Thomas Date: Fri Jan 20 12:14:13 2017 -0600 amd-xgbe: Check xgbe_init() return code [ Upstream commit 738f7f647371ff4cfc9646c99dba5b58ad142db3 ] The xgbe_init() routine returns a return code indicating success or failure, but the return code is not checked. Add code to xgbe_init() to issue a message when failures are seen and add code to check the xgbe_init() return code. Signed-off-by: Tom Lendacky Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit e99d86d76eed4f4bccc01e58e0bb3c96fbe88f67 Author: Zach Ploskey Date: Sun Jan 22 00:47:19 2017 -0800 platform/x86: ideapad-laptop: handle ACPI event 1 [ Upstream commit cfee5d63767b2e7997c1f36420d008abbe61565c ] On Ideapad laptops, ACPI event 1 is currently not handled. Many models log "ideapad_laptop: Unknown event: 1" every 20 seconds or so while running on battery power. Some convertible laptops receive this event when switching in and out of tablet mode. This adds and additional case for event 1 in ideapad_acpi_notify to call ideapad_input_report(priv, vpc_bit), so that the event is reported to userspace and we avoid unnecessary logging. Fixes bug #107481 (https://bugzilla.kernel.org/show_bug.cgi?id=107481) Fixes bug #65751 (https://bugzilla.kernel.org/show_bug.cgi?id=65751) Signed-off-by: Zach Ploskey Signed-off-by: Andy Shevchenko Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit e9a87e0f5bbb3f3fd28048b923b9941687c6233f Author: Jens Axboe Date: Tue Jan 17 14:22:24 2017 -0800 iwlwifi: fix kernel crash when unregistering thermal zone [ Upstream commit 92549cdc288f47f3a98cf80ac5890c91f5876a06 ] A recent firmware change seems to have enabled thermal zones on the iwlwifi driver. Unfortunately, my device fails when registering the thermal zone. This doesn't stop the driver from attempting to unregister the thermal zone at unload time, triggering a NULL pointer deference in strlen() off the thermal_zone_device_unregister() path. Don't unregister if name is NULL, for that case we failed registering. Do the same for the cooling zone. Signed-off-by: Jens Axboe Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 322baf72eed51cef55a61f5d4ac1b51bd7824c1a Author: Eric Farman Date: Fri Jan 13 12:48:06 2017 -0500 scsi: virtio_scsi: Reject commands when virtqueue is broken [ Upstream commit 773c7220e22d193e5667c352fcbf8d47eefc817f ] In the case of a graceful set of detaches, where the virtio-scsi-ccw disk is removed from the guest prior to the controller, the guest behaves quite normally. Specifically, the detach gets us into sd_sync_cache to issue a Synchronize Cache(10) command, which immediately fails (and is retried a couple of times) because the device has been removed. Later, the removal of the controller sees two CRWs presented, but there's no further indication of the removal from the guest viewpoint. [ 17.217458] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 17.219257] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [ 21.449400] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2 [ 21.449406] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0 However, on s390, the SCSI disks can be removed "by surprise" when an entire controller (host) is removed and all associated disks are removed via the loop in scsi_forget_host. The same call to sd_sync_cache is made, but because the controller has already been removed, the Synchronize Cache(10) command is neither issued (and then failed) nor rejected. That the I/O isn't returned means the guest cannot have other devices added nor removed, and other tasks (such as shutdown or reboot) issued by the guest will not complete either. The virtio ring has already been marked as broken (via virtio_break_device in virtio_ccw_remove), but we still attempt to queue the command only to have it remain there. The calling sequence provides a bit of distinction for us: virtscsi_queuecommand() -> virtscsi_kick_cmd() -> virtscsi_add_cmd() -> virtqueue_add_sgs() -> virtqueue_add() if success return 0 elseif vq->broken or vring_mapping_error() return -EIO else return -ENOSPC A return of ENOSPC is generally a temporary condition, so returning "host busy" from virtscsi_queuecommand makes sense here, to have it redriven in a moment or two. But the EIO return code is more of a permanent error and so it would be wise to return the I/O itself and allow the calling thread to finish gracefully. The result is these four kernel messages in the guest (the fourth one does not occur prior to this patch): [ 22.921562] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2 [ 22.921580] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0 [ 22.921978] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 22.921993] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK I opted to fill in the same response data that is returned from the more graceful device detach, where the disk device is removed prior to the controller device. Signed-off-by: Eric Farman Reviewed-by: Fam Zheng Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 5d5c293af8348b540ef721d810f7549ac3ab81c2 Author: Vineeth Remanan Pillai Date: Thu Jan 19 08:35:39 2017 -0800 xen-netfront: Fix Rx stall during network stress and OOM [ Upstream commit 90c311b0eeead647b708a723dbdde1eda3dcad05 ] During an OOM scenario, request slots could not be created as skb allocation fails. So the netback cannot pass in packets and netfront wrongly assumes that there is no more work to be done and it disables polling. This causes Rx to stall. The issue is with the retry logic which schedules the timer if the created slots are less than NET_RX_SLOTS_MIN. The count of new request slots to be pushed are calculated as a difference between new req_prod and rsp_cons which could be more than the actual slots, if there are unconsumed responses. The fix is to calculate the count of newly created slots as the difference between new req_prod and old req_prod. Signed-off-by: Vineeth Remanan Pillai Reviewed-by: Juergen Gross Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 72191c7d82e7a559ef05b1b89e6365911a8726aa Author: Stefano Stabellini Date: Thu Jan 19 10:39:09 2017 -0800 swiotlb-xen: update dev_addr after swapping pages [ Upstream commit f1225ee4c8fcf09afaa199b8b1f0450f38b8cd11 ] In xen_swiotlb_map_page and xen_swiotlb_map_sg_attrs, if the original page is not suitable, we swap it for another page from the swiotlb pool. In these cases, we don't update the previously calculated dma address for the page before calling xen_dma_map_page. Thus, we end up calling xen_dma_map_page passing the wrong dev_addr, resulting in xen_dma_map_page mistakenly assuming that the page is foreign when it is local. Fix the bug by updating dev_addr appropriately. This change has no effect on x86, because xen_dma_map_page is a stub there. Signed-off-by: Stefano Stabellini Signed-off-by: Pooya Keshavarzi Tested-by: Pooya Keshavarzi Reviewed-by: Boris Ostrovsky Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 884baf2abf6dd0273b821a1f9e06023438528a52 Author: G. Campana Date: Thu Jan 19 23:37:46 2017 +0200 virtio_console: fix a crash in config_work_handler [ Upstream commit 8379cadf71c3ee8173a1c6fc1ea7762a9638c047 ] Using control_work instead of config_work as the 3rd argument to container_of results in an invalid portdev pointer. Indeed, the work structure is initialized as below: INIT_WORK(&portdev->config_work, &config_work_handler); It leads to a crash when portdev->vdev is dereferenced later. This bug is triggered when the guest uses a virtio-console without multiport feature and receives a config_changed virtio interrupt. Signed-off-by: G. Campana Reviewed-by: Amit Shah Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit c3eab85ff11a8cd4def8cf2b4cc0610f6b47a8cd Author: Liu Bo Date: Thu Dec 1 13:43:31 2016 -0800 Btrfs: fix truncate down when no_holes feature is enabled [ Upstream commit 91298eec05cd8d4e828cf7ee5d4a6334f70cf69a ] For such a file mapping, [0-4k][hole][8k-12k] In NO_HOLES mode, we don't have the [hole] extent any more. Commit c1aa45759e90 ("Btrfs: fix shrinking truncate when the no_holes feature is enabled") fixed disk isize not being updated in NO_HOLES mode when data is not flushed. However, even if data has been flushed, we can still have trouble in updating disk isize since we updated disk isize to 'start' of the last evicted extent. Reviewed-by: Chris Mason Signed-off-by: Liu Bo Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit e8b5068b64d0505fe138e3db243e6e3385ae1a15 Author: Chandan Rajendra Date: Fri Dec 23 15:00:18 2016 +0530 Btrfs: Fix deadlock between direct IO and fast fsync [ Upstream commit 97dcdea076ecef41ea4aaa23d4397c2f622e4265 ] The following deadlock is seen when executing generic/113 test, ---------------------------------------------------------+---------------------------------------------------- Direct I/O task Fast fsync task ---------------------------------------------------------+---------------------------------------------------- btrfs_direct_IO __blockdev_direct_IO do_blockdev_direct_IO do_direct_IO btrfs_get_blocks_direct while (blocks needs to written) get_more_blocks (first iteration) btrfs_get_blocks_direct btrfs_create_dio_extent down_read(&BTRFS_I(inode) >dio_sem) Create and add extent map and ordered extent up_read(&BTRFS_I(inode) >dio_sem) btrfs_sync_file btrfs_log_dentry_safe btrfs_log_inode_parent btrfs_log_inode btrfs_log_changed_extents down_write(&BTRFS_I(inode) >dio_sem) Collect new extent maps and ordered extents wait for ordered extent completion get_more_blocks (second iteration) btrfs_get_blocks_direct btrfs_create_dio_extent down_read(&BTRFS_I(inode) >dio_sem) -------------------------------------------------------------------------------------------------------------- In the above description, Btrfs direct I/O code path has not yet started submitting bios for file range covered by the initial ordered extent. Meanwhile, The fast fsync task obtains the write semaphore and waits for I/O on the ordered extent to get completed. However, the Direct I/O task is now blocked on obtaining the read semaphore. To resolve the deadlock, this commit modifies the Direct I/O code path to obtain the read semaphore before invoking __blockdev_direct_IO(). The semaphore is then given up after __blockdev_direct_IO() returns. This allows the Direct I/O code to complete I/O on all the ordered extents it creates. Signed-off-by: Chandan Rajendra Reviewed-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 83571e9ef7c91ef6e249aae374de068b30963551 Author: Eric Dumazet Date: Wed Jan 18 19:44:42 2017 -0800 gianfar: Do not reuse pages from emergency reserve [ Upstream commit 69fed99baac186013840ced3524562841296034f ] A driver using dev_alloc_page() must not reuse a page that had to use emergency memory reserve. Otherwise all packets using this page will be immediately dropped, unless for very specific sockets having SOCK_MEMALLOC bit set. This issue might be hard to debug, because only a fraction of the RX ring buffer would suffer from drops. Fixes: 75354148ce69 ("gianfar: Add paged allocation and Rx S/G") Signed-off-by: Eric Dumazet Cc: Claudiu Manoil Acked-by: Claudiu Manoil Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit c48a862c47d481838b26f5d6cd5c29e2064339da Author: Jiri Slaby Date: Wed Jan 18 14:29:21 2017 +0100 objtool: Fix IRET's opcode [ Upstream commit b5b46c4740aed1538544f0fa849c5b76c7823469 ] The IRET opcode is 0xcf according to the Intel manual and also to objdump of my vmlinux: 1ea8: 48 cf iretq Fix the opcode in arch_decode_instruction(). The previous value (0xc5) seems to correspond to LDS. Signed-off-by: Jiri Slaby Acked-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20170118132921.19319-1-jslaby@suse.cz Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 251d00bf1309c65316f5bd3850b2ca523b46921c Author: Daniel Borkmann Date: Wed Jan 18 15:14:17 2017 +0100 bpf: don't trigger OOM killer under pressure with map alloc [ Upstream commit d407bd25a204bd66b7346dde24bd3d37ef0e0b05 ] This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(), that are to be used for map allocations. Using kmalloc() for very large allocations can cause excessive work within the page allocator, so i) fall back earlier to vmalloc() when the attempt is considered costly anyway, and even more importantly ii) don't trigger OOM killer with any of the allocators. Since this is based on a user space request, for example, when creating maps with element pre-allocation, we really want such requests to fail instead of killing other user space processes. Also, don't spam the kernel log with warnings should any of the allocations fail under pressure. Given that, we can make backend selection in bpf_map_area_alloc() generic, and convert all maps over to use this API for spots with potentially large allocation requests. Note, replacing the one kmalloc_array() is fine as overflow checks happen earlier in htab_map_alloc(), since it must also protect the multiplication for vmalloc() should kmalloc_array() fail. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit a7a2a6d34fe78261945a5eb5eeca6c4fa3ad800e Author: Michael Chan Date: Tue Jan 17 22:07:19 2017 -0500 bnxt_en: Fix "uninitialized variable" bug in TPA code path. [ Upstream commit 719ca8111402aa6157bd83a3c966d184db0d8956 ] In the TPA GRO code path, initialize the tcp_opt_len variable to 0 so that it will be correct for packets without TCP timestamps. The bug caused the SKB fields to be incorrectly set up for packets without TCP timestamps, leading to these packets being rejected by the stack. Reported-by: Andy Gospodarek Acked-by: Andy Gospodarek Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit da805bc788b0dfce728b22d2595e569d2ee9769e Author: Igor Druzhinin Date: Tue Jan 17 20:49:38 2017 +0000 xen-netback: protect resource cleaning on XenBus disconnect [ Upstream commit f16f1df65f1cf139ff9e9f84661e6573d6bb27fc ] vif->lock is used to protect statistics gathering agents from using the queue structure during cleaning. Signed-off-by: Igor Druzhinin Acked-by: Wei Liu Reviewed-by: Paul Durrant Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 7bdccaa5da12f294636de312c73d7d33dfaa947c Author: Igor Druzhinin Date: Tue Jan 17 20:49:37 2017 +0000 xen-netback: fix memory leaks on XenBus disconnect [ Upstream commit 9a6cdf52b85ea5fb21d2bb31e4a7bc61b79923a7 ] Eliminate memory leaks introduced several years ago by cleaning the queue resources which are allocated on XenBus connection event. Namely, queue structure array and pages used for IO rings. Signed-off-by: Igor Druzhinin Reviewed-by: Paul Durrant Acked-by: Wei Liu Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 5dcd085942761174f6ff1271fe707e4e2308d64c Author: Eran Ben Elisha Date: Tue Jan 17 19:19:17 2017 +0200 net: ethtool: Initialize buffer when querying device channel settings [ Upstream commit 31a86d137219373c3222ca5f4f912e9a4d8065bb ] Ethtool channels respond struct was uninitialized when querying device channel boundaries settings. As a result, unreported fields by the driver hold garbage. This may cause sending unsupported params to driver. Fixes: 8bf368620486 ('ethtool: ensure channel counts are within bounds ...') Signed-off-by: Eran Ben Elisha Signed-off-by: Tariq Toukan CC: John W. Linville Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 6e315b2b10b65022ce07e6ed3e2decf7678d58c2 Author: Gavin Shan Date: Fri Jan 6 10:39:49 2017 +1100 powerpc/eeh: Enable IO path on permanent error [ Upstream commit 387bbc974f6adf91aa635090f73434ed10edd915 ] We give up recovery on permanent error, simply shutdown the affected devices and remove them. If the devices can't be put into quiet state, they spew more traffic that is likely to cause another unexpected EEH error. This was observed on "p8dtu2u" machine: 0002:00:00.0 PCI bridge: IBM Device 03dc 0002:01:00.0 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.1 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.2 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.3 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) On P8 PowerNV platform, the IO path is frozen when shutdowning the devices, meaning the memory registers are inaccessible. It is why the devices can't be put into quiet state before removing them. This fixes the issue by enabling IO path prior to putting the devices into quiet state. Reported-by: Pridhiviraj Paidipeddi Signed-off-by: Gavin Shan Acked-by: Russell Currey Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit ea7b808165a5161fcd148c8b41fa03e79e65cb82 Author: Florian Fainelli Date: Fri Dec 23 19:56:56 2016 -0800 net: korina: Fix NAPI versus resources freeing commit e6afb1ad88feddf2347ea779cfaf4d03d3cd40b6 upstream. Commit beb0babfb77e ("korina: disable napi on close and restart") introduced calls to napi_disable() that were missing before, unfortunately this leaves a small window during which NAPI has a chance to run, yet we just freed resources since korina_free_ring() has been called: Fix this by disabling NAPI first then freeing resource, and make sure that we also cancel the restart task before doing the resource freeing. Fixes: beb0babfb77e ("korina: disable napi on close and restart") Reported-by: Alexandros C. Couloumbis Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit fded17be01abfefe7218a72df703d8fe6b28206f Author: Zhou Chengming Date: Mon Jan 16 11:21:11 2017 +0800 perf/x86/intel: Handle exclusive threadid correctly on CPU hotplug [ Upstream commit 4e71de7986386d5fd3765458f27d612931f27f5e ] The CPU hotplug function intel_pmu_cpu_starting() sets cpu_hw_events.excl_thread_id unconditionally to 1 when the shared exclusive counters data structure is already availabe for the sibling thread. This works during the boot process because the first sibling gets threadid 0 assigned and the second sibling which shares the data structure gets 1. But when the first thread of the core is offlined and onlined again it shares the data structure with the second thread and gets exclusive thread id 1 assigned as well. Prevent this by checking the threadid of the already online thread. [ tglx: Rewrote changelog ] Signed-off-by: Zhou Chengming Cc: NuoHan Qiao Cc: ak@linux.intel.com Cc: peterz@infradead.org Cc: kan.liang@intel.com Cc: dave.hansen@linux.intel.com Cc: eranian@google.com Cc: qiaonuohan@huawei.com Cc: davidcc@google.com Cc: guohanjun@huawei.com Link: http://lkml.kernel.org/r/1484536871-3131-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Thomas Gleixner Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 3eeb3459b7e6ec77d0ca2ae1bc82ecefe16d4c50 Author: Alvaro G. M Date: Tue Jan 17 09:08:16 2017 +0100 net: phy: dp83848: add DP83620 PHY support [ Upstream commit 93b43fd137cd8865adf9978ab9870a344365d3af ] This PHY with fiber support is register compatible with DP83848, so add support for it. Signed-off-by: Alvaro Gamez Machado Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 10c24e89b2b86907fc9588db1fa7300e9a1a194a Author: Alex Deucher Date: Tue Jan 17 15:06:58 2017 -0500 drm/amdgpu: add support for new hainan variants [ Upstream commit 17324b6add82d6c0bf119f1d1944baef392a4e39 ] New hainan parts require updated smc firmware. Cc: Sonny Jiang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 9f2a36a7504c994f89a1fe8e4d94b8d43423816f Author: Rex Zhu Date: Tue Jan 10 20:03:59 2017 +0800 drm/amdgpu: fix program vce instance logic error. [ Upstream commit 50a1ebc70a2803deb7811fc73fb55d70e353bc34 ] need to clear bit31-29 in GRBM_GFX_INDEX, then the program can be valid. Signed-off-by: Rex Zhu Acked-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 0c9626619777f76a4a6761a259dcad263b1902d3 Author: Quinn Tran Date: Fri Dec 23 18:06:13 2016 -0800 qla2xxx: Fix erroneous invalid handle message [ Upstream commit 4f060736f29a960aba8e781a88837464756200a8 ] Termination of Immediate Notify IOCB was using wrong IOCB handle. IOCB completion code was unable to find appropriate code path due to wrong handle. Following message is seen in the logs. "Error entry - invalid handle/queue (ffff)." Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani Reviewed-by: Christoph Hellwig [ bvanassche: Fixed word order in patch title ] Signed-off-by: Bart Van Assche Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 8cfcaa2899f322fa602e903e983389fd1de36fe8 Author: Quinn Tran Date: Fri Dec 23 18:06:11 2016 -0800 qla2xxx: Terminate exchange if corrupted [ Upstream commit 5f35509db179ca7ed1feaa4b14f841adb06ed220 ] Corrupted ATIO is defined as length of fcp_header & fcp_cmd payload is less than 0x38. It's the minimum size for a frame to carry 8..16 bytes SCSI CDB. The exchange will be dropped or terminated if corrupted. Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani Reviewed-by: Christoph Hellwig [ bvanassche: Fixed spelling in patch title ] Signed-off-by: Bart Van Assche Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 42a1d5b47594eb846f709f6558082919dabc7344 Author: Johannes Thumshirn Date: Tue Jan 10 12:05:54 2017 +0100 scsi: lpfc: Set elsiocb contexts to NULL after freeing it [ Upstream commit 8667f515952feefebb3c0f8d9a9266c91b101a46 ] Set the elsiocb contexts to NULL after freeing as others depend on it. Signed-off-by: Johannes Thumshirn Acked-by: Dick Kennedy Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 7782ab228f64e7da4c47a90b40fbb80920ce722b Author: Julia Lawall Date: Tue Jan 17 12:23:21 2017 +0100 stmmac: add missing of_node_put [ Upstream commit a249708bc2aa1fe3ddf15dfac22bee519d15996b ] The function stmmac_dt_phy provides several possibilities for initializing plat->mdio_node, all of which have the effect of increasing the reference count of the assigned value. This field is not updated elsewhere, so the value is live until the end of the lifetime of plat (devm_allocated), just after the end of stmmac_remove_config_dt. Thus, add an of_node_put on plat->mdio_node in stmmac_remove_config_dt. It is possible that the field mdio_node is never initialized, but of_node_put is NULL-safe, so it is also safe to call of_node_put in that case. Signed-off-by: Julia Lawall Acked-by: Alexandre TORGUE Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit ee4494c6bda8ac530f85756e619c1727d2539b6c Author: Damien Le Moal Date: Thu Jan 12 15:25:10 2017 +0900 scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type [ Upstream commit 26f2819772af891dee2843e1f8662c58e5129d5f ] Zoned block devices force the use of READ/WRITE(16) commands by setting sdkp->use_16_for_rw and clearing sdkp->use_10_for_rw. This result in DPOFUA always being disabled for these drives as the assumed use of the deprecated READ/WRITE(6) commands only looks at sdkp->use_10_for_rw. Strenghten the test by also checking that sdkp->use_16_for_rw is false. Signed-off-by: Damien Le Moal Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 80b1a1180e4e72fed893e5aba73fe7ccea7aa30e Author: Dmitry Vyukov Date: Tue Jan 17 14:51:04 2017 +0100 KVM: x86: fix fixing of hypercalls [ Upstream commit ce2e852ecc9a42e4b8dabb46025cfef63209234a ] emulator_fix_hypercall() replaces hypercall with vmcall instruction, but it does not handle GP exception properly when writes the new instruction. It can return X86EMUL_PROPAGATE_FAULT without setting exception information. This leads to incorrect emulation and triggers WARN_ON(ctxt->exception.vector > 0x1f) in x86_emulate_insn() as discovered by syzkaller fuzzer: WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/emulate.c:5558 Call Trace: warn_slowpath_null+0x2c/0x40 kernel/panic.c:582 x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572 x86_emulate_instruction+0x403/0x1cc0 arch/x86/kvm/x86.c:5618 emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline] handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762 vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625 vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline] vcpu_run arch/x86/kvm/x86.c:6947 [inline] Set exception information when write in emulator_fix_hypercall() fails. Signed-off-by: Dmitry Vyukov Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Wanpeng Li Cc: kvm@vger.kernel.org Cc: syzkaller@googlegroups.com Signed-off-by: Radim Krčmář Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit afaee3ef513650b2f6cb9e2c860b9210875a8135 Author: Juergen Gross Date: Thu May 18 17:28:48 2017 +0200 xen/blkback: don't free be structure too early commit 71df1d7ccad1c36f7321d6b3b48f2ea42681c363 upstream. The be structure must not be freed when freeing the blkif structure isn't done. Otherwise a use-after-free of be when unmapping the ring used for communicating with the frontend will occur in case of a late call of xenblk_disconnect() (e.g. due to an I/O still active when trying to disconnect). Signed-off-by: Juergen Gross Tested-by: Steven Haigh Acked-by: Roger Pau Monné Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 13fa36f9fbc84c47cef6673d5e2f3a20693d6eff Author: Jerome Brunet Date: Fri Jan 20 08:20:24 2017 -0800 ARM64: dts: meson-gxbb-odroidc2: fix GbE tx link breakage [ Upstream commit feb3cbea0946c67060e2d5bcb7499b0a6f6700fe ] OdroidC2 GbE link breaks under heavy tx transfer. This happens even if the MAC does not enable Energy Efficient Ethernet (No Low Power state Idle on the Tx path). The problem seems to come from the phy Rx path, entering the LPI state. Disabling EEE advertisement on the phy prevent this feature to be negociated with the link partner and solve the issue. Signed-off-by: Jerome Brunet Signed-off-by: Kevin Hilman Signed-off-by: Arnd Bergmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 8bface142a8d4bc5766bc71c94a618f234ed2bc6 Author: jbrunet Date: Mon Dec 19 16:05:38 2016 +0100 dt: bindings: net: use boolean dt properties for eee broken modes [ Upstream commit 308d3165d8b2b98d3dc3d97d6662062735daea67 ] The patches regarding eee-broken-modes was merged before all people involved could find an agreement on the best way to move forward. While we agreed on having a DT property to mark particular modes as broken, the value used for eee-broken-modes mapped the phy register in very direct way. Because of this, the concern is that it could be used to implement configuration policies instead of describing a broken HW. In the end, having a boolean property for each mode seems to be preferred over one bit field value mapping the register (too) directly. Cc: Florian Fainelli Signed-off-by: Jerome Brunet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 3897ae12b706bfc47c07a1eef58fe6ce328784cf Author: jbrunet Date: Mon Dec 19 16:05:37 2016 +0100 net: phy: use boolean dt properties for eee broken modes [ Upstream commit 57f3986231bb2c69a55ccab1d2b30a00818027ac ] The patches regarding eee-broken-modes was merged before all people involved could find an agreement on the best way to move forward. While we agreed on having a DT property to mark particular modes as broken, the value used for eee-broken-modes mapped the phy register in very direct way. Because of this, the concern is that it could be used to implement configuration policies instead of describing a broken HW. In the end, having a boolean property for each mode seems to be preferred over one bit field value mapping the register (too) directly. Cc: Florian Fainelli Signed-off-by: Jerome Brunet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 40373d91a0f764c8ba5c56ea3dc88896faa4510d Author: jbrunet Date: Mon Dec 19 16:05:36 2016 +0100 net: phy: fix sign type error in genphy_config_eee_advert [ Upstream commit 3bb9ab63276696988d8224f52db20e87194deb4b ] In genphy_config_eee_advert, the return value of phy_read_mmd_indirect is checked to know if the register could be accessed but the result is assigned to a 'u32'. Changing to 'int' to correctly get errors from phy_read_mmd_indirect. Fixes: d853d145ea3e ("net: phy: add an option to disable EEE advertisement") Reported-by: Julia Lawall Signed-off-by: Jerome Brunet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 752ba680eb70ebc1e235b2ac1087ce471e2c800d Author: jbrunet Date: Mon Nov 28 10:46:47 2016 +0100 dt-bindings: net: add EEE capability constants [ Upstream commit 1fc31357ad194fb98691f3d122bcd47e59239e83 ] Signed-off-by: Jerome Brunet Tested-by: Yegor Yefremov Tested-by: Andreas Färber Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 97ace183074d306942b903a148aebd5d061758f0 Author: jbrunet Date: Mon Nov 28 10:46:46 2016 +0100 net: phy: add an option to disable EEE advertisement [ Upstream commit d853d145ea3e63387a2ac759aa41d5e43876e561 ] This patch adds an option to disable EEE advertisement in the generic PHY by providing a mask of prohibited modes corresponding to the value found in the MDIO_AN_EEE_ADV register. On some platforms, PHY Low power idle seems to be causing issues, even breaking the link some cases. The patch provides a convenient way for these platforms to disable EEE advertisement and work around the issue. Signed-off-by: Jerome Brunet Tested-by: Yegor Yefremov Tested-by: Andreas Färber Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 0e8eca987e27077fc2ade85aa402dbc177fdb026 Author: Pavel Belous Date: Sat Jan 28 22:53:28 2017 +0300 net: ethtool: add support for 2500BaseT and 5000BaseT link modes [ Upstream commit 94842b4fc4d6b1691cfc86c6f5251f299d27f4ba ] This patch introduce support for 2500BaseT and 5000BaseT link modes. These modes are included in the new IEEE 802.3bz standard. Signed-off-by: Pavel Belous Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 8886196a73204d167b7f8797eb6ebf61e76794d6 Author: Liam R. Howlett Date: Tue May 23 21:54:11 2017 -0400 sparc64: Zero pages on allocation for mondo and error queues. [ Upstream commit 7a7dc961a28b965a0d0303c2e989df17b411708b ] Error queues use a non-zero first word to detect if the queues are full. Using pages that have not been zeroed may result in false positive overflow events. These queues are set up once during boot so zeroing all mondo and error queue pages is safe. Note that the false positive overflow does not always occur because the page allocation for these queues is so early in the boot cycle that higher number CPUs get fresh pages. It is only when traps are serviced with lower number CPUs who were given already used pages that this issue is exposed. Signed-off-by: Liam R. Howlett Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 41172b772da4b9d875ed3fb90fe0e1a86742dc2a Author: Liam R. Howlett Date: Tue May 23 21:54:10 2017 -0400 sparc64: Handle PIO & MEM non-resumable errors. [ Upstream commit 047487241ff59374fded8c477f21453681f5995c ] User processes trying to access an invalid memory address via PIO will receive a SIGBUS signal instead of causing a panic. Memory errors will receive a SIGKILL since a SIGBUS may result in a coredump which may attempt to repeat the faulting access. Signed-off-by: Liam R. Howlett Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 2aa6d036b716c9242222e054d4ef34905ad45fd3 Author: Mark Rutland Date: Fri Jun 16 14:02:34 2017 -0700 mm: numa: avoid waiting on freed migrated pages commit 3c226c637b69104f6b9f1c6ec5b08d7b741b3229 upstream. In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by waiting until the pmd is unlocked before we return and retry. However, we can race with migrate_misplaced_transhuge_page(): // do_huge_pmd_numa_page // migrate_misplaced_transhuge_page() // Holds 0 refs on page // Holds 2 refs on page vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); /* ... */ if (pmd_trans_migrating(*vmf->pmd)) { page = pmd_page(*vmf->pmd); spin_unlock(vmf->ptl); ptl = pmd_lock(mm, pmd); if (page_count(page) != 2)) { /* roll back */ } /* ... */ mlock_migrate_page(new_page, page); /* ... */ spin_unlock(ptl); put_page(page); put_page(page); // page freed here wait_on_page_locked(page); goto out; } This can result in the freed page having its waiters flag set unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the page alloc/free functions. This has been observed on arm64 KVM guests. We can avoid this by having do_huge_pmd_numa_page() take a reference on the page before dropping the pmd lock, mirroring what we do in __migration_entry_wait(). When we hit the race, migrate_misplaced_transhuge_page() will see the reference and abort the migration, as it may do today in other cases. Fixes: b8916634b77bffb2 ("mm: Prevent parallel splits during THP migration") Link: http://lkml.kernel.org/r/1497349722-6731-2-git-send-email-will.deacon@arm.com Signed-off-by: Mark Rutland Signed-off-by: Will Deacon Acked-by: Steve Capper Acked-by: Kirill A. Shutemov Acked-by: Vlastimil Babka Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 08cb8e5f83fd2d4f6327173cc01322bc842806f1 Author: Guillaume Nault Date: Fri Mar 31 13:02:30 2017 +0200 l2tp: take a reference on sessions used in genetlink handlers commit 2777e2ab5a9cf2b4524486c6db1517a6ded25261 upstream. Callers of l2tp_nl_session_find() need to hold a reference on the returned session since there's no guarantee that it isn't going to disappear from under them. Relying on the fact that no l2tp netlink message may be processed concurrently isn't enough: sessions can be deleted by other means (e.g. by closing the PPPOL2TP socket of a ppp pseudowire). l2tp_nl_cmd_session_delete() is a bit special: it runs a callback function that may require a previous call to session->ref(). In particular, for ppp pseudowires, the callback is l2tp_session_delete(), which then calls pppol2tp_session_close() and dereferences the PPPOL2TP socket. The socket might already be gone at the moment l2tp_session_delete() calls session->ref(), so we need to take a reference during the session lookup. So we need to pass the do_ref variable down to l2tp_session_get() and l2tp_session_get_by_ifname(). Since all callers have to be updated, l2tp_session_find_by_ifname() and l2tp_nl_session_find() are renamed to reflect their new behaviour. Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit 599e6f038777c6733eef244d4aac192edb612aa6 Author: Guillaume Nault Date: Fri Mar 31 13:02:29 2017 +0200 l2tp: hold session while sending creation notifications commit 5e6a9e5a3554a5b3db09cdc22253af1849c65dff upstream. l2tp_session_find() doesn't take any reference on the returned session. Therefore, the session may disappear while sending the notification. Use l2tp_session_get() instead and decrement session's refcount once the notification is sent. Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners") Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit d9face6fc62a73059f0fc3a3de4dfe8f53536aa7 Author: Guillaume Nault Date: Fri Mar 31 13:02:27 2017 +0200 l2tp: fix duplicate session creation commit dbdbc73b44782e22b3b4b6e8b51e7a3d245f3086 upstream. l2tp_session_create() relies on its caller for checking for duplicate sessions. This is racy since a session can be concurrently inserted after the caller's verification. Fix this by letting l2tp_session_create() verify sessions uniqueness upon insertion. Callers need to be adapted to check for l2tp_session_create()'s return code instead of calling l2tp_session_find(). pppol2tp_connect() is a bit special because it has to work on existing sessions (if they're not connected) or to create a new session if none is found. When acting on a preexisting session, a reference must be held or it could go away on us. So we have to use l2tp_session_get() instead of l2tp_session_find() and drop the reference before exiting. Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit 806e98835683694cbb9e74c28641df8042792e27 Author: Guillaume Nault Date: Fri Mar 31 13:02:26 2017 +0200 l2tp: ensure session can't get removed during pppol2tp_session_ioctl() commit 57377d63547861919ee634b845c7caa38de4a452 upstream. Holding a reference on session is required before calling pppol2tp_session_ioctl(). The session could get freed while processing the ioctl otherwise. Since pppol2tp_session_ioctl() uses the session's socket, we also need to take a reference on it in l2tp_session_get(). Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit 6539c4f991c28e82d1eb0385d35cc9662985f61a Author: Guillaume Nault Date: Fri Mar 31 13:02:25 2017 +0200 l2tp: fix race in l2tp_recv_common() commit 61b9a047729bb230978178bca6729689d0c50ca2 upstream. Taking a reference on sessions in l2tp_recv_common() is racy; this has to be done by the callers. To this end, a new function is required (l2tp_session_get()) to atomically lookup a session and take a reference on it. Callers then have to manually drop this reference. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman commit d2da8d394147526a28c6d5bb83a72635e7f0a288 Author: Baolin Wang Date: Thu Dec 8 19:55:22 2016 +0800 usb: gadget: f_fs: Fix possibe deadlock commit b3ce3ce02d146841af012d08506b4071db8ffde3 upstream. When system try to close /dev/usb-ffs/adb/ep0 on one core, at the same time another core try to attach new UDC, which will cause deadlock as below scenario. Thus we should release ffs lock before issuing unregister_gadget_item(). [ 52.642225] c1 ====================================================== [ 52.642228] c1 [ INFO: possible circular locking dependency detected ] [ 52.642236] c1 4.4.6+ #1 Tainted: G W O [ 52.642241] c1 ------------------------------------------------------- [ 52.642245] c1 usb ffs open/2808 is trying to acquire lock: [ 52.642270] c0 (udc_lock){+.+.+.}, at: [] usb_gadget_unregister_driver+0x3c/0xc8 [ 52.642272] c1 but task is already holding lock: [ 52.642283] c0 (ffs_lock){+.+.+.}, at: [] ffs_data_clear+0x30/0x140 [ 52.642285] c1 which lock already depends on the new lock. [ 52.642287] c1 the existing dependency chain (in reverse order) is: [ 52.642295] c0 -> #1 (ffs_lock){+.+.+.}: [ 52.642307] c0 [] __lock_acquire+0x20f0/0x2238 [ 52.642314] c0 [] lock_acquire+0xe4/0x298 [ 52.642322] c0 [] mutex_lock_nested+0x7c/0x3cc [ 52.642328] c0 [] ffs_func_bind+0x504/0x6e8 [ 52.642334] c0 [] usb_add_function+0x84/0x184 [ 52.642340] c0 [] configfs_composite_bind+0x264/0x39c [ 52.642346] c0 [] udc_bind_to_driver+0x58/0x11c [ 52.642352] c0 [] usb_udc_attach_driver+0x90/0xc8 [ 52.642358] c0 [] gadget_dev_desc_UDC_store+0xd4/0x128 [ 52.642369] c0 [] configfs_write_file+0xd0/0x13c [ 52.642376] c0 [] vfs_write+0xb8/0x214 [ 52.642381] c0 [] SyS_write+0x54/0xb0 [ 52.642388] c0 [] el0_svc_naked+0x24/0x28 [ 52.642395] c0 -> #0 (udc_lock){+.+.+.}: [ 52.642401] c0 [] print_circular_bug+0x84/0x2e4 [ 52.642407] c0 [] __lock_acquire+0x2138/0x2238 [ 52.642412] c0 [] lock_acquire+0xe4/0x298 [ 52.642420] c0 [] mutex_lock_nested+0x7c/0x3cc [ 52.642427] c0 [] usb_gadget_unregister_driver+0x3c/0xc8 [ 52.642432] c0 [] unregister_gadget_item+0x28/0x44 [ 52.642439] c0 [] ffs_data_clear+0x138/0x140 [ 52.642444] c0 [] ffs_data_reset+0x20/0x6c [ 52.642450] c0 [] ffs_data_closed+0xac/0x12c [ 52.642454] c0 [] ffs_ep0_release+0x20/0x2c [ 52.642460] c0 [] __fput+0xb0/0x1f4 [ 52.642466] c0 [] ____fput+0x20/0x2c [ 52.642473] c0 [] task_work_run+0xb4/0xe8 [ 52.642482] c0 [] do_exit+0x360/0xb9c [ 52.642487] c0 [] do_group_exit+0x4c/0xb0 [ 52.642494] c0 [] get_signal+0x380/0x89c [ 52.642501] c0 [] do_signal+0x154/0x518 [ 52.642507] c0 [] do_notify_resume+0x70/0x78 [ 52.642512] c0 [] work_pending+0x1c/0x20 [ 52.642514] c1 other info that might help us debug this: [ 52.642517] c1 Possible unsafe locking scenario: [ 52.642518] c1 CPU0 CPU1 [ 52.642520] c1 ---- ---- [ 52.642525] c0 lock(ffs_lock); [ 52.642529] c0 lock(udc_lock); [ 52.642533] c0 lock(ffs_lock); [ 52.642537] c0 lock(udc_lock); [ 52.642539] c1 *** DEADLOCK *** [ 52.642543] c1 1 lock held by usb ffs open/2808: [ 52.642555] c0 #0: (ffs_lock){+.+.+.}, at: [] ffs_data_clear+0x30/0x140 [ 52.642557] c1 stack backtrace: [ 52.642563] c1 CPU: 1 PID: 2808 Comm: usb ffs open Tainted: G [ 52.642565] c1 Hardware name: Spreadtrum SP9860g Board (DT) [ 52.642568] c1 Call trace: [ 52.642573] c1 [] dump_backtrace+0x0/0x170 [ 52.642577] c1 [] show_stack+0x20/0x28 [ 52.642583] c1 [] dump_stack+0xa8/0xe0 [ 52.642587] c1 [] print_circular_bug+0x1fc/0x2e4 [ 52.642591] c1 [] __lock_acquire+0x2138/0x2238 [ 52.642595] c1 [] lock_acquire+0xe4/0x298 [ 52.642599] c1 [] mutex_lock_nested+0x7c/0x3cc [ 52.642604] c1 [] usb_gadget_unregister_driver+0x3c/0xc8 [ 52.642608] c1 [] unregister_gadget_item+0x28/0x44 [ 52.642613] c1 [] ffs_data_clear+0x138/0x140 [ 52.642618] c1 [] ffs_data_reset+0x20/0x6c [ 52.642621] c1 [] ffs_data_closed+0xac/0x12c [ 52.642625] c1 [] ffs_ep0_release+0x20/0x2c [ 52.642629] c1 [] __fput+0xb0/0x1f4 [ 52.642633] c1 [] ____fput+0x20/0x2c [ 52.642636] c1 [] task_work_run+0xb4/0xe8 [ 52.642640] c1 [] do_exit+0x360/0xb9c [ 52.642644] c1 [] do_group_exit+0x4c/0xb0 [ 52.642647] c1 [] get_signal+0x380/0x89c [ 52.642651] c1 [] do_signal+0x154/0x518 [ 52.642656] c1 [] do_notify_resume+0x70/0x78 [ 52.642659] c1 [] work_pending+0x1c/0x20 Acked-by: Michal Nazarewicz Signed-off-by: Baolin Wang Signed-off-by: Felipe Balbi Cc: Jerry Zhang Signed-off-by: Greg Kroah-Hartman commit ed96148d7f8e900b61d90de79bf3b273887ffa70 Author: Baoquan He Date: Thu May 4 10:25:47 2017 +0800 x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() commit fc5f9d5f151c9fff21d3d1d2907b888a5aec3ff7 upstream. Jeff Moyer reported that on his system with two memory regions 0~64G and 1T~1T+192G, and kernel option "memmap=192G!1024G" added, enabling KASLR will make the system hang intermittently during boot. While adding 'nokaslr' won't. The back trace is: Oops: 0000 [#1] SMP RIP: memcpy_erms() [ .... ] Call Trace: pmem_rw_page() bdev_read_page() do_mpage_readpage() mpage_readpages() blkdev_readpages() __do_page_cache_readahead() force_page_cache_readahead() page_cache_sync_readahead() generic_file_read_iter() blkdev_read_iter() __vfs_read() vfs_read() SyS_read() entry_SYSCALL_64_fastpath() This crash happens because the for loop count calculation in sync_global_pgds() is not correct. When a mapping area crosses PGD entries, we should calculate the starting address of region which next PGD covers and assign it to next for loop count, but not add PGDIR_SIZE directly. The old code works right only if the mapping area is an exact multiple of PGDIR_SIZE, otherwize the end region could be skipped so that it can't be synchronized to all other processes from kernel PGD init_mm.pgd. In Jeff's system, emulated pmem area [1024G, 1216G) is smaller than PGDIR_SIZE. While 'nokaslr' works because PAGE_OFFSET is 1T aligned, it makes this area be mapped inside one PGD entry. With KASLR enabled, this area could cross two PGD entries, then the next PGD entry won't be synced to all other processes. That is why we saw empty PGD. Fix it. Reported-by: Jeff Moyer Signed-off-by: Baoquan He Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dan Williams Cc: Dave Hansen Cc: Dave Young Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Jinbum Park Cc: Josh Poimboeuf Cc: Kees Cook Cc: Kirill A. Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Garnier Cc: Thomas Gleixner Cc: Yasuaki Ishimatsu Cc: Yinghai Lu Link: http://lkml.kernel.org/r/1493864747-8506-1-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman commit 1c0fa383b3391f1e5528b264bd9b3ca9209054cf Author: Vallish Vaidyeshwara Date: Fri Jun 23 18:53:06 2017 +0000 dm thin: do not queue freed thin mapping for next stage processing commit 00a0ea33b495ee6149bf5a77ac5807ce87323abb upstream. process_prepared_discard_passdown_pt1() should cleanup dm_thin_new_mapping in cases of error. dm_pool_inc_data_range() can fail trying to get a block reference: metadata operation 'dm_pool_inc_data_range' failed: error = -61 When dm_pool_inc_data_range() fails, dm thin aborts current metadata transaction and marks pool as PM_READ_ONLY. Memory for thin mapping is released as well. However, current thin mapping will be queued onto next stage as part of queue_passdown_pt2() or passdown_endio(). This dangling thin mapping memory when processed and accessed in next stage will lead to device mapper crashing. Code flow without fix: -> process_prepared_discard_passdown_pt1(m) -> dm_thin_remove_range() -> discard passdown --> passdown_endio(m) queues m onto next stage -> dm_pool_inc_data_range() fails, frees memory m but does not remove it from next stage queue -> process_prepared_discard_passdown_pt2(m) -> processes freed memory m and crashes One such stack: Call Trace: [] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison] [] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool] [] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool] [] process_prepared+0x81/0xa0 [dm_thin_pool] [] do_worker+0xc5/0x820 [dm_thin_pool] [] ? __schedule+0x244/0x680 [] ? pwq_activate_delayed_work+0x42/0xb0 [] process_one_work+0x153/0x3f0 [] worker_thread+0x12b/0x4b0 [] ? rescuer_thread+0x350/0x350 [] kthread+0xca/0xe0 [] ? kthread_park+0x60/0x60 [] ret_from_fork+0x25/0x30 The fix is to first take the block ref count for discarded block and then do a passdown discard of this block. If block ref count fails, then bail out aborting current metadata transaction, mark pool as PM_READ_ONLY and also free current thin mapping memory (existing error handling code) without queueing this thin mapping onto next stage of processing. If block ref count succeeds, then passdown discard of this block. Discard callback of passdown_endio() will queue this thin mapping onto next stage of processing. Code flow with fix: -> process_prepared_discard_passdown_pt1(m) -> dm_thin_remove_range() -> dm_pool_inc_data_range() --> if fails, free memory m and bail out -> discard passdown --> passdown_endio(m) queues m onto next stage Reviewed-by: Eduardo Valentin Reviewed-by: Cristian Gafton Reviewed-by: Anchal Agarwal Signed-off-by: Vallish Vaidyeshwara Reviewed-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 466877f2d25758e5ad007b292c1f225520f8c877 Author: Deepak Rawat Date: Mon Jun 26 14:39:08 2017 +0200 drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr commit 82fcee526ba8ca2c5d378bdf51b21b7eb058fe3a upstream. The hash table created during vmw_cmdbuf_res_man_create was never freed. This causes memory leak in context creation. Added the corresponding drm_ht_remove in vmw_cmdbuf_res_man_destroy. Tested for memory leak by running piglit overnight and kernel memory is not inflated which earlier was. Signed-off-by: Deepak Rawat Reviewed-by: Sinclair Yeh Signed-off-by: Thomas Hellstrom Signed-off-by: Greg Kroah-Hartman commit 78c4244f8bdbf3cefa1e01bfcfe7a53bcc45c0f3 Author: Bartosz Golaszewski Date: Fri Jun 23 13:45:16 2017 +0200 gpiolib: fix filtering out unwanted events commit ad537b822577fcc143325786cd6ad50d7b9df31c upstream. GPIOEVENT_REQUEST_BOTH_EDGES is not a single flag, but a binary OR of GPIOEVENT_REQUEST_RISING_EDGE and GPIOEVENT_REQUEST_FALLING_EDGE. The expression 'le->eflags & GPIOEVENT_REQUEST_BOTH_EDGES' we'll get evaluated to true even if only one event type was requested. Fix it by checking both RISING & FALLING flags explicitly. Fixes: 61f922db7221 ("gpio: userspace ABI for reading GPIO line events") Signed-off-by: Bartosz Golaszewski Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit cb2c6fdf620f4802c31d6577ff34391fdd949cc6 Author: Trond Myklebust Date: Tue Jun 27 17:33:38 2017 -0400 NFSv4.1: Fix a race in nfs4_proc_layoutget commit bd171930e6a3de4f5cffdafbb944e50093dfb59b upstream. If the task calling layoutget is signalled, then it is possible for the calls to nfs4_sequence_free_slot() and nfs4_layoutget_prepare() to race, in which case we leak a slot. The fix is to move the call to nfs4_sequence_free_slot() into the nfs4_layoutget_release() so that it gets called at task teardown time. Fixes: 2e80dbe7ac51 ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...") Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 7d0e27fe24c55dda16ad579db5a0234b3ff97770 Author: Hui Wang Date: Wed Jun 28 08:59:16 2017 +0800 ALSA: hda - set input_path bitmap to zero after moving it to new place commit a8f20fd25bdce81a8e41767c39f456d346b63427 upstream. Recently we met a problem, the codec has valid adcs and input pins, and they can form valid input paths, but the driver does not build valid controls for them like "Mic boost", "Capture Volume" and "Capture Switch". Through debugging, I found the driver needs to shrink the invalid adcs and input paths for this machine, so it will move the whole column bitmap value to the previous column, after moving it, the driver forgets to set the original column bitmap value to zero, as a result, the driver will invalidate the path whose index value is the original colume bitmap value. After executing this function, all valid input paths are invalidated by a mistake, there are no any valid input paths, so the driver won't build controls for them. Fixes: 3a65bcdc577a ("ALSA: hda - Fix inconsistent input_paths after ADC reduction") Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 093750c3dec46a1d440098341e531e3e7c17a96d Author: Takashi Iwai Date: Wed Jun 28 12:02:02 2017 +0200 ALSA: hda - Fix endless loop of codec configure commit d94815f917da770d42c377786dc428f542e38f71 upstream. azx_codec_configure() loops over the codecs found on the given controller via a linked list. The code used to work in the past, but in the current version, this may lead to an endless loop when a codec binding returns an error. The culprit is that the snd_hda_codec_configure() unregisters the device upon error, and this eventually deletes the given codec object from the bus. Since the list is initialized via list_del_init(), the next object points to the same device itself. This behavior change was introduced at splitting the HD-audio code code, and forgotten to adapt it here. For fixing this bug, just use a *_safe() version of list iteration. Fixes: d068ebc25e6e ("ALSA: hda - Move some codes up to hdac_bus struct") Reported-by: Daniel Vetter Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit dad3135e762bdb66318fe5ab902db5c4fbb1ad2f Author: Paul Burton Date: Fri Mar 3 15:26:05 2017 -0800 MIPS: Fix IRQ tracing & lockdep when rescheduling commit d8550860d910c6b7b70f830f59003b33daaa52c9 upstream. When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler from arch/mips/kernel/entry.S we disable interrupts. This is true regardless of whether we reach work_resched from syscall_exit_work, resume_userspace or by looping after calling schedule(). Although we disable interrupts in these paths we don't call trace_hardirqs_off() before calling into C code which may acquire locks, and we therefore leave lockdep with an inconsistent view of whether interrupts are disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are both enabled. Without tracing this interrupt state lockdep will print warnings such as the following once a task returns from a syscall via syscall_exit_partial with TIF_NEED_RESCHED set: [ 49.927678] ------------[ cut here ]------------ [ 49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8 [ 49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) [ 49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197 [ 49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4 [ 49.974431] 0000000000000000 0000000000000000 0000000000000000 000000000000004a [ 49.985300] ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8 [ 49.996194] 0000000000000001 0000000000000000 0000000000000000 0000000077c8030c [ 50.007063] 000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88 [ 50.017945] 0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498 [ 50.028827] 0000000000000000 0000000000000001 0000000000000000 0000000000000000 [ 50.039688] 0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc [ 50.050575] 00000000140084e0 0000000000000000 0000000000000000 0000000000040a00 [ 50.061448] 0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc [ 50.072327] ... [ 50.076087] Call Trace: [ 50.079869] [] show_stack+0x80/0xa8 [ 50.086577] [] dump_stack+0x10c/0x190 [ 50.093498] [] __warn+0xf0/0x108 [ 50.099889] [] warn_slowpath_fmt+0x3c/0x48 [ 50.107241] [] check_flags.part.41+0x1dc/0x1e8 [ 50.114961] [] lock_is_held_type+0x8c/0xb0 [ 50.122291] [] __schedule+0x8c0/0x10f8 [ 50.129221] [] schedule+0x30/0x98 [ 50.135659] [] work_resched+0x8/0x34 [ 50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]--- [ 50.148405] possible reason: unannotated irqs-off. [ 50.154600] irq event stamp: 400463 [ 50.159566] hardirqs last enabled at (400463): [] _raw_spin_unlock_irqrestore+0x40/0xa8 [ 50.171981] hardirqs last disabled at (400462): [] _raw_spin_lock_irqsave+0x30/0xb0 [ 50.183897] softirqs last enabled at (400450): [] __do_softirq+0x4ac/0x6a8 [ 50.195015] softirqs last disabled at (400425): [] irq_exit+0x110/0x128 Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off() when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking schedule() following the work_resched label because: 1) Interrupts are disabled regardless of the path we take to reach work_resched() & schedule(). 2) Performing the tracing here avoids the need to do it in paths which disable interrupts but don't call out to C code before hitting a path which uses the RESTORE_SOME macro that will call trace_hardirqs_on() or trace_hardirqs_off() as appropriate. We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling syscall_trace_leave() for similar reasons, ensuring that lockdep has a consistent view of state after we re-enable interrupts. Signed-off-by: Paul Burton Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15385/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit e9e24faf823e58713115974ab50102319c33a34d Author: Paul Burton Date: Thu Mar 2 14:02:40 2017 -0800 MIPS: pm-cps: Drop manual cache-line alignment of ready_count commit 161c51ccb7a6faf45ffe09aa5cf1ad85ccdad503 upstream. We allocate memory for a ready_count variable per-CPU, which is accessed via a cached non-coherent TLB mapping to perform synchronisation between threads within the core using LL/SC instructions. In order to ensure that the variable is contained within its own data cache line we allocate 2 lines worth of memory & align the resulting pointer to a line boundary. This is however unnecessary, since kmalloc is guaranteed to return memory which is at least cache-line aligned (see ARCH_DMA_MINALIGN). Stop the redundant manual alignment. Besides cleaning up the code & avoiding needless work, this has the side effect of avoiding an arithmetic error found by Bryan on 64 bit systems due to the 32 bit size of the former dlinesz. This led the ready_count variable to have its upper 32b cleared erroneously for MIPS64 kernels, causing problems when ready_count was later used on MIPS64 via cpuidle. Signed-off-by: Paul Burton Fixes: 3179d37ee1ed ("MIPS: pm-cps: add PM state entry code for CPS systems") Reported-by: Bryan O'Donoghue Reviewed-by: Bryan O'Donoghue Tested-by: Bryan O'Donoghue Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15383/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit f7d3d40ea1242f633bbf093b63181acff2da319e Author: James Hogan Date: Thu Jun 29 15:05:04 2017 +0100 MIPS: Avoid accidental raw backtrace commit 854236363370995a609a10b03e35fd3dc5e9e4a1 upstream. Since commit 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") show_backtrace() invokes the raw backtracer when cp0_status & ST0_KSU indicates user mode to fix issues on EVA kernels where user and kernel address spaces overlap. However this is used by show_stack() which creates its own pt_regs on the stack and leaves cp0_status uninitialised in most of the code paths. This results in the non deterministic use of the raw back tracer depending on the previous stack content. show_stack() deals exclusively with kernel mode stacks anyway, so explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure we get a useful backtrace. Fixes: 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") Signed-off-by: James Hogan Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16656/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit 3d4ac49a9538c36d9c3c121ddbcb4c3958dee5e9 Author: Karl Beldan Date: Tue Jun 27 19:22:16 2017 +0000 MIPS: head: Reorder instructions missing a delay slot commit 25d8b92e0af75d72ce8b99e63e5a449cc0888efa upstream. In this sequence the 'move' is assumed in the delay slot of the 'beq', but head.S is in reorder mode and the former gets pushed one 'nop' farther by the assembler. The corrected behavior made booting with an UHI supplied dtb erratic. Fixes: 15f37e158892 ("MIPS: store the appended dtb address in a variable") Signed-off-by: Karl Beldan Reviewed-by: James Hogan Cc: Jonas Gorski Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/16614/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit b1355226a64e6301ca63aee1e78728887e3527f1 Author: David Rientjes Date: Fri Apr 7 16:05:00 2017 -0700 mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() commit 460bcec84e11c75122ace5976214abbc596eb91b upstream. We got need_resched() warnings in swap_cgroup_swapoff() because swap_cgroup_ctrl[type].length is particularly large. Reschedule when needed. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com Signed-off-by: David Rientjes Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Vladimir Davydov Cc: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman commit dbc808362b6cb2124f36b14ce354abcc64b6f1bb Author: Russell Currey Date: Fri Feb 17 14:33:01 2017 +1100 drm/ast: Handle configuration without P2A bridge commit 71f677a91046599ece96ebab21df956ce909c456 upstream. The ast driver configures a window to enable access into BMC memory space in order to read some configuration registers. If this window is disabled, which it can be from the BMC side, the ast driver can't function. Closing this window is a necessity for security if a machine's host side and BMC side are controlled by different parties; i.e. a cloud provider offering machines "bare metal". A recent patch went in to try to check if that window is open but it does so by trying to access the registers in question and testing if the result is 0xffffffff. This method will trigger a PCIe error when the window is closed which on some systems will be fatal (it will trigger an EEH for example on POWER which will take out the device). This patch improves this in two ways: - First, if the firmware has put properties in the device-tree containing the relevant configuration information, we use these. - Otherwise, a bit in one of the SCU scratch registers (which are readable via the VGA register space and writeable by the BMC) will indicate if the BMC has closed the window. This bit has been defined by Y.C Chen from Aspeed. If the window is closed and the configuration isn't available from the device-tree, some sane defaults are used. Those defaults are hopefully sufficient for standard video modes used on a server. Signed-off-by: Russell Currey Acked-by: Joel Stanley Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Dave Airlie Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman commit 8dc9f9dede5b92658a1bb32866e11905933d2b48 Author: Juergen Gross Date: Thu May 18 17:28:49 2017 +0200 xen/blkback: don't use xen_blkif_get() in xen-blkback kthread commit a24fa22ce22ae302b3bf8f7008896d52d5d57b8d upstream. There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread of xen-blkback. Thread stopping is synchronous and using the blkif reference counting in the kthread will avoid to ever let the reference count drop to zero at the end of an I/O running concurrent to disconnecting and multiple rings. Setting ring->xenblkd to NULL after stopping the kthread isn't needed as the kthread does this already. Signed-off-by: Juergen Gross Tested-by: Steven Haigh Acked-by: Roger Pau Monné Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 4ebe28d23d35df2e69542c0146a74d21834ef235 Author: Kinglong Mee Date: Thu Apr 27 11:13:38 2017 +0800 NFSv4.x/callback: Create the callback service through svc_create_pooled commit df807fffaabde625fa9adb82e3e5b88cdaa5709a upstream. As the comments for svc_set_num_threads() said, " Destroying threads relies on the service threads filling in rqstp->rq_task, which only the nfs ones do. Assumes the serv has been created using svc_create_pooled()." If creating service through svc_create(), the svc_pool_map_put() will be called in svc_destroy(), but the pool map isn't used. So that, the reference of pool map will be drop, the next using of pool map will get a zero npools. [ 137.992130] divide error: 0000 [#1] SMP [ 137.992148] Modules linked in: nfsd(E) nfsv4 nfs fscache fuse tun bridge stp llc ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event vmw_balloon coretemp crct10dif_pclmul crc32_pclmul ppdev ghash_clmulni_intel intel_rapl_perf joydev snd_ens1371 gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore parport_pc parport nfit acpi_cpufreq tpm_tis tpm_tis_core tpm vmw_vmci i2c_piix4 shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm crc32c_intel drm e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd] [ 137.992336] CPU: 0 PID: 4514 Comm: rpc.nfsd Tainted: G E 4.11.0-rc8+ #536 [ 137.992777] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 137.993757] task: ffff955984101d00 task.stack: ffff9873c2604000 [ 137.994231] RIP: 0010:svc_pool_for_cpu+0x2b/0x80 [sunrpc] [ 137.994768] RSP: 0018:ffff9873c2607c18 EFLAGS: 00010246 [ 137.995227] RAX: 0000000000000000 RBX: ffff95598376f000 RCX: 0000000000000002 [ 137.995673] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9559944aec00 [ 137.996156] RBP: ffff9873c2607c18 R08: ffff9559944aec28 R09: 0000000000000000 [ 137.996609] R10: 0000000001080002 R11: 0000000000000000 R12: ffff95598376f010 [ 137.997063] R13: ffff95598376f018 R14: ffff9559944aec28 R15: ffff9559944aec00 [ 137.997584] FS: 00007f755529eb40(0000) GS:ffff9559bb600000(0000) knlGS:0000000000000000 [ 137.998048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 137.998548] CR2: 000055f3aecd9660 CR3: 0000000084290000 CR4: 00000000001406f0 [ 137.999052] Call Trace: [ 137.999517] svc_xprt_do_enqueue+0xef/0x260 [sunrpc] [ 138.000028] svc_xprt_received+0x47/0x90 [sunrpc] [ 138.000487] svc_add_new_perm_xprt+0x76/0x90 [sunrpc] [ 138.000981] svc_addsock+0x14b/0x200 [sunrpc] [ 138.001424] ? recalc_sigpending+0x1b/0x50 [ 138.001860] ? __getnstimeofday64+0x41/0xd0 [ 138.002346] ? do_gettimeofday+0x29/0x90 [ 138.002779] write_ports+0x255/0x2c0 [nfsd] [ 138.003202] ? _copy_from_user+0x4e/0x80 [ 138.003676] ? write_recoverydir+0x100/0x100 [nfsd] [ 138.004098] nfsctl_transaction_write+0x48/0x80 [nfsd] [ 138.004544] __vfs_write+0x37/0x160 [ 138.004982] ? selinux_file_permission+0xd7/0x110 [ 138.005401] ? security_file_permission+0x3b/0xc0 [ 138.005865] vfs_write+0xb5/0x1a0 [ 138.006267] SyS_write+0x55/0xc0 [ 138.006654] entry_SYSCALL_64_fastpath+0x1a/0xa9 [ 138.007071] RIP: 0033:0x7f7554b9dc30 [ 138.007437] RSP: 002b:00007ffc9f92c788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 138.007807] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f7554b9dc30 [ 138.008168] RDX: 0000000000000002 RSI: 00005640cd536640 RDI: 0000000000000003 [ 138.008573] RBP: 00007ffc9f92c780 R08: 0000000000000001 R09: 0000000000000002 [ 138.008918] R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000000004 [ 138.009254] R13: 00005640cdbf77a0 R14: 00005640cdbf7720 R15: 00007ffc9f92c238 [ 138.009610] Code: 0f 1f 44 00 00 48 8b 87 98 00 00 00 55 48 89 e5 48 83 78 08 00 74 10 8b 05 07 42 02 00 83 f8 01 74 40 83 f8 02 74 19 31 c0 31 d2 b7 88 00 00 00 5d 89 d0 48 c1 e0 07 48 03 87 90 00 00 00 c3 [ 138.010664] RIP: svc_pool_for_cpu+0x2b/0x80 [sunrpc] RSP: ffff9873c2607c18 [ 138.011061] ---[ end trace b3468224cafa7d11 ]--- Signed-off-by: Kinglong Mee Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 955f270b6f5d7d830188de1f05f055180a8712dc Author: Kinglong Mee Date: Mon Mar 6 22:29:14 2017 +0800 NFSv4: fix a reference leak caused WARNING messages commit 366a1569bff3fe14abfdf9285e31e05e091745f5 upstream. Because nfs4_opendata_access() has close the state when access is denied, so the state isn't leak. Rather than revert the commit a974deee47, I'd like clean the strange state close. [ 1615.094218] ------------[ cut here ]------------ [ 1615.094607] WARNING: CPU: 0 PID: 23702 at lib/list_debug.c:31 __list_add_valid+0x8e/0xa0 [ 1615.094913] list_add double add: new=ffff9d7901d9f608, prev=ffff9d7901d9f608, next=ffff9d7901ee8dd0. [ 1615.095458] Modules linked in: nfsv4(E) nfs(E) nfsd(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock f2fs snd_seq_midi snd_seq_midi_event fscrypto coretemp ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf vmw_balloon snd_ens1371 joydev gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: nfs] [ 1615.097663] CPU: 0 PID: 23702 Comm: fstest Tainted: G W E 4.11.0-rc1+ #517 [ 1615.098015] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 1615.098807] Call Trace: [ 1615.099183] dump_stack+0x63/0x86 [ 1615.099578] __warn+0xcb/0xf0 [ 1615.099967] warn_slowpath_fmt+0x5f/0x80 [ 1615.100370] __list_add_valid+0x8e/0xa0 [ 1615.100760] nfs4_put_state_owner+0x75/0xc0 [nfsv4] [ 1615.101136] __nfs4_close+0x109/0x140 [nfsv4] [ 1615.101524] nfs4_close_state+0x15/0x20 [nfsv4] [ 1615.101949] nfs4_close_context+0x21/0x30 [nfsv4] [ 1615.102691] __put_nfs_open_context+0xb8/0x110 [nfs] [ 1615.103155] put_nfs_open_context+0x10/0x20 [nfs] [ 1615.103586] nfs4_file_open+0x13b/0x260 [nfsv4] [ 1615.103978] do_dentry_open+0x20a/0x2f0 [ 1615.104369] ? nfs4_copy_file_range+0x30/0x30 [nfsv4] [ 1615.104739] vfs_open+0x4c/0x70 [ 1615.105106] ? may_open+0x5a/0x100 [ 1615.105469] path_openat+0x623/0x1420 [ 1615.105823] do_filp_open+0x91/0x100 [ 1615.106174] ? __alloc_fd+0x3f/0x170 [ 1615.106568] do_sys_open+0x130/0x220 [ 1615.106920] ? __put_cred+0x3d/0x50 [ 1615.107256] SyS_open+0x1e/0x20 [ 1615.107588] entry_SYSCALL_64_fastpath+0x1a/0xa9 [ 1615.107922] RIP: 0033:0x7fab599069b0 [ 1615.108247] RSP: 002b:00007ffcf0600d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 [ 1615.108575] RAX: ffffffffffffffda RBX: 00007fab59bcfae0 RCX: 00007fab599069b0 [ 1615.108896] RDX: 0000000000000200 RSI: 0000000000000200 RDI: 00007ffcf060255e [ 1615.109211] RBP: 0000000000040010 R08: 0000000000000000 R09: 0000000000000016 [ 1615.109515] R10: 00000000000006a1 R11: 0000000000000246 R12: 0000000000041000 [ 1615.109806] R13: 0000000000040010 R14: 0000000000001000 R15: 0000000000002710 [ 1615.110152] ---[ end trace 96ed63b1306bf2f3 ]--- Fixes: a974deee47 ("NFSv4: Fix memory and state leak in...") Signed-off-by: Kinglong Mee Signed-off-by: Anna Schumaker Cc: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit b89bd0c715c148ea3cfef6b250482a77225573b5 Author: Eric Leblond Date: Thu May 11 18:56:38 2017 +0200 netfilter: synproxy: fix conntrackd interaction commit 87e94dbc210a720a34be5c1174faee5c84be963e upstream. This patch fixes the creation of connection tracking entry from netlink when synproxy is used. It was missing the addition of the synproxy extension. This was causing kernel crashes when a conntrack entry created by conntrackd was used after the switch of traffic from active node to the passive node. Signed-off-by: Eric Leblond Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit ced7689be60ddcac4b1746212c547e8817c5ae5e Author: Eric Dumazet Date: Mon Apr 3 10:55:11 2017 -0700 netfilter: xt_TCPMSS: add more sanity tests on tcph->doff commit 2638fd0f92d4397884fd991d8f4925cb3f081901 upstream. Denys provided an awesome KASAN report pointing to an use after free in xt_TCPMSS I have provided three patches to fix this issue, either in xt_TCPMSS or in xt_tcpudp.c. It seems xt_TCPMSS patch has the smallest possible impact. Signed-off-by: Eric Dumazet Reported-by: Denys Fedoryshchenko Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 8e2316399b8faa87496886506f145ed988cf5c68 Author: Serhey Popovych Date: Tue Jun 20 14:35:23 2017 +0300 rtnetlink: add IFLA_GROUP to ifla_policy [ Upstream commit db833d40ad3263b2ee3b59a1ba168bb3cfed8137 ] Network interface groups support added while ago, however there is no IFLA_GROUP attribute description in policy and netlink message size calculations until now. Add IFLA_GROUP attribute to the policy. Fixes: cbda10fa97d7 ("net_device: add support for network device groups") Signed-off-by: Serhey Popovych Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b9ca9b0f551080aeb5adf7ab1b5f0c47c3e83f57 Author: Serhey Popovych Date: Tue Jun 20 13:29:25 2017 +0300 ipv6: Do not leak throw route references [ Upstream commit 07f615574f8ac499875b21c1142f26308234a92c ] While commit 73ba57bfae4a ("ipv6: fix backtracking for throw routes") does good job on error propagation to the fib_rules_lookup() in fib rules core framework that also corrects throw routes handling, it does not solve route reference leakage problem happened when we return -EAGAIN to the fib_rules_lookup() and leave routing table entry referenced in arg->result. If rule with matched throw route isn't last matched in the list we overwrite arg->result losing reference on throw route stored previously forever. We also partially revert commit ab997ad40839 ("ipv6: fix the incorrect return value of throw route") since we never return routing table entry with dst.error == -EAGAIN when CONFIG_IPV6_MULTIPLE_TABLES is on. Also there is no point to check for RTF_REJECT flag since it is always set throw route. Fixes: 73ba57bfae4a ("ipv6: fix backtracking for throw routes") Signed-off-by: Serhey Popovych Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e4089baa08c4a1fba87c19f8d018ecf032cab0b5 Author: Bert Kenward Date: Fri Jun 16 09:45:08 2017 +0100 sfc: provide dummy definitions of vswitch functions efx_probe_all() calls efx->type->vswitching_probe during probe. For SFC4000 (Falcon) NICs this function is not defined, leading to a BUG with the top of the call stack similar to: ? efx_pci_probe_main+0x29a/0x830 efx_pci_probe+0x7d3/0xe70 vswitching_restore and vswitching_remove also need to be defined. Fixed in mainline by: commit 5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver") Fixes: 6d8aaaf6f798 ("sfc: create VEB vswitch and vport above default firmware setup") Signed-off-by: Bert Kenward Signed-off-by: Greg Kroah-Hartman commit 08058c258afba77abf1fe6f4d327d3154a2bc336 Author: Gao Feng Date: Fri Jun 16 15:00:02 2017 +0800 net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev [ Upstream commit 9745e362add89432d2c951272a99b0a5fe4348a9 ] The register_vlan_device would invoke free_netdev directly, when register_vlan_dev failed. It would trigger the BUG_ON in free_netdev if the dev was already registered. In this case, the netdev would be freed in netdev_run_todo later. So add one condition check now. Only when dev is not registered, then free it directly. The following is the part coredump when netdev_upper_dev_link failed in register_vlan_dev. I removed the lines which are too long. [ 411.237457] ------------[ cut here ]------------ [ 411.237458] kernel BUG at net/core/dev.c:7998! [ 411.237484] invalid opcode: 0000 [#1] SMP [ 411.237705] [last unloaded: 8021q] [ 411.237718] CPU: 1 PID: 12845 Comm: vconfig Tainted: G E 4.12.0-rc5+ #6 [ 411.237737] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 411.237764] task: ffff9cbeb6685580 task.stack: ffffa7d2807d8000 [ 411.237782] RIP: 0010:free_netdev+0x116/0x120 [ 411.237794] RSP: 0018:ffffa7d2807dbdb0 EFLAGS: 00010297 [ 411.237808] RAX: 0000000000000002 RBX: ffff9cbeb6ba8fd8 RCX: 0000000000001878 [ 411.237826] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000000 [ 411.237844] RBP: ffffa7d2807dbdc8 R08: 0002986100029841 R09: 0002982100029801 [ 411.237861] R10: 0004000100029980 R11: 0004000100029980 R12: ffff9cbeb6ba9000 [ 411.238761] R13: ffff9cbeb6ba9060 R14: ffff9cbe60f1a000 R15: ffff9cbeb6ba9000 [ 411.239518] FS: 00007fb690d81700(0000) GS:ffff9cbebb640000(0000) knlGS:0000000000000000 [ 411.239949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 411.240454] CR2: 00007f7115624000 CR3: 0000000077cdf000 CR4: 00000000003406e0 [ 411.240936] Call Trace: [ 411.241462] vlan_ioctl_handler+0x3f1/0x400 [8021q] [ 411.241910] sock_ioctl+0x18b/0x2c0 [ 411.242394] do_vfs_ioctl+0xa1/0x5d0 [ 411.242853] ? sock_alloc_file+0xa6/0x130 [ 411.243465] SyS_ioctl+0x79/0x90 [ 411.243900] entry_SYSCALL_64_fastpath+0x1e/0xa9 [ 411.244425] RIP: 0033:0x7fb69089a357 [ 411.244863] RSP: 002b:00007ffcd04e0fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 411.245445] RAX: ffffffffffffffda RBX: 00007ffcd04e2884 RCX: 00007fb69089a357 [ 411.245903] RDX: 00007ffcd04e0fd0 RSI: 0000000000008983 RDI: 0000000000000003 [ 411.246527] RBP: 00007ffcd04e0fd0 R08: 0000000000000000 R09: 1999999999999999 [ 411.246976] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000004 [ 411.247414] R13: 00007ffcd04e1128 R14: 00007ffcd04e2888 R15: 0000000000000001 [ 411.249129] RIP: free_netdev+0x116/0x120 RSP: ffffa7d2807dbdb0 Signed-off-by: Gao Feng Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f1a0e7d172b01e258a8c0ca6c67d003fbac54f64 Author: Wei Wang Date: Fri Jun 16 10:46:37 2017 -0700 decnet: always not take dst->__refcnt when inserting dst into hash table [ Upstream commit 76371d2e3ad1f84426a30ebcd8c3b9b98f4c724f ] In the existing dn_route.c code, dn_route_output_slow() takes dst->__refcnt before calling dn_insert_route() while dn_route_input_slow() does not take dst->__refcnt before calling dn_insert_route(). This makes the whole routing code very buggy. In dn_dst_check_expire(), dnrt_free() is called when rt expires. This makes the routes inserted by dn_route_output_slow() not able to be freed as the refcnt is not released. In dn_dst_gc(), dnrt_drop() is called to release rt which could potentially cause the dst->__refcnt to be dropped to -1. In dn_run_flush(), dst_free() is called to release all the dst. Again, it makes the dst inserted by dn_route_output_slow() not able to be released and also, it does not wait on the rcu and could potentially cause crash in the path where other users still refer to this dst. This patch makes sure both input and output path do not take dst->__refcnt before calling dn_insert_route() and also makes sure dnrt_free()/dst_free() is called when removing dst from the hash table. The only difference between those 2 calls is that dnrt_free() waits on the rcu while dst_free() does not. Signed-off-by: Wei Wang Acked-by: Martin KaFai Lau Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c7d422d68fe98627ea9f60d06e38dc7f1af302b9 Author: Maor Dickman Date: Thu May 18 15:15:08 2017 +0300 net/mlx5e: Fix timestamping capabilities reporting [ Upstream commit f0b381178b01b831f9907d72f467d6443afdea67 ] Misuse of (BIT) macro caused to report wrong flags for "Hardware Transmit Timestamp Modes" and "Hardware Receive Filter Modes" Fixes: ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support') Signed-off-by: Maor Dickman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 25ff35074e276b457f16c00f97afea41b6d5051d Author: Eli Cohen Date: Thu Jun 8 11:33:16 2017 -0500 net/mlx5: Wait for FW readiness before initializing command interface [ Upstream commit 6c780a0267b8a1075f40b39851132eeaefefcff5 ] Before attempting to initialize the command interface we must wait till the fw_initializing bit is clear. If we fail to meet this condition the hardware will drop our configuration, specifically the descriptors page address. This scenario can happen when the firmware is still executing an FLR flow and did not finish yet so the driver needs to wait for that to finish. Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup') Signed-off-by: Eli Cohen Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 176b9874a203ae170912b063999e2c00d56b9ee6 Author: Or Gerlitz Date: Thu Jun 15 20:08:32 2017 +0300 net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it [ Upstream commit 31ac93386d135a6c96de9c8bab406f5ccabf5a4d ] The error flow of mlx5e_create_netdev calls the cleanup call of the given profile without checking if it exists, fix that. Currently the VF reps don't register that callback and we crash if getting into error -- can be reproduced by the user doing ctrl^C while attempting to change the sriov mode from legacy to switchdev. Fixes: 26e59d8077a3 '(net/mlx5e: Implement mlx5e interface attach/detach callbacks') Signed-off-by: Or Gerlitz Reported-by: Sabrina Dubroca Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 4c246863e7b42eaecbaf90c319720bbf426b5958 Author: Xin Long Date: Thu Jun 15 17:49:08 2017 +0800 sctp: return next obj by passing pos + 1 into sctp_transport_get_idx [ Upstream commit 988c7322116970696211e902b468aefec95b6ec4 ] In sctp_for_each_transport, pos is used to save how many objs it has dumped. Now it gets the last obj by sctp_transport_get_idx, then gets the next obj by sctp_transport_get_next. The issue is that in the meanwhile if some objs in transport hashtable are removed and the objs nums are less than pos, sctp_transport_get_idx would return NULL and hti.walker.tbl is NULL as well. At this moment it should stop hti, instead of continue getting the next obj. Or it would cause a NULL pointer dereference in sctp_transport_get_next. This patch is to pass pos + 1 into sctp_transport_get_idx to get the next obj directly, even if pos > objs nums, it would return NULL and stop hti. Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fded2d74a3505f7daad70db4e8ffd87ceb366ecb Author: Xin Long Date: Thu Jun 15 16:33:58 2017 +0800 ipv6: fix calling in6_ifa_hold incorrectly for dad work [ Upstream commit f8a894b218138888542a5058d0e902378fd0d4ec ] Now when starting the dad work in addrconf_mod_dad_work, if the dad work is idle and queued, it needs to hold ifa. The problem is there's one gap in [1], during which if the pending dad work is removed elsewhere. It will miss to hold ifa, but the dad word is still idea and queue. if (!delayed_work_pending(&ifp->dad_work)) in6_ifa_hold(ifp); <--------------[1] mod_delayed_work(addrconf_wq, &ifp->dad_work, delay); An use-after-free issue can be caused by this. Chen Wei found this issue when WARN_ON(!hlist_unhashed(&ifp->addr_lst)) in net6_ifa_finish_destroy was hit because of it. As Hannes' suggestion, this patch is to fix it by holding ifa first in addrconf_mod_dad_work, then calling mod_delayed_work and putting ifa if the dad_work is already in queue. Note that this patch did not choose to fix it with: if (!mod_delayed_work(delay)) in6_ifa_hold(ifp); As with it, when delay == 0, dad_work would be scheduled immediately, all addrconf_mod_dad_work(0) callings had to be moved under ifp->lock. Reported-by: Wei Chen Suggested-by: Hannes Frederic Sowa Acked-by: Hannes Frederic Sowa Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cac2a9bb4034f2395bdbe1ad2bd3f29a470e14f0 Author: WANG Cong Date: Tue Jun 20 10:46:27 2017 -0700 igmp: add a missing spin_lock_init() [ Upstream commit b4846fc3c8559649277e3e4e6b5cec5348a8d208 ] Andrey reported a lockdep warning on non-initialized spinlock: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 4099 Comm: a.out Not tainted 4.12.0-rc6+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 register_lock_class+0x717/0x1aa0 kernel/locking/lockdep.c:755 ? 0xffffffffa0000000 __lock_acquire+0x269/0x3690 kernel/locking/lockdep.c:3255 lock_acquire+0x22d/0x560 kernel/locking/lockdep.c:3855 __raw_spin_lock_bh ./include/linux/spinlock_api_smp.h:135 _raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:175 spin_lock_bh ./include/linux/spinlock.h:304 ip_mc_clear_src+0x27/0x1e0 net/ipv4/igmp.c:2076 igmpv3_clear_delrec+0xee/0x4f0 net/ipv4/igmp.c:1194 ip_mc_destroy_dev+0x4e/0x190 net/ipv4/igmp.c:1736 We miss a spin_lock_init() in igmpv3_add_delrec(), probably because previously we never use it on this code path. Since we already unlink it from the global mc_tomb list, it is probably safe not to acquire this spinlock here. It does not harm to have it although, to avoid conditional locking. Fixes: c38b7d327aaf ("igmp: acquire pmc lock for ip_mc_clear_src()") Reported-by: Andrey Konovalov Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ecd6627f48bd2d8e0f85eee703b5b4609ed6f744 Author: WANG Cong Date: Mon Jun 12 09:52:26 2017 -0700 igmp: acquire pmc lock for ip_mc_clear_src() [ Upstream commit c38b7d327aafd1e3ad7ff53eefac990673b65667 ] Andrey reported a use-after-free in add_grec(): for (psf = *psf_list; psf; psf = psf_next) { ... psf_next = psf->sf_next; where the struct ip_sf_list's were already freed by: kfree+0xe8/0x2b0 mm/slub.c:3882 ip_mc_clear_src+0x69/0x1c0 net/ipv4/igmp.c:2078 ip_mc_dec_group+0x19a/0x470 net/ipv4/igmp.c:1618 ip_mc_drop_socket+0x145/0x230 net/ipv4/igmp.c:2609 inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:411 sock_release+0x8d/0x1e0 net/socket.c:597 sock_close+0x16/0x20 net/socket.c:1072 This happens because we don't hold pmc->lock in ip_mc_clear_src() and a parallel mr_ifc_timer timer could jump in and access them. The RCU lock is there but it is merely for pmc itself, this spinlock could actually ensure we don't access them in parallel. Thanks to Eric and Long for discussion on this bug. Reported-by: Andrey Konovalov Cc: Eric Dumazet Cc: Xin Long Signed-off-by: Cong Wang Reviewed-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 059686754c1870f182ce55495b81728763732d48 Author: Christian Perle Date: Mon Jun 12 10:06:57 2017 +0200 proc: snmp6: Use correct type in memset [ Upstream commit 3500cd73dff48f28f4ba80c171c4c80034d40f76 ] Reading /proc/net/snmp6 yields bogus values on 32 bit kernels. Use "u64" instead of "unsigned long" in sizeof(). Fixes: 4a4857b1c81e ("proc: Reduce cache miss in snmp6_seq_show") Signed-off-by: Christian Perle Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 78b24ab695abafe4c5754a661a591b841661df8b Author: Tal Gilboa Date: Mon May 29 17:02:55 2017 +0300 net/mlx5e: Fix wrong indications in DIM due to counter wraparound [ Upstream commit 53acd76ce571e3b71f9205f2d49ab285a9f1aad8 ] DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for changing the channel interrupt moderation values in order to reduce CPU overhead for all traffic types. Each iteration of the algorithm, DIM calculates the difference in throughput, packet rate and interrupt rate from last iteration in order to make a decision. DIM relies on counters for each metric. When these counters get to their type's max value they wraparound. In this case the delta between 'end' and 'start' samples is negative and when translated to unsigned integers - very high. This results in a false indication to the algorithm and might result in a wrong decision. The fix calculates the 'distance' between 'end' and 'start' samples in a cyclic way around the relevant type's max value. It can also be viewed as an absolute value around the type's max value instead of around 0. Testing show higher stability in DIM profile selection and no wraparound issues. Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Tal Gilboa Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 9854e58659908b4923d95b0fe3cd1db7ea62fe39 Author: Tal Gilboa Date: Mon May 15 14:13:16 2017 +0300 net/mlx5e: Added BW check for DIM decision mechanism [ Upstream commit c3164d2fc48fd4fa0477ab658b644559c3fe9073 ] DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for changing the channel interrupt moderation values in order to reduce CPU overhead for all traffic types. Until now only interrupt and packet rate were sampled. We found a scenario on which we get a false indication since a change in DIM caused more aggregation and reduced packet rate while increasing BW. We now regard a change as succesfull iff: current_BW > (prev_BW + threshold) or current_BW ~= prev_BW and current_PR > (prev_PR + threshold) or current_BW ~= prev_BW and current_PR ~= prev_PR and current_IR < (prev_IR - threshold) Where BW = Bandwidth, PR = Packet rate and IR = Interrupt rate Improvements (ConnectX-4Lx 25GbE, single RX queue, LRO off) -------------------------------------------------- packet size | before[Mb/s] | after[Mb/s] | gain | 2B | 343.4 | 359.4 | 4.5% | 16B | 2739.7 | 2814.8 | 2.7% | 64B | 9739 | 10185.3 | 4.5% | Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Tal Gilboa Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 57360bc3c7a6fc9c7422e422508bf77166a05028 Author: Jia-Ju Bai Date: Sat Jun 10 17:03:35 2017 +0800 net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverse [ Upstream commit 343eba69c6968190d8654b857aea952fed9a6749 ] The kernel may sleep under a rcu read lock in tipc_msg_reverse, and the function call path is: tipc_l2_rcv_msg (acquire the lock by rcu_read_lock) tipc_rcv tipc_sk_rcv tipc_msg_reverse pskb_expand_head(GFP_KERNEL) --> may sleep tipc_node_broadcast tipc_node_xmit_skb tipc_node_xmit tipc_sk_rcv tipc_msg_reverse pskb_expand_head(GFP_KERNEL) --> may sleep To fix it, "GFP_KERNEL" is replaced with "GFP_ATOMIC". Signed-off-by: Jia-Ju Bai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bb566ce3a60eded40ae4a3421a59c0f5f1c7ef20 Author: Jia-Ju Bai Date: Sat Jun 10 16:49:39 2017 +0800 net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx [ Upstream commit f146e872eb12ebbe92d8e583b2637e0741440db3 ] The kernel may sleep under a rcu read lock in cfpkt_create_pfx, and the function call path is: cfcnfg_linkup_rsp (acquire the lock by rcu_read_lock) cfctrl_linkdown_req cfpkt_create cfpkt_create_pfx alloc_skb(GFP_KERNEL) --> may sleep cfserl_receive (acquire the lock by rcu_read_lock) cfpkt_split cfpkt_create_pfx alloc_skb(GFP_KERNEL) --> may sleep There is "in_interrupt" in cfpkt_create_pfx to decide use "GFP_KERNEL" or "GFP_ATOMIC". In this situation, "GFP_KERNEL" is used because the function is called under a rcu read lock, instead in interrupt. To fix it, only "GFP_ATOMIC" is used in cfpkt_create_pfx. Signed-off-by: Jia-Ju Bai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8cda426a7cfa61b902c4335d1d1ab945bbcb41b6 Author: Xin Long Date: Sat Jun 10 14:48:14 2017 +0800 sctp: disable BH in sctp_for_each_endpoint [ Upstream commit 581409dacc9176b0de1f6c4ca8d66e13aa8e1b29 ] Now sctp holds read_lock when foreach sctp_ep_hashtable without disabling BH. If CPU schedules to another thread A at this moment, the thread A may be trying to hold the write_lock with disabling BH. As BH is disabled and CPU cannot schedule back to the thread holding the read_lock, while the thread A keeps waiting for the read_lock. A dead lock would be triggered by this. This patch is to fix this dead lock by calling read_lock_bh instead to disable BH when holding the read_lock in sctp_for_each_endpoint. Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Reported-by: Xiumei Mu Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c6d4ff85722b25877af48b311eda944dcc8c6feb Author: Krister Johansen Date: Thu Jun 8 13:12:38 2017 -0700 Fix an intermittent pr_emerg warning about lo becoming free. [ Upstream commit f186ce61bb8235d80068c390dc2aad7ca427a4c2 ] It looks like this: Message from syslogd@flamingo at Apr 26 00:45:00 ... kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4 They seem to coincide with net namespace teardown. The message is emitted by netdev_wait_allrefs(). Forced a kdump in netdev_run_todo, but found that the refcount on the lo device was already 0 at the time we got to the panic. Used bcc to check the blocking in netdev_run_todo. The only places where we're off cpu there are in the rcu_barrier() and msleep() calls. That behavior is expected. The msleep time coincides with the amount of time we spend waiting for the refcount to reach zero; the rcu_barrier() wait times are not excessive. After looking through the list of callbacks that the netdevice notifiers invoke in this path, it appears that the dst_dev_event is the most interesting. The dst_ifdown path places a hold on the loopback_dev as part of releasing the dev associated with the original dst cache entry. Most of our notifier callbacks are straight-forward, but this one a) looks complex, and b) places a hold on the network interface in question. I constructed a new bcc script that watches various events in the liftime of a dst cache entry. Note that dst_ifdown will take a hold on the loopback device until the invalidated dst entry gets freed. [ __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183 __dst_free rcu_nocb_kthread kthread ret_from_fork Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bb84290cd2967a5774a97fa44381713e20a7924c Author: Mateusz Jurczyk Date: Thu Jun 8 11:13:36 2017 +0200 af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers [ Upstream commit defbcf2decc903a28d8398aa477b6881e711e3ea ] Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in bind() and connect() handlers of the AF_UNIX socket. Since neither syscall enforces a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing .sa_family. Signed-off-by: Mateusz Jurczyk Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 386ed38f0f28b5dffe11c5665997882115fb788e Author: David Ahern Date: Thu Jun 8 11:31:11 2017 -0600 net: vrf: Make add_fib_rules per network namespace flag [ Upstream commit 097d3c9508dc58286344e4a22b300098cf0c1566 ] Commit 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create") adds the l3mdev FIB rule the first time a VRF device is created. However, it only creates the rule once and only in the namespace the first device is created - which may not be init_net. Fix by using the net_generic capability to make the add_fib_rules flag per network namespace. Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create") Reported-by: Petr Machata Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b5cc68e0c1905a3cb94677a4d1b3e03f65881231 Author: Mintz, Yuval Date: Wed Jun 7 21:00:33 2017 +0300 net: Zero ifla_vf_info in rtnl_fill_vfinfo() [ Upstream commit 0eed9cf58446b28b233388b7f224cbca268b6986 ] Some of the structure's fields are not initialized by the rtnetlink. If driver doesn't set those in ndo_get_vf_config(), they'd leak memory to user. Signed-off-by: Yuval Mintz CC: Michal Schmidt Reviewed-by: Greg Rose Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fd9b13e6c175b01d61f0f234502919c6c40e4dd2 Author: Mateusz Jurczyk Date: Wed Jun 7 16:14:29 2017 +0200 decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb [ Upstream commit dd0da17b209ed91f39872766634ca967c170ada1 ] Verify that the length of the socket buffer is sufficient to cover the nlmsghdr structure before accessing the nlh->nlmsg_len field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < sizeof(*nlh) expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d2f459e3feb0f73d2e95ab7892adcf22f21fe9ef Author: Alexander Potapenko Date: Tue Jun 6 15:56:54 2017 +0200 net: don't call strlen on non-terminated string in dev_set_alias() [ Upstream commit c28294b941232931fbd714099798eb7aa7e865d7 ] KMSAN reported a use of uninitialized memory in dev_set_alias(), which was caused by calling strlcpy() (which in turn called strlen()) on the user-supplied non-terminated string. Signed-off-by: Alexander Potapenko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 98184bbb8daea6af32208d63831e66023db4bb58 Author: Willem de Bruijn Date: Sat Feb 18 19:00:45 2017 -0500 ipv6: release dst on error in ip6_dst_lookup_tail commit 00ea1ceebe0d9f2dc1cc2b7bd575a00100c27869 upstream. If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped check, release the dst before returning an error. Fixes: ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.") Signed-off-by: Willem de Bruijn Acked-by: Eric Dumazet Signed-off-by: David S. Miller Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman